Technology sharing | on mysql syntax parsing and debugging methods

Posted by Cogen2 on Mon, 13 Dec 2021 07:41:18 +0100

Welcome to the MySQL technical articles shared by the great SQL community. If you have any questions or want to learn, you can leave a message in the comment area below. After seeing it, you will answer it

This article introduces you to a method of sql syntax parsing and tracing using mysql parser and bison's debugging options.

In the process of database development, we often encounter the need to modify SQL syntax. As we know, mysql's SQL parser is based on yacc grammar, uses EBNF format for rule description (sql/sql_yacc.yy) and generates (sql_yacc.h, sql_yacc.cc) with the help of bison tool. Therefore, when modifying SQL syntax, it is inevitable to deal with these yacc grammars and modify sql_yacc.yy for transformation and upgrading.

Yacc grammar is a high generalization of syntax parsing. It provides us with an elegant way to modify the parser, but at the same time, when we encounter statement parsing problems, it is usually difficult to find the reason directly from the abstract syntax rules. Fortunately, combined with the debugging tools provided by mysql and bison, we have the opportunity to visualize the whole syntax parsing process. By parsing the log, yacc rules and the correspondence of automatic state machine, we can locate the problem quickly.

mysql parser debugging switch

sql/sql_yacc.yy file, you can see the following code:

#ifndef NDEBUG
void turn_parser_debug_on()
{
  /*
     MYSQLdebug is in sql/sql_yacc.cc, in bison generated code.
     Turning this option on is **VERY** verbose, and should be
     used when investigating a syntax error problem only.

     The syntax to run with bison traces is as follows :
     - Starting a server manually :
       mysqld --debug="d,parser_debug" ...
     - Running a test :
       mysql-test-run.pl --mysqld="--debug=d,parser_debug" ...

     The result will be in the process stderr (var/log/master.err)
   */

  extern int yydebug;
  yydebug= 1;
}
#endif

It tells us that in the debug version, add the - debug = "d", parser_debug option when mysqld starts, and the database server will output the specific information of sql parsing (bison traces) for us.

Here, we use a simple sql statement SELECT 1+2*3 FROM DUAL as an example to see its log output information (Note: the description added after the '#' sign is not the original information). The beginning part is as follows:

#Note: the SQL statement will first be processed by the lexical parser (LEXER) and output 'select'_ SYM NUM + NUM * NUM FROM DUAL_ SYM 'is used as the output of the syntax parser
Starting parse       #Statement parsing start
Entering state 0
Reading a token: Next token is token SELECT_SYM (: )  # Read in SELECT
Shifting token SELECT_SYM (: )                        # Move in SELECT
Entering state 42                                     # The stack is used to record the current derivation
Reading a token: Next token is token NUM (: )         # Read num (lexical parsing token of the first number '1')
Reducing stack by rule 1377 (line 10001):             # Before reading in, make a stack specification (the rules used are in line 10001 of sql_yacc.yy)
-> $$ = nterm select_options (: )  
Stack now 0 42
Entering state 1013                                   # After stack specification, enter a new state
...

state 42, 1013 and other information in the output information, yacc syntax and the state number in the automatic state machine. In order to view it, we need to use bison tool to manually generate the automatic state machine file.

  • Automatic state machine file

Use bison's - v option to get the syntax automatic state machine file. An example of the generation method is as follows:

cd ${SOURCE_DIR}/sql   #SOURCE_DIR is the mysql source directory location
/usr/bin/bison --name-prefix=MYSQL --yacc --warnings=all,no-yacc,no-empty-rule,no-precedence,no-deprecated --defines=${BUILD_DIR}/sql/sql_yacc.h -v sql_yacc.yy  #BUILD_DIR is the user-defined compilation directory location

After successful execution, a file named y.output will be generated under ${SOURCE_DIR}/sql, which describes the state machine description file calculated by bison according to the syntax rules. In the file, we will see:

1. Description of numbered syntax rules. As mentioned above, rule 1377 is as follows:

1377 select_options: %empty

It means that an empty production can be specified as select_option

2. Status of all automata. The state 42 mentioned above is shown in the file as:

State 42

  1366 query_specification: SELECT_SYM . select_options select_item_list into_clause opt_from_clause opt_where_clause opt_group_clause opt_having_clause opt_window_clause
...
    ALL                  shift, and go to state 1004
...
    select_options      go to state 1013
    select_option_list  go to state 1014
    select_option       go to state 1015
    query_spec_option   go to state 1016

3. Status statistics with shift/reduce and reduce/reduce conflicts:

State 27 conflicts: 2 shift/reduce
State 42 conflicts: 2 shift/reduce
State 220 conflicts: 2 shift/reduce

mysql-8.0.0 is used in this test 25. There are a total of 66 existing shift/reduce conflicts. mysql does not encourage any new conflicts in the state machine due to syntax modification. Therefore, more attention should be paid in the development process:

/*
 1. We do not accept any reduce/reduce conflicts
 2. We should not introduce new shift/reduce conflicts any more.
%expect 66
*/

With the stack information provided by mysql and the state machine file generated by bison -v, we can print the derivation path of a specific node in the syntax parsing process. For example, we can print the parser after processing select_ After sym NUM +, the derivation process before preparing to read NUM (stack status: 0 42 1013) is sorted as follows (Note: ") On the left side of the position, it can be seen that the current status has been moved in or the content of the protocol):

In this way, we can clearly know the specific state of the parser at each stage of sql parsing. Therefore, when syntax modification errors occur, we can easily locate which part of our rules has exceptions, and then solve the problem more quickly.

Enjoy GreatSQL :)

This article is composed of blog one article multi posting platform OpenWrite release!

Topics: Database MySQL SQL