Hive code analysis report: semantic analysis ③

Posted by Ludo Lambrechts on Tue, 16 Nov 2021 10:57:20 +0100

2021SC@SDUSC

Contents

Summary

SemanticAnalyzer class analysis:

① From analyzeInternal(ASTNode ast) to genResolvedParseTree(ASTNode, PlannerContext)

Summary

Earlier, I briefly analyzed the BaseSemanticAnalyzer class and learned that it is the base class of every semantic analyzer; its derived subclasses include SemanticAnalyzer and many other analyzers. Although the name "query analyzer" is very general and may cover various later stages of query processing, the flow of control in the compiler suggests that the key steps of semantic analysis happen here. The earlier reading of the BaseSemanticAnalyzer and SemanticAnalyzerFactory classes helps a lot with understanding this. Based on the information in the AST, SemanticAnalyzerFactory.get() returns the corresponding analyzer subclass (all of these classes inherit from BaseSemanticAnalyzer). As expected, each subclass has its own processing logic for a different type of request, because each one overrides the analyzeInternal(ASTNode,Context) method of the parent class. This method is where the information contained in the AST goes next.
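To make the dispatch idea concrete, here is a minimal sketch of a factory that returns a subclass which overrides analyzeInternal(). All class names and the string tokens are hypothetical stand-ins chosen for illustration; they are not the real Hive classes, and the real factory inspects many more token types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; not the real Hive classes.
abstract class ToyAnalyzer {
    final List<String> log = new ArrayList<>();

    // Each subclass supplies its own analysis logic.
    abstract void analyzeInternal(String ast);
}

class ToyQueryAnalyzer extends ToyAnalyzer {
    @Override
    void analyzeInternal(String ast) {
        log.add("query:" + ast);
    }
}

class ToyDdlAnalyzer extends ToyAnalyzer {
    @Override
    void analyzeInternal(String ast) {
        log.add("ddl:" + ast);
    }
}

class ToyAnalyzerFactory {
    // Picks an analyzer from the root token of the (toy) AST.
    static ToyAnalyzer get(String rootToken) {
        switch (rootToken) {
            case "TOK_DROPTABLE":
                return new ToyDdlAnalyzer();
            default:
                return new ToyQueryAnalyzer();
        }
    }
}
```

The caller only sees the base type: it asks the factory for an analyzer and calls analyzeInternal() on whatever comes back, which is the pattern described above.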

I focus on the SemanticAnalyzer class, which has more than 10,000 lines of code. According to the material I consulted, this class holds the main processing logic for most query types. Therefore, I will concentrate on analyzeInternal() and the other functions of this class.

SemanticAnalyzer class analysis:

① From analyzeInternal(ASTNode ast) to genResolvedParseTree(ASTNode, PlannerContext)

Because this class is a subclass of BaseSemanticAnalyzer, some of its simple member functions work much like those of the parent class. We can therefore go straight to the point and focus on analyzeInternal(), the function that the previous stage of the compilation process calls into. The code is as follows.

public void analyzeInternal(ASTNode ast) throws SemanticException {
    analyzeInternal(ast, new PlannerContext());
  }

This leads into analyzeInternal(ast, new PlannerContext()):

void analyzeInternal(ASTNode ast, PlannerContext plannerCtx) throws SemanticException {
    LOG.info("Starting Semantic Analysis");

    processPositionAlias(ast);

    if (!genResolvedParseTree(ast, plannerCtx)) {
      return;
    }

We analyze this function step by step. The log message on the first line tells us that semantic analysis starts here. Next comes the call processPositionAlias(ast), whose argument is the AST node passed in by the caller. Its job is to resolve position aliases first, that is, numeric positions in clauses such as GROUP BY and ORDER BY that refer to entries in the select list.
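As an illustration of what resolving a position alias means, the sketch below replaces 1-based numeric keys with the matching select expression. It works on plain strings rather than on AST nodes, and the class and method names are hypothetical; it is not how processPositionAlias is implemented in Hive.

```java
import java.util.ArrayList;
import java.util.List;

class PositionAliasSketch {
    // Replaces 1-based numeric positions with the matching select expression.
    // Non-numeric keys are kept unchanged.
    static List<String> resolve(List<String> selectExprs, List<String> keys) {
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            if (key.matches("\\d+")) {
                int pos = Integer.parseInt(key);
                if (pos < 1 || pos > selectExprs.size()) {
                    throw new IllegalArgumentException("Invalid position alias: " + pos);
                }
                out.add(selectExprs.get(pos - 1));
            } else {
                out.add(key);
            }
        }
        return out;
    }
}
```

For a query shaped like SELECT dept, count(*) ... GROUP BY 1, the key "1" would be rewritten to "dept" before the rest of the analysis runs.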

After that, an if statement checks the return value of genResolvedParseTree(ast, plannerCtx), so let's look at what genResolvedParseTree(ast, plannerCtx) does.

boolean genResolvedParseTree(ASTNode ast, PlannerContext plannerCtx) throws SemanticException {
    ASTNode child = ast;
    this.ast = ast;
    viewsExpanded = new ArrayList<String>();
    ctesExpanded = new ArrayList<String>();

 

First, the method sets up the variables that the later steps need.

The structure of the rest is very clear: it is mainly a series of conditional statements that lead to the relevant operations. These checks are unavoidable, because the token of the AST node can represent different kinds of statements. The steps are as follows:

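HiveParser.TOK_CREATETABLE shows that this part checks whether the statement being analyzed is a CREATE TABLE command. If it is, analyzeCreateTable is called; when the statement is not a CTAS (CREATE TABLE ... AS SELECT ...), that call returns null and the method returns false directly. If the statement is not a CREATE TABLE at all, setCommandType is executed and the command type is set to HiveOperation.QUERY.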

    if (ast.getToken().getType() == HiveParser.TOK_CREATETABLE) {
      if ((child = analyzeCreateTable(ast, qb, plannerCtx)) == null) {
        return false;
      }
    } else {
      queryState.setCommandType(HiveOperation.QUERY);
    }

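HiveParser.TOK_CREATEVIEW shows that the next part checks whether the statement creates a view (this also covers materialized views and an ALTER VIEW whose second child is a query). If so, analyzeCreateView is called, and the method returns false when that call returns null.

After that, a switch statement on the token type handles the transaction-related commands. For TOK_SET_AUTOCOMMIT, the type of the child node, ast.getChild(0).getType(), is compared with HiveParser.TOK_TRUE and HiveParser.TOK_FALSE to determine the subsequent operation.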

    if (ast.getToken().getType() == HiveParser.TOK_CREATEVIEW ||
        ast.getToken().getType() == HiveParser.TOK_CREATE_MATERIALIZED_VIEW ||
        (ast.getToken().getType() == HiveParser.TOK_ALTERVIEW &&
            ast.getChild(1).getType() == HiveParser.TOK_QUERY)) {
      child = analyzeCreateView(ast, qb, plannerCtx);
      if (child == null) {
        return false;
      }
      viewSelect = child;
      // This step ensures that the view cannot reference itself
      viewsExpanded.add(createVwDesc.getViewName());
    }


    switch(ast.getToken().getType()) {
      case HiveParser.TOK_SET_AUTOCOMMIT:
        assert ast.getChildCount() == 1;
        if(ast.getChild(0).getType() == HiveParser.TOK_TRUE) {
          setAutoCommitValue(true);
        }
        else if(ast.getChild(0).getType() == HiveParser.TOK_FALSE) {
          setAutoCommitValue(false);
        }
        else {
          assert false : "Unexpected child of TOK_SET_AUTOCOMMIT: " + ast.getChild(0).getType();
        }
        // fall through
      case HiveParser.TOK_START_TRANSACTION:
      case HiveParser.TOK_COMMIT:
      case HiveParser.TOK_ROLLBACK:
        if(!(conf.getBoolVar(ConfVars.HIVE_IN_TEST) || conf.getBoolVar(ConfVars.HIVE_IN_TEZ_TEST))) {
          throw new IllegalStateException(SemanticAnalyzerFactory.getOperation(ast.getToken().getType()) +
            " is not supported yet.");
        }
        queryState.setCommandType(SemanticAnalyzerFactory.getOperation(ast.getToken().getType()));
        return false;
    }


    // Masking and filtering analysis

    tableMask = new TableMask(this, conf, ctx.isSkipTableMasking());
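One detail of the switch statement above is worth spelling out: the TOK_SET_AUTOCOMMIT case has no break, so after setting the autocommit value, control continues into the cases below it and shares their handling. A minimal sketch of that Java behavior follows; the integer constants and step names are made up for illustration and are not the HiveParser tokens.

```java
import java.util.ArrayList;
import java.util.List;

class FallThroughSketch {
    // Made-up token values for illustration only.
    static final int SET_AUTOCOMMIT = 1;
    static final int START_TRANSACTION = 2;
    static final int COMMIT = 3;
    static final int QUERY = 4;

    static List<String> handle(int token) {
        List<String> steps = new ArrayList<>();
        switch (token) {
            case SET_AUTOCOMMIT:
                steps.add("set autocommit");
                // fall through: no break, so the shared handling below also runs
            case START_TRANSACTION:
            case COMMIT:
                steps.add("record command type");
                break;
            default:
                steps.add("continue analysis");
        }
        return steps;
    }
}
```

Calling handle(SET_AUTOCOMMIT) records both steps, while handle(COMMIT) records only the shared one.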

The following steps convert the AST into a QB (query block). doPhase1 is responsible for decomposing the AST into the corresponding QB. If doPhase1 returns false, genResolvedParseTree returns false as well.

    Phase1Ctx ctx_1 = initPhase1Ctx();
    preProcessForInsert(child, qb);

    if (!doPhase1(child, qb, ctx_1, plannerCtx)) {
      return false;
    }
    LOG.info("Completed phase 1 of Semantic Analysis");
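The idea of decomposing a tree into a query block can be sketched with a toy example. The types ToyNode and ToyQB and the node type strings are hypothetical and far simpler than the real ASTNode and QB; the only thing the sketch borrows from the code above is the convention of returning false when the walk cannot continue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy types; the real ASTNode and QB are far richer.
class ToyNode {
    final String type;
    final String text;
    final List<ToyNode> children;

    ToyNode(String type, String text, ToyNode... children) {
        this.type = type;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

class ToyQB {
    final List<String> tables = new ArrayList<>();
    final List<String> selectExprs = new ArrayList<>();
    String whereExpr;
}

class Phase1Sketch {
    // Walks the tree once and files each clause into the query block.
    // Returns false when it meets a node type it does not understand.
    static boolean doPhase1(ToyNode node, ToyQB qb) {
        switch (node.type) {
            case "QUERY":
                break; // container node: just visit its children
            case "FROM":
                qb.tables.add(node.text);
                break;
            case "SELECT":
                qb.selectExprs.add(node.text);
                break;
            case "WHERE":
                qb.whereExpr = node.text;
                break;
            default:
                return false;
        }
        for (ToyNode child : node.children) {
            if (!doPhase1(child, qb)) {
                return false;
            }
        }
        return true;
    }
}
```

After the walk, the query block holds the clauses in a flat, easy-to-query form, which is what the later metadata step relies on.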

The following steps deal with metadata. getMetaData is responsible for storing metadata, such as table and column information, in the QB.

    getMetaData(qb, createVwDesc == null);
    LOG.info("Completed getting MetaData in Semantic Analysis");

    plannerCtx.setParseTreeAttr(child, ctx_1);

    return true;
  }

Summary:

Here we simply followed the flow into analyzeInternal() and then analyzed the basic structure and main functions of genResolvedParseTree(ASTNode, PlannerContext). A more in-depth analysis will follow later.

Topics: Hadoop hive Data Warehouse