[Statement: All rights reserved. Reprinting is welcome, but please do not use this article for commercial purposes. Contact email: feixiaoxing @163.com]
With lexical analysis finished, the next step is syntax analysis (parsing). Syntax is where programming languages differ from one another the most: C has its own grammar, and C++ has its own as well. That is why learning a new language is, for the most part, learning its grammar.
There are two approaches to syntax analysis: top-down and bottom-up. For a hand-written compiler, top-down is usually easier, because most of the work can be done with recursive functions (recursive descent), provided the grammar contains no left-recursive rules.
It is also no longer necessary to write every line of a new language's parser by hand. Tools such as yacc and bison can generate the parser code from a grammar description. Either way, the end goal of syntax analysis is to build a syntax tree.
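The left-recursion restriction matters because a rule such as expr: expr '+' term would make a recursive parsing function call itself before consuming any token. The following is a minimal sketch, not taken from ucc, of the usual workaround: the rule is rewritten as expr: term { ('+' | '-') term } and parsed with a loop. The names cursor, SkipSpaces, ParseTerm, ParseExpr and EvalExpr are hypothetical, and the sketch evaluates the expression directly instead of building a tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical cursor into the input text; not part of ucc. */
static const char *cursor;

static void SkipSpaces(void)
{
    while (*cursor == ' ')
        cursor++;
}

/* term: digit-sequence */
static int ParseTerm(void)
{
    int value = 0;

    SkipSpaces();
    while (isdigit((unsigned char)*cursor)) {
        value = value * 10 + (*cursor - '0');
        cursor++;
    }
    return value;
}

/*
 * Left-recursive form (a top-down parser would recurse forever):
 *     expr: expr '+' term | expr '-' term | term
 * Equivalent iterative form used here:
 *     expr: term { ('+' | '-') term }
 */
static int ParseExpr(void)
{
    int value = ParseTerm();

    for (;;) {
        SkipSpaces();
        if (*cursor == '+') {
            cursor++;
            value += ParseTerm();
        } else if (*cursor == '-') {
            cursor++;
            value -= ParseTerm();
        } else {
            break;
        }
    }
    return value;
}

/* Evaluate a string of non-negative integers joined by + and -. */
int EvalExpr(const char *text)
{
    assert(text != NULL);
    cursor = text;
    return ParseExpr();
}
```

Because the loop folds each new term into the running value, the operators stay left-associative, which is the property the left-recursive rule was expressing in the first place.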
1. The three main syntactic forms of the C language
a. declaration syntax
int a; char b; float c;
b. expression syntax
a = 1; b = 2; c = a + b;
c. statement syntax
{
    // other code
    if(expression) {
        // ...
    } else {
        // ...
    }
}
2. Syntax parsing files
decl.c
https://github.com/nobled/ucc/blob/master/ucl/decl.c
expr.c
https://github.com/nobled/ucc/blob/master/ucl/expr.c
stmt.c
https://github.com/nobled/ucc/blob/master/ucl/stmt.c
3. Syntax tree printing
dumpast.c
https://github.com/nobled/ucc/blob/master/ucl/dumpast.c
The entry function is DumpTranslationUnit:
void DumpTranslationUnit(AstTranslationUnit transUnit)
{
    AstNode p;

    ASTFile = CreateOutput(Input.filename, ".ast");
    p = transUnit->extDecls;
    while (p) {
        if (p->kind == NK_Function) {
            DumpFunction((AstFunction)p);
        }
        p = p->next;
    }
    fclose(ASTFile);
}
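To give a feel for what dumping a tree involves, here is a minimal sketch of an indented tree printer. It is not ucc's DumpFunction: the Node struct, DumpNode and DumpTree are hypothetical, and the output goes into a caller-supplied buffer rather than a file such as ASTFile. The sibling walk mirrors the p = p->next loop in DumpTranslationUnit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified node; ucc's real AST nodes carry more fields. */
typedef struct Node {
    const char *name;     /* e.g. "if", "expr", "break" */
    struct Node *child;   /* first child, or NULL */
    struct Node *next;    /* next sibling, or NULL */
} Node;

/* Append one line per node to out, indenting two spaces per tree level. */
static void DumpNode(const Node *n, int depth, char *out, size_t size)
{
    for (; n != NULL; n = n->next) {
        size_t used = strlen(out);

        if (used < size)
            snprintf(out + used, size - used, "%*s%s\n", depth * 2, "", n->name);
        DumpNode(n->child, depth + 1, out, size);   /* children go one level deeper */
    }
}

/* Dump the whole tree rooted at root into out (truncated if out is too small). */
void DumpTree(const Node *root, char *out, size_t size)
{
    assert(out != NULL && size > 0);
    out[0] = '\0';
    DumpNode(root, 0, out, size);
}
```

For an "if" node whose children are "expr" and "break", this produces three lines, with the two children indented under "if".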
4. Without loss of generality, we start with statement (stmt) parsing
4.1 Main entry point of statement parsing
static AstStatement ParseStatement(void)
{
    switch (CurrentToken) {
    case TK_ID:
        return ParseLabelStatement();
    case TK_CASE:
        return ParseCaseStatement();
    case TK_DEFAULT:
        return ParseDefaultStatement();
    case TK_IF:
        return ParseIfStatement();
    case TK_SWITCH:
        return ParseSwitchStatement();
    case TK_WHILE:
        return ParseWhileStatement();
    case TK_DO:
        return ParseDoStatement();
    case TK_FOR:
        return ParseForStatement();
    case TK_GOTO:
        return ParseGotoStatement();
    case TK_CONTINUE:
        return ParseContinueStatement();
    case TK_BREAK:
        return ParseBreakStatement();
    case TK_RETURN:
        return ParseReturnStatement();
    case TK_LBRACE:
        return ParseCompoundStatement();
    default:
        return ParseExpressionStatement();
    }
}
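In this switch, TK_ID is the one case where the first token alone does not settle which rule applies: a name can start a label (name:) or an ordinary expression (name = 1;). This article does not show ParseLabelStatement, so the following is only a sketch of how one extra token of lookahead can resolve the choice. The token kinds, StmtKind and ClassifyStatement are hypothetical and much smaller than ucc's own definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; ucc's TK_* list is much longer. */
enum Token {
    TK_ID, TK_COLON, TK_ASSIGN, TK_NUM, TK_SEMICOLON, TK_IF, TK_LBRACE, TK_END
};

enum StmtKind { STMT_LABEL, STMT_IF, STMT_COMPOUND, STMT_EXPRESSION };

/*
 * Decide which statement rule applies to the tokens at the start of a
 * statement. The array must be terminated by TK_END.
 */
enum StmtKind ClassifyStatement(const enum Token *tokens)
{
    assert(tokens != NULL);
    switch (tokens[0]) {
    case TK_ID:
        /* One token of lookahead: "name :" is a label; anything else
           that starts with a name is treated as an expression. */
        return tokens[1] == TK_COLON ? STMT_LABEL : STMT_EXPRESSION;
    case TK_IF:
        return STMT_IF;
    case TK_LBRACE:
        return STMT_COMPOUND;
    default:
        /* Same fallback as the default branch of ParseStatement. */
        return STMT_EXPRESSION;
    }
}
```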
4.2 if statement parsing
/**
 * if-statement:
 *     if ( expression ) statement
 *     if ( expression ) statement else statement
 */
static AstStatement ParseIfStatement(void)
{
    AstIfStatement ifStmt;

    CREATE_AST_NODE(ifStmt, IfStatement);

    NEXT_TOKEN;
    Expect(TK_LPAREN);
    ifStmt->expr = ParseExpression();
    Expect(TK_RPAREN);
    ifStmt->thenStmt = ParseStatement();
    if (CurrentToken == TK_ELSE) {
        NEXT_TOKEN;
        ifStmt->elseStmt = ParseStatement();
    }

    return (AstStatement)ifStmt;
}
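One consequence of this structure is that an else always attaches to the nearest unmatched if, because the check for TK_ELSE happens right after the recursive call for the then-branch returns. The sketch below reproduces only that control flow on a toy token stream. The Tok and Stmt types and the functions ParseStmt, ParseTokens and FreeStmt are hypothetical, and the condition is reduced to a single token.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical token kinds and node layout, far simpler than ucc's. */
enum Tok { T_IF, T_ELSE, T_COND, T_STMT, T_END };

typedef struct Stmt {
    int isIf;                /* 1 for an if statement, 0 for a simple one */
    struct Stmt *thenStmt;
    struct Stmt *elseStmt;   /* NULL when there is no else branch */
} Stmt;

static const enum Tok *cur;  /* current position in the token array */

static Stmt *ParseStmt(void)
{
    Stmt *s = calloc(1, sizeof *s);

    assert(s != NULL);
    if (*cur == T_IF) {
        cur++;                    /* consume "if" */
        if (*cur == T_COND)
            cur++;                /* consume the one-token condition */
        s->isIf = 1;
        s->thenStmt = ParseStmt();
        if (*cur == T_ELSE) {     /* tested right after the then-branch */
            cur++;
            s->elseStmt = ParseStmt();
        }
    } else if (*cur != T_END) {
        cur++;                    /* consume a simple statement */
    }
    return s;
}

/* Parse a T_END-terminated token array; release the result with FreeStmt. */
Stmt *ParseTokens(const enum Tok *tokens)
{
    assert(tokens != NULL);
    cur = tokens;
    return ParseStmt();
}

void FreeStmt(Stmt *s)
{
    if (s == NULL)
        return;
    FreeStmt(s->thenStmt);
    FreeStmt(s->elseStmt);
    free(s);
}
```

For the token sequence corresponding to if (c) if (c) s; else s; the inner if receives the else branch and the outer if has none.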
4.3 for statement analysis
/**
 * for-statement:
 *     for ( [expression] ; [expression] ; [expression] ) statement
 */
static AstStatement ParseForStatement()
{
    AstForStatement forStmt;

    CREATE_AST_NODE(forStmt, ForStatement);

    NEXT_TOKEN;
    Expect(TK_LPAREN);
    if (CurrentToken != TK_SEMICOLON) {
        forStmt->initExpr = ParseExpression();
    }
    Expect(TK_SEMICOLON);
    if (CurrentToken != TK_SEMICOLON) {
        forStmt->expr = ParseExpression();
    }
    Expect(TK_SEMICOLON);
    if (CurrentToken != TK_RPAREN) {
        forStmt->incrExpr = ParseExpression();
    }
    Expect(TK_RPAREN);
    forStmt->stmt = ParseStatement();

    return (AstStatement)forStmt;
}
4.4 switch statement analysis
/**
 * switch-statement:
 *     switch ( expression ) statement
 */
static AstStatement ParseSwitchStatement(void)
{
    AstSwitchStatement swtchStmt;

    CREATE_AST_NODE(swtchStmt, SwitchStatement);

    NEXT_TOKEN;
    Expect(TK_LPAREN);
    swtchStmt->expr = ParseExpression();
    Expect(TK_RPAREN);
    swtchStmt->stmt = ParseStatement();

    return (AstStatement)swtchStmt;
}
4.5 break statement analysis
/**
 * break-statement:
 *     break ;
 */
static AstStatement ParseBreakStatement(void)
{
    AstBreakStatement brkStmt;

    CREATE_AST_NODE(brkStmt, BreakStatement);

    NEXT_TOKEN;
    Expect(TK_SEMICOLON);

    return (AstStatement)brkStmt;
}
This may differ from what you would expect: case, break, continue, return and goto are each parsed as an independent statement, not as a built-in part of the enclosing switch or loop.
4.6 A special statement: the compound statement { }
/**
 * compound-statement:
 *     { [declaration-list] [statement-list] }
 * declaration-list:
 *     declaration
 *     declaration-list declaration
 * statement-list:
 *     statement
 *     statement-list statement
 */
AstStatement ParseCompoundStatement(void)
{
    AstCompoundStatement compStmt;
    AstNode *tail;

    Level++;
    CREATE_AST_NODE(compStmt, CompoundStatement);

    NEXT_TOKEN;
    tail = &compStmt->decls;
    while (CurrentTokenIn(FIRST_Declaration)) {
        if (CurrentToken == TK_ID && !IsTypeName(CurrentToken))
            break;
        *tail = (AstNode)ParseDeclaration();
        tail = &(*tail)->next;
    }
    tail = &compStmt->stmts;
    while (CurrentToken != TK_RBRACE && CurrentToken != TK_END) {
        *tail = (AstNode)ParseStatement();
        tail = &(*tail)->next;
        if (CurrentToken == TK_RBRACE)
            break;
        SkipTo(FIRST_Statement, "the beginning of a statement");
    }
    Expect(TK_RBRACE);

    PostCheckTypedef();
    Level--;

    return (AstStatement)compStmt;
}
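The tail variable in ParseCompoundStatement is a pointer to a pointer: it always holds the address of the link that the next node should be stored in, so the first node and every later node are appended by the same two lines, with no special case for an empty list. The following is a small standalone sketch of that idiom. The Item type and BuildList are hypothetical, and the nodes come from a caller-supplied array instead of a parser.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node, standing in for an AST node with a next link. */
typedef struct Item {
    int value;
    struct Item *next;
} Item;

/*
 * Link the first count elements of storage into a list, in order,
 * copying values into them. Returns the head, or NULL when count is 0.
 */
Item *BuildList(Item *storage, const int *values, size_t count)
{
    Item *head = NULL;
    Item **tail = &head;   /* address of the link to fill in next */
    size_t i;

    assert(count == 0 || (storage != NULL && values != NULL));
    for (i = 0; i < count; i++) {
        storage[i].value = values[i];
        storage[i].next = NULL;
        *tail = &storage[i];       /* like: *tail = (AstNode)ParseStatement(); */
        tail = &(*tail)->next;     /* like: tail = &(*tail)->next; */
    }
    return head;
}
```

In ParseCompoundStatement the same tail variable is reused for two lists: it is first pointed at compStmt->decls and then reset to compStmt->stmts.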
4.7 Connection between statements and expressions
/**
 * expression-statement:
 *     [expression] ;
 */
static AstStatement ParseExpressionStatement(void)
{
    AstExpressionStatement exprStmt;

    CREATE_AST_NODE(exprStmt, ExpressionStatement);

    if (CurrentToken != TK_SEMICOLON) {
        exprStmt->expr = ParseExpression();
    }
    Expect(TK_SEMICOLON);

    return (AstStatement)exprStmt;
}
If the current token matches none of the other statement forms, the statement can only be an expression statement. In that case the parser calls ParseExpression to continue parsing.
4.8 Summary
The purpose of parsing any statement is, of course, to build an abstract syntax tree (AST). Along the way the parser often meets nested constructs, such as statement -> if statement -> statement -> for statement -> ..., so the parsing functions keep calling one another recursively. This is permitted and expected. Once parsing is complete, the abstract syntax tree has been built.
In a typical university course, the practical assignments more or less end here. For a compiler, however, this only completes the front-end parsing.