[Compiler principles] Semantic analysis and intermediate code generation (C/C++ source code + experiment report)

Posted by mattfoster on Sat, 23 Oct 2021 10:02:00 +0200

1 Purpose and content of the experiment

1.1 Experimental purpose

(1) Through hands-on practice, deepen the understanding of the principle of syntax-directed translation, and master the semantic translation method that transforms the syntactic categories recognized by syntax analysis into an intermediate code.

(2) Master a commonly used semantic analysis method: syntax-directed translation.

(3) Given the PL/0 grammar specification, add semantic processing to the syntax analyzer so that it outputs the intermediate code for every syntactically correct expression and, for syntactically correct arithmetic expressions, also outputs their computed values.

1.2 Experimental content

The grammar of the PL/0 language is given. Add a semantic processing part to the expression parser written in Experiment 2 or Experiment 3 so that it outputs the intermediate code of the expression, represented as a sequence of quadruples.

1.3 Experimental requirements

(1) Semantic analysis is applied to the syntactic categories that syntax analysis has recognized as correct. The focus of this experiment is the semantic subroutines.

(2) Add semantic processing for the PL/0 "expression" to the parser of Experiment 2 or Experiment 3, output the intermediate code of the expression, and calculate the semantic value of the expression.

(3) The intermediate code is represented as a sequence of quadruples.

2 Design idea

2.1 Semantic rules

Attribute computation is the process of semantic processing. Each production of the grammar is associated with a set of attribute computation rules, which are called semantic rules.

(1) A terminal symbol has only synthesized attributes, and their values are provided by the lexical analyzer.

(2) A nonterminal symbol can have both synthesized attributes and inherited attributes. All inherited attributes of the grammar's start symbol are given initial values before attribute computation begins.

(3) Each production must provide a computation rule for the inherited attributes of the symbols on its right-hand side and for the synthesized attributes of the symbol on its left-hand side.

(4) The inherited attributes of the symbol on the left-hand side of a production and the synthesized attributes of the symbols on its right-hand side are not computed by that production; they are computed by the attribute rules of other productions.

2.2 Recursive descent translator

Recursive descent analysis uses recursive calls between functions to simulate the top-down construction of the syntax tree: starting from the root node, it looks for a leftmost derivation that matches the input string and builds the syntax tree from the top down. In a recursive descent translator, the inherited attributes of each nonterminal become the formal parameters of its function, and the return value of the function carries the synthesized attributes of the nonterminal. Any inherited attributes of the start symbol are initialized before the analysis begins. During the analysis, each nonterminal decides which candidate production to use according to the current input symbol.

2.3 Pseudocode of the recursive descent subroutines

(1) Expression

function expression:string;
string s1, s2, op, result;
BEGIN
  op := '+';
  IF SYM='+' OR SYM='-' THEN
  BEGIN
    op := SYM;
    ADVANCE;
  END;
  IF SYM IN FIRST(term) THEN
    s1 := term
  ELSE ERROR;
  IF op='-' THEN
  BEGIN
    result := newtemp();
    emit('-','0',s1,result);
    s1 := result;
  END;
  WHILE SYM='+' OR SYM='-' DO
  BEGIN
    op := SYM;
    ADVANCE;
    s2 := term;
    result := newtemp();
    emit(op,s1,s2,result);
    s1 := result;
  END;
  Return s1;
END;

(2) Term

function term:string;
string s1, s2, op, result;
BEGIN
  IF SYM IN FIRST(factor) THEN
    s1 := factor
  ELSE ERROR;
  WHILE SYM='*' OR SYM='/' DO
  BEGIN
    op := SYM;
    ADVANCE;
    IF SYM IN FIRST(factor) THEN
      s2 := factor
    ELSE ERROR;
    result := newtemp();
    emit(op,s1,s2,result);
    s1 := result;
  END;
  Return s1;
END;

(3) Factor

function factor:string;
string s;
BEGIN
  IF SYM='(' THEN
  BEGIN
    ADVANCE;
    s := expression;
    IF SYM=')' THEN
      ADVANCE
    ELSE ERROR;
  END
  ELSE IF SYM is an identifier or a number THEN
  BEGIN
    s := SYM;
    ADVANCE;
  END
  ELSE ERROR;
  Return s;
END;

3 Algorithm flow

The algorithm proceeds as follows. First, an expression is read and lexical analysis is performed on it; the resulting tokens are stored in a structure. The expression subroutine of the recursive descent parser is then called to analyze the tokens, and the quadruples it produces are stored in the corresponding structure. Next, the program checks whether the input is an arithmetic expression, that is, one that contains only numbers. If it is, the value of the expression is calculated and output; if it is not, the quadruples are output directly without further processing. Finally, the program checks whether the input has ended. If it has not, the next expression is read and the steps above are repeated; otherwise the program exits.

Fig. 1 algorithm flow chart

4 Source program

#include<iostream>
#include<string>
#include<cctype>
#include<cstdlib>
#include<cstdio>
#include<cstring>
using namespace std;

//Stores one result of lexical analysis (a token)
struct cf_tv
{
  string t;  //Token type
  string v;  //Token value (the lexeme)
};

//Stores one quadruple
struct qua
{
  string symbal;  //Operator
  string op_a;   //First operand
  string op_b;   //Second operand
  string result;  //Result
};

string input; //Global input
int cnt;    //Current position in the string being scanned
int k=0;    //Number of tokens stored in result
int ljw=0;   //Number of temporary variables generated so far
cf_tv result[200]; //Token sequence produced by lexical analysis
qua output[200];  //Generated quadruples
int x=0;      //Number of quadruples stored in output
int ans=0;     //Index of the current token during parsing
bool error=true;  //Set to false when a syntax error is found
int is_letter=0;  //1 if the expression contains an identifier
int t[1001];    //t[i] holds the computed value of temporary ti
string item();
string factor();

//Generate a new temporary variable name: t1, t2, ...
string new_temp()
{
  char mm[18];
  ljw++;
  //Format the counter as "t<number>"; no heap allocation is needed
  snprintf(mm,sizeof(mm),"t%d",ljw);
  return string(mm);
}

//Return true if input is exactly equal to the target string s
bool judge (string input, string s)
{
  return input==s;
}

//Return true if input and s start with the same character
bool judge1 (string input, string s)
{
  return input[0]==s[0];
}

//Classify a non-symbol lexeme as a keyword, an identifier or a number
void not_fh(string p)
{
  //PL/0 reserved words; the token type is the word followed by "sym"
  static const char* keywords[]={"begin","call","const","do","end","if","odd",
                                 "procedure","read","var","then","write","while"};
  for(unsigned int i=0;i<sizeof(keywords)/sizeof(keywords[0]);i++)
  {
    if(judge(p,keywords[i]))
    {
      result[k].t=string(keywords[i])+"sym";
      result[k].v=p;
      k++;
      return;
    }
  }
  //A lexeme made up only of digits is a number; anything else is an identifier
  bool all_digits=true;
  for(unsigned int i=0;i<p.length();i++)
  {
    if(!isdigit(static_cast<unsigned char>(p[i])))
    {
      all_digits=false;
      break;
    }
  }
  result[k].t=all_digits?"number":"ident";
  result[k].v=p;
  k++;
}

//Return the index of the last character of the operator that starts at cnt,
//so that a two-character operator (<>, <=, >=, :=) is consumed as one token
int change(string str,int cnt)
{
  char next=(cnt+1<(int)str.length())?str[cnt+1]:'\0';
  if(str[cnt]=='<' && (next=='>' || next=='=')) return cnt+1;
  if(str[cnt]=='>' && next=='=') return cnt+1;
  if(str[cnt]==':' && next=='=') return cnt+1;
  return cnt;
}

//Store the operator or delimiter that starts at str[cnt] as a token
void fh_1(string str,int cnt)
{
  int y=0;
  char fh[15]={'+','-','*','/','=','<','>',':','(',')',',',';','.'};
  for(int i=0;i<13;i++)
  {
    if(str[cnt]==fh[i]) y=i;
  }
  //plus
  if(y==0)
  {
     result[k].t="plus";
     result[k].v=fh[y];
     k++;
  }
  //minus
  if(y==1)
  {
     result[k].t="minus";
     result[k].v=fh[y];
     k++;
  }
  //times
  if(y==2)
  {
     result[k].t="times";
     result[k].v=fh[y];
     k++;
  }
  //slash
  if(y==3)
  {
     result[k].t="slash";
     result[k].v=fh[y];
     k++;
  }
  //eql
  if(y==4)
  {
     result[k].t="eql";
     result[k].v=fh[y];
     k++;
  }
  if(y==5)
  {
    //neq
    if(str[cnt+1]=='>')
    {
      cnt=cnt+1;
      result[k].t="neq";
      result[k].v="<>";
      k++;
    }
    //leq
    else if(str[cnt+1]=='=')
    {
       result[k].t="leq";
       result[k].v="<=";
       k++;
    }
    //lss
    else
    {
       result[k].t="lss";
       result[k].v="<";
       k++;
    }
  }
  if(y==6)
  {
    //geq
    if(str[cnt+1]=='=')
    {
       result[k].t="geq";
       result[k].v=">=";
       k++;
    }
    //gtr
    else
    {
       result[k].t="gtr";
       result[k].v=">";
       k++;
    }
  }
  //becomes
  if(y==7)
  {
    result[k].t="becomes";
    result[k].v=":=";
    k++;
  }
  //lparen
  if(y==8)
  {
    result[k].t="lparen";
    result[k].v="(";
    k++;
  }
  //rparen
  if(y==9)
  {
    result[k].t="rparen";
    result[k].v=")";
    k++;
  }
  //comma
  if(y==10)
  {
    result[k].t="comma";
    result[k].v=",";
    k++;
  }
  //semicolon
  if(y==11)
  {
    result[k].t="semicolon";
    result[k].v=";";
    k++;
  }
  //period
  if(y==12)
  {
    result[k].t="period";
    result[k].v=".";
    k++;
  }
}

//Lexical analysis: read the input word by word and store the tokens in result
void cifa()
{
  string str;
  while(cin>>str)
  {
    cnt=0;
    //Characters that separate lexemes: the operators and delimiters
    const char *d = " +-*/=<>:(),;.";
    char *p;
    //strtok needs a writable copy of the string
    char buf[1001];
    strncpy(buf,str.c_str(),sizeof(buf)-1);
    buf[sizeof(buf)-1]='\0';
    //p points to the next keyword, identifier or number
    p = strtok(buf,d);
    while(p)
    {
      //Store the operators and delimiters that precede this lexeme
      while(str[cnt]!=p[0])
      {
        fh_1(str,cnt);
        cnt=change(str,cnt);
        cnt=cnt+1;
      }
      not_fh(p);
      cnt=cnt+strlen(p);
      //Fetch the next lexeme
      p=strtok(NULL,d);
    }
    //Store the operators and delimiters left at the end of the string
    for(int i=cnt;i<(int)str.length();i++)
    {
      fh_1(str,i);
      i=change(str,i);
    }
  }
}

//Set is_letter if the tokens contain an identifier, in which case the expression cannot be evaluated numerically
void judge_type()
{
  for(int i=0;i<k;i++)
  {
    if(judge(result[i].t,"ident"))
    {
      is_letter=1;
      break;
    }
  }
}

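//Recursive descent function for an expression; returns the name that holds its value
string bds()
{
  string s1,s2;
  bool negative=false;  //true when the expression starts with a unary minus
  if(ans>k) return "";
  //Optional leading sign
  if(judge(result[ans].v,"+") || judge(result[ans].v,"-"))
  {
    negative=judge(result[ans].v,"-");
    ans++;
    if(ans>=k)
    {
      //Error 1: the input ends right after the sign
      cout<<1<<endl;
      error=false;
    }
    s1=item();
  }
  else if( judge(result[ans].v,"(") ||judge(result[ans].t,"ident") ||judge(result[ans].t,"number"))
  {
    //The current token is in the FIRST set of a term
    s1=item();
  }
  else
  {
    //Error 2: the token cannot start an expression
    cout<<2<<endl;
    error=false;
  }
  //A unary minus is translated as 0 - term
  if(negative)
  {
    output[x].symbal="-";
    output[x].op_a="0";
    output[x].op_b=s1;
    output[x].result=new_temp();
    s1=output[x].result;
    x++;
  }
  while(judge(result[ans].v,"+") || judge(result[ans].v,"-"))
  {
    int ans_temp=ans;  //Remember where the operator is before advancing
    ans++;
    if(ans>=k)
    {
      //Error 3: the input ends right after the operator
      cout<<3<<endl;
      error=false;
    }
    s2=item();
    output[x].symbal=result[ans_temp].v;
    output[x].op_a=s1;
    output[x].op_b=s2;
    output[x].result=new_temp();
    s1=output[x].result;
    x++;
  }
  //Return s1 so that an expression consisting of a single term still yields that term
  return s1;
}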

//Recursive descent function for a term; returns the name that holds its value
string item()
{
  string s1,s2;
  if(ans>k) return "";
  //A term starts with a factor
  s1=factor();
  while(judge(result[ans].v,"*") || judge(result[ans].v,"/"))
  {
    int ans_temp=ans;  //Remember where the operator is before advancing
    ans++;
    if(ans>=k)
    {
      //Error 4: the input ends right after the operator
      cout<<4<<endl;
      error=false;
    }
    s2=factor();
    output[x].symbal=result[ans_temp].v;
    output[x].op_a=s1;
    output[x].op_b=s2;
    output[x].result=new_temp();
    s1=output[x].result;
    x++;
  }
  return s1;
}

//Recursive descent function for a factor; returns the name that holds its value
string factor()
{
  string s;
  if(ans>=k)
  {
    //The input ended where a factor was expected
    error=false;
    return "";
  }
  //An identifier or a number stands for itself
  if(judge(result[ans].t,"ident") ||judge(result[ans].t,"number"))
  {
    s=result[ans].v;
    ans++;
  }
  //Parenthesized expression
  else if(judge(result[ans].v,"("))
  {
    ans++;
    s = bds();
    if(judge(result[ans].v,")"))
    {
      ans++;
    }
    else
    {
      //Error 6: missing right parenthesis
      cout<<6<<endl;
      error=false;
    }
  }
  else
  {
    //Error 7: the token cannot start a factor
    cout<<7<<endl;
    error=false;
  }
  return s;
}

//Return the integer value of a quadruple operand: a temporary such as t3 is
//looked up in t[3], anything else is parsed as a decimal constant
int operand_value(string s)
{
  if(s.length()>1 && s[0]=='t')
  {
    return t[strtol(s.c_str()+1,NULL,10)];
  }
  return static_cast<int>(strtol(s.c_str(),NULL,10));
}

//Evaluate quadruple i and store the value in t[i+1], the slot of its temporary
void js(int i)
{
  int a=operand_value(output[i].op_a);
  int b=operand_value(output[i].op_b);
  if(judge(output[i].symbal,"*"))
  {
    t[i+1]=a*b;
  }
  else if(judge(output[i].symbal,"/"))
  {
    if(b==0)
    {
      //Division by zero is reported as an error and yields 0
      cout<<"division by zero"<<endl;
      error=false;
      t[i+1]=0;
    }
    else
    {
      t[i+1]=a/b;
    }
  }
  else if(judge(output[i].symbal,"+"))
  {
    t[i+1]=a+b;
  }
  else if(judge(output[i].symbal,"-"))
  {
    t[i+1]=a-b;
  }
}

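int main()
{
  //Lexical analysis
  cifa();
  //Check whether the expression contains identifiers
  judge_type();
  //Syntax analysis and semantic analysis
  bds();
  //Stop if a syntax error was found or some tokens were not consumed
  if(!error || ans<k)
  {
    cout<<"syntax error"<<endl;
    return 1;
  }
  //The expression contains identifiers: output the quadruples
  if(is_letter==1)
  {
    for(int i=0;i<x;i++)
    {
      cout<<"("<<output[i].symbal<<","<<output[i].op_a<<","<<output[i].op_b<<","<<output[i].result<<")"<<endl;
    }
  }
  //The expression is purely numeric: evaluate the quadruples and output the value
  else
  {
    for(int i=0;i<x;i++)
    {
      js(i);
    }
    cout<<t[x]<<endl;
  }
  return 0;
}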

5 Test data

5.1 Test example I

[sample input]
2+3*5

[sample output]
17

The results of example 1 are as follows:

Fig. 2 test results of example 1

5.2 Test example II

[sample input]
2+3*5+7

[sample output]
24

The results of example 2 are as follows:

Fig. 3 test results of example 2

5.3 Test example III

[sample input]
a*(b+c)

[sample output]
(+,b,c,t1)
(*,a,t1,t2)

The results of example 3 are as follows:

Fig. 4 test results of example 3

5.4 Test example IV

[sample input]
a*(b+c)+d

[sample output]
(+,b,c,t1)
(*,a,t1,t2)
(+,t2,d,t3)

The results of example 4 are as follows:

Fig. 5 test results of example 4

6 Experimental debugging and experience

6.1 Experimental debugging

All four test examples in the previous section produced the expected output, which indicates that the code works for these cases. The code also contains error handling for other situations.

6.2 Experimental experience

I previewed this experiment before class. Before writing the code, the pseudocode of the recursive descent translator has to be written first. The key is to work out which attributes of each nonterminal are inherited attributes and which are synthesized attributes; the inherited attributes then become parameters and the synthesized attributes become return values.

Writing the code requires the code of Experiment 1 and Experiment 2. When I wrote the code for Experiment 1, I did not anticipate that it would be reused later, so it printed its results directly without saving any intermediate results. For this experiment, the results of lexical analysis therefore had to be stored in a user-defined structure that holds the two pieces of information produced for each token: its value and its type. The parser reads the contents of this structure directly, and the resulting quadruples are placed in a dedicated structure that records the four fields of each quadruple for easy output. If the input is a numeric expression, a simulated calculator evaluates the quadruples. This requires an array for the values of the temporaries and a function that decides whether an operand is a number or a temporary variable; the operation is then carried out according to the operator.

This experiment gave me a general review of the material from lexical analysis through syntax analysis to semantic analysis, with a focus on what each stage takes as input and produces as output, how this information is stored, and which algorithm is used to compute it. The code can also be optimized further. For example, lexical analysis and syntax analysis could be combined into a single pass so that the tokens are not stored and traversed a second time, which would reduce the running time and improve execution efficiency.

These four experiments have given me a clear understanding of the compiler principles course. A theory lecture that I understood at the time might well be forgotten after a while. Learning compiler principles draws on the way of thinking used in data structures and algorithms, and it also requires understanding and remembering many concepts, which is the difficulty of this course. Through this study, I have come to understand that we should pay more attention to mastering the basic subjects, and constantly strengthen and broaden our computational thinking.

Finally, I would like to thank Mr. Liu Shanmei for a semester of careful guidance. I will continue to work hard in every professional course in the future and live up to the teacher's high expectations!
