Implementing a JavaScript language interpreter

Posted by ppgpilot on Tue, 08 Mar 2022 07:04:21 +0100

Preface

In the previous article I introduced some basic concepts of syntax parsing and showed how the Simple language interpreter builds its syntax tree with a custom DSL. In this article, the last in the series, I will explain how the Simple interpreter executes the generated syntax tree.

evaluate function and scope

The evaluate function already appeared in the earlier material on syntax parsing. Essentially every AST node has a corresponding evaluate function, which tells the Simple interpreter how to execute that node. The interpreter therefore runs code by calling the evaluate function of the root node, which in turn recursively calls the evaluate functions of its child nodes.
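To make the recursion concrete, here is a minimal sketch with two node types. The names AstNode, NumericLiteral and BinaryExpression are hypothetical and are not taken from the Simple source; the nodes are also simplified to return numbers and take no scope argument.

```typescript
// Hypothetical, simplified node types for illustration only
interface AstNode {
  evaluate(): number
}

class NumericLiteral implements AstNode {
  value: number
  constructor(value: number) {
    this.value = value
  }
  // A leaf node evaluates to its own value
  evaluate(): number {
    return this.value
  }
}

class BinaryExpression implements AstNode {
  operator: '+' | '*'
  left: AstNode
  right: AstNode
  constructor(operator: '+' | '*', left: AstNode, right: AstNode) {
    this.operator = operator
    this.left = left
    this.right = right
  }
  // An inner node first evaluates its children, then combines their results
  evaluate(): number {
    const leftValue = this.left.evaluate()
    const rightValue = this.right.evaluate()
    return this.operator === '+' ? leftValue + rightValue : leftValue * rightValue
  }
}

// AST for the expression 1 + 2 * 3
const exprRoot = new BinaryExpression(
  '+',
  new NumericLiteral(1),
  new BinaryExpression('*', new NumericLiteral(2), new NumericLiteral(3))
)
const exprResult = exprRoot.evaluate()
console.log(exprResult) // 7
```

Calling evaluate on the root is enough: each node is responsible for evaluating its own children.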

JavaScript code executes within scopes. When a variable is accessed, the engine first checks whether it is defined in the current scope. If not, it walks up the scope chain until it reaches the global scope. If the variable is not defined anywhere on the chain, an Uncaught ReferenceError: xx is not defined error is thrown. For the Simple language interpreter, I modeled a class called Environment on JavaScript's concept of scope. Let's take a look at the implementation of the Environment class:

// lib/runtime/Environment.ts

// The Environment class is the scope of the Simple language
class Environment {
  // parent points to the parent scope of the current scope (null for the root scope)
  private parent: Environment | null = null
  // The values object stores the variables of the current scope as key-value pairs
  // For example, values = {a: 10} means that the current scope has a variable a whose value is 10
  protected values: Record<string, any> = {}

  // The parent scope is passed in on construction, e.g. new Environment(env) in ForStatement
  constructor(parent: Environment | null = null) {
    this.parent = parent
  }

  // When a new variable is defined in the current scope, create is called to set its value
  // For example, executing let a = 10 calls env.create('a', 10)
  create(key: string, value: any) {
    if(this.values.hasOwnProperty(key)) {
      throw new Error(`${key} has been initialized`)
    }
    this.values[key] = value
  }

  // When a variable is reassigned, Simple searches up the scope chain from the current scope, finds the nearest scope that defines the variable, and reassigns it there
  update(key: string, value: any) {
    const matchedEnvironment = this.getEnvironmentWithKey(key)
    if (!matchedEnvironment) {
      throw new Error(`Uncaught ReferenceError: ${key} hasn't been defined`)
    }
    matchedEnvironment.values = {
      ...matchedEnvironment.values,
      [key]: value
    }
  }

  // Look up a variable along the scope chain. If it is not found, an Uncaught ReferenceError is thrown
  get(key: string) {
    const matchedEnvironment = this.getEnvironmentWithKey(key)
    if (!matchedEnvironment) {
      throw new Error(`Uncaught ReferenceError: ${key} is not defined`)
    }

    return matchedEnvironment.values[key]
  }

  // Find the nearest scope along the scope chain that defines the variable, and return null if none does
  private getEnvironmentWithKey(key: string): Environment | null {
    if(this.values.hasOwnProperty(key)) {
      return this
    }
  
    let currentEnvironment = this.parent
    while(currentEnvironment) {
      if (currentEnvironment.values.hasOwnProperty(key)) {
        return currentEnvironment
      }
      currentEnvironment = currentEnvironment.parent
    }

    return null
  }
}
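The following usage sketch shows lookup, reassignment and shadowing across two scopes. It uses a condensed copy of the class, and it assumes the parent scope is passed through a constructor. The variable names are made up for the example.

```typescript
// Condensed copy of Environment; the constructor signature is an assumption
class Environment {
  private parent: Environment | null
  private values: Record<string, unknown> = {}

  constructor(parent: Environment | null = null) {
    this.parent = parent
  }

  create(key: string, value: unknown): void {
    if (Object.prototype.hasOwnProperty.call(this.values, key)) {
      throw new Error(`${key} has been initialized`)
    }
    this.values[key] = value
  }

  update(key: string, value: unknown): void {
    const env = this.find(key)
    if (!env) throw new Error(`Uncaught ReferenceError: ${key} hasn't been defined`)
    env.values[key] = value
  }

  get(key: string): unknown {
    const env = this.find(key)
    if (!env) throw new Error(`Uncaught ReferenceError: ${key} is not defined`)
    return env.values[key]
  }

  // Walk from the current scope up to the root and return the first scope that owns the key
  private find(key: string): Environment | null {
    let current: Environment | null = this
    while (current) {
      if (Object.prototype.hasOwnProperty.call(current.values, key)) return current
      current = current.parent
    }
    return null
  }
}

const globalEnv = new Environment()
const innerEnv = new Environment(globalEnv)

globalEnv.create('a', 10)
innerEnv.create('b', 20)

const seenFromInner = innerEnv.get('a') // resolved in the parent scope
innerEnv.update('a', 11)                // reassigns in the scope that owns a
const afterUpdate = globalEnv.get('a')

innerEnv.create('a', 99)                // shadows the outer a
const shadowed = innerEnv.get('a')
const outerUnchanged = globalEnv.get('a')

let lookupError = ''
try {
  globalEnv.get('b')                    // b lives only in the inner scope
} catch (e) {
  lookupError = (e as Error).message
}
console.log(seenFromInner, afterUpdate, shadowed, outerUnchanged, lookupError)
```

Lookups only ever move from child to parent, so the inner scope can see a, while the outer scope cannot see b.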

As the code and comments above show, the scope chain is simply a singly linked list of Environment instances. Resolving a variable means walking along this chain, and an error is thrown if no scope on the chain defines the variable. Next, let's walk through the execution of a for loop to see the process in detail:

Code executed:

for(let i = 0; i < 10; i++) {
  console.log(i);
};

How ForStatement executes this code:

// lib/ast/node/ForStatement.ts
class ForStatement extends Node {
  ...

  // The evaluate function will accept a scope object that represents the execution scope of the current AST node
  evaluate(env: Environment): any {
    // The contents of the for loop's parentheses live in their own scope, so a new scope named bridgeEnvironment is created on top of the scope passed in by the parent node
    const bridgeEnvironment = new Environment(env)
    // Variables declared in the parentheses (let i = 0) are initialized in this scope
    this.init.evaluate(bridgeEnvironment)

    // The loop keeps running as long as no break statement has fired, no return statement has fired, and the test expression (i < 10) is true; otherwise the loop stops
    while(!runtime.isBreak && !runtime.isReturn && this.test.evaluate(bridgeEnvironment)) {
      // Because the for loop body (console.log(i)) is a new scope, a new child scope is created on top of bridgeEnvironment for each iteration
      const executionEnvironment = new Environment(bridgeEnvironment)
      this.body.evaluate(executionEnvironment)
      // The update of the loop variable (i++) is executed in bridgeEnvironment
      this.update.evaluate(bridgeEnvironment)
    }
  }
}
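The sketch below runs the same three scope levels by hand, without an AST, to show why the body gets a fresh scope on every iteration. The Environment here is a condensed, numbers-only stand-in, and the loop body (let doubled = i * 2) is made up for the example.

```typescript
// Condensed, numbers-only stand-in for the Environment class (illustrative)
class Environment {
  private parent: Environment | null
  private values = new Map<string, number>()

  constructor(parent: Environment | null = null) {
    this.parent = parent
  }

  create(key: string, value: number): void {
    if (this.values.has(key)) throw new Error(`${key} has been initialized`)
    this.values.set(key, value)
  }

  // Find the nearest scope that defines the key, or throw if none does
  private owner(key: string): Environment {
    let current: Environment | null = this
    while (current) {
      if (current.values.has(key)) return current
      current = current.parent
    }
    throw new Error(`Uncaught ReferenceError: ${key} is not defined`)
  }

  get(key: string): number {
    return this.owner(key).values.get(key) as number
  }

  update(key: string, value: number): void {
    this.owner(key).values.set(key, value)
  }
}

const globalEnv = new Environment()
const bridgeEnv = new Environment(globalEnv)      // scope of the loop header
bridgeEnv.create('i', 0)                          // init: let i = 0

const doubledValues: number[] = []
while (bridgeEnv.get('i') < 3) {                  // test: i < 3
  const executionEnv = new Environment(bridgeEnv) // fresh body scope per iteration
  // body: let doubled = i * 2
  // (create would throw "has been initialized" on the second pass if the body scope were reused)
  executionEnv.create('doubled', executionEnv.get('i') * 2)
  doubledValues.push(executionEnv.get('doubled'))
  bridgeEnv.update('i', bridgeEnv.get('i') + 1)   // update: i++
}

let leaked = true
try {
  globalEnv.get('i')
} catch (_err) {
  leaked = false                                  // i is not visible outside the loop
}
console.log(doubledValues, leaked)
```

The loop variable lives in the bridge scope so that it survives between iterations, while everything declared in the body is discarded each time.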

Closure and this binding

Now that we understand how the evaluate function works in general, let's look at how closures are implemented. JavaScript uses lexical scoping, which means that a function's scope chain is determined when the function is defined, not when it is called. Let's see how closures are implemented in the Simple language through the evaluate function of the FunctionDeclaration node:

// lib/ast/node/FunctionDeclaration.ts
class FunctionDeclaration extends Node {
  ...

  // When the function declaration statement is executed, the evaluate function will be executed, and the object passed in is the current execution scope
  evaluate(env: Environment): any {
    // Generate a new FunctionDeclaration object because the same function may be defined multiple times (for example, when the function is nested in a parent function)
    const func = new FunctionDeclaration()
    // Function copy
    func.loc = this.loc
    func.id = this.id
    func.params = [...this.params]
    func.body = this.body
    
    // When the function is declared, the current execution scope will be recorded through the parentEnv attribute, which is the closure!!!
    func.parentEnv = env

    // Register the function to the current execution scope, and the function can be called recursively
    env.create(this.id.name, func)
  }
  ...
}

As the code above shows, implementing closures in the Simple language only requires recording the current scope (parentEnv) when the function is declared.
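To see the effect of recording parentEnv, here is a small counter example. SimpleFunction is a hypothetical stand-in for FunctionDeclaration: its body is a plain TypeScript callback rather than an AST subtree, and call returns the result directly, which is a simplification.

```typescript
// Condensed stand-in for the Environment class (illustrative)
class Environment {
  private parent: Environment | null
  private values = new Map<string, unknown>()

  constructor(parent: Environment | null = null) {
    this.parent = parent
  }

  create(key: string, value: unknown): void {
    this.values.set(key, value)
  }

  // Find the nearest scope that defines the key, or throw if none does
  private owner(key: string): Environment {
    let current: Environment | null = this
    while (current) {
      if (current.values.has(key)) return current
      current = current.parent
    }
    throw new Error(`Uncaught ReferenceError: ${key} is not defined`)
  }

  get(key: string): unknown {
    return this.owner(key).values.get(key)
  }

  update(key: string, value: unknown): void {
    this.owner(key).values.set(key, value)
  }
}

// Hypothetical stand-in for FunctionDeclaration
class SimpleFunction {
  parentEnv: Environment
  body: (env: Environment) => unknown

  constructor(parentEnv: Environment, body: (env: Environment) => unknown) {
    this.parentEnv = parentEnv // the scope recorded at declaration time
    this.body = body
  }

  call(): unknown {
    // The call scope hangs off the declaration scope, not off the caller's scope
    const callEnv = new Environment(this.parentEnv)
    return this.body(callEnv)
  }
}

// Roughly: function makeCounter() { let count = 0; function increment() { count = count + 1; return count } return increment }
const globalEnv = new Environment()
const makeCounter = new SimpleFunction(globalEnv, (env) => {
  env.create('count', 0)
  // increment is declared while env is the current scope, so it records env
  return new SimpleFunction(env, (inner) => {
    inner.update('count', (inner.get('count') as number) + 1)
    return inner.get('count')
  })
})

const counterA = makeCounter.call() as SimpleFunction
const counterB = makeCounter.call() as SimpleFunction
const a1 = counterA.call()
const a2 = counterA.call()
const b1 = counterB.call()
console.log(a1, a2, b1) // 1 2 1
```

Each call to makeCounter creates a new scope holding its own count, which is why the two counters do not interfere with each other.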

Next, let's take a look at how the interpreter decides which object this is bound to when a function is executed:

// lib/ast/node/FunctionDeclaration.ts
class FunctionDeclaration extends Node {
  ...

  // When the function is called on an instance, that instance is passed in as callerInstance. For example, in a.test(), a is the callerInstance of test
  call(args: Array<any>, callerInstance?: any): any {
    // If the number of arguments passed in differs from the number of declared parameters, an error is thrown
    if (this.params.length !== args.length) {
      throw new Error('function declared parameters are not matched with arguments')
    }

    // This is the key point of realizing closure. The parent scope of function execution is the parent scope recorded when the function is defined!!
    const callEnvironment = new Environment(this.parentEnv)
    
    // Initialize function parameters
    for (let i = 0; i < args.length; i++) {
      const argument = args[i]
      const param = this.params[i]

      callEnvironment.create(param.name, argument)
    }
    // Create the arguments object of the function
    callEnvironment.create('arguments', args)

    // If the current function has a calling instance, this of the function will be the calling instance
    if (callerInstance) {
      callEnvironment.create('this', callerInstance)
    } else {
      // If the function has no calling instance, walk up the function's scope chain to the root scope and use the global object there: process (Node.js) or window (browser)
      callEnvironment.create('this', this.parentEnv.getRootEnv().get('process'))
    }

    // Execution of function body
    this.body.evaluate(callEnvironment)
  }
}

The code above gives a rough idea of how this is bound in the Simple language. Real JavaScript engines may implement this quite differently, so treat it as a reference only.
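The call method relies on getRootEnv, which is not shown in this article. The sketch below assumes it simply follows the parent links up to the scope that has no parent. The helper bindThis and the fakeProcess object are hypothetical names used only to isolate the this-binding step.

```typescript
// Condensed stand-in for the Environment class (illustrative)
class Environment {
  private parent: Environment | null
  private values = new Map<string, unknown>()

  constructor(parent: Environment | null = null) {
    this.parent = parent
  }

  create(key: string, value: unknown): void {
    this.values.set(key, value)
  }

  get(key: string): unknown {
    let current: Environment | null = this
    while (current) {
      if (current.values.has(key)) return current.values.get(key)
      current = current.parent
    }
    throw new Error(`Uncaught ReferenceError: ${key} is not defined`)
  }

  // Assumed behavior: follow parent links until reaching the scope without a parent
  getRootEnv(): Environment {
    let current: Environment = this
    while (current.parent) current = current.parent
    return current
  }
}

// Hypothetical helper that performs only the this-binding part of call()
function bindThis(parentEnv: Environment, callerInstance?: unknown): Environment {
  const callEnv = new Environment(parentEnv)
  if (callerInstance) {
    callEnv.create('this', callerInstance)
  } else {
    callEnv.create('this', parentEnv.getRootEnv().get('process'))
  }
  return callEnv
}

const rootEnv = new Environment()
const fakeProcess = { title: 'stand-in for the global object' }
rootEnv.create('process', fakeProcess)

// The function is declared two scopes below the root
const declarationEnv = new Environment(new Environment(rootEnv))

const receiver = { label: 'a' }
const methodCallThis = bindThis(declarationEnv, receiver).get('this') // like a.test()
const plainCallThis = bindThis(declarationEnv).get('this')            // like test()
console.log(methodCallThis === receiver, plainCallThis === fakeProcess) // true true
```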

Summary

In this article, I introduced how the Simple interpreter executes code, including how closures and this binding work. Due to space constraints, many details were left out, such as how a break statement exits a for or while loop and how a return statement passes its value back to the calling function. If you are interested, you can take a look at my source code:
https://github.com/XiaocongDo...
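The linked source is the place to look for the real implementation. As a rough illustration only, here is one way that flags like the runtime.isBreak and runtime.isReturn checks in ForStatement could drive loop exit. Every name below (runtimeFlags, interrupted, runBlock, runLoop) is an assumption, and statements are modeled as plain callbacks rather than AST nodes.

```typescript
// Hypothetical shared flags, modeled on the runtime.isBreak / runtime.isReturn checks
const runtimeFlags = { isBreak: false, isReturn: false }

type Statement = () => void

function interrupted(): boolean {
  return runtimeFlags.isBreak || runtimeFlags.isReturn
}

// Run statements in order, stopping as soon as a flag has been raised
function runBlock(statements: Statement[]): void {
  for (const statement of statements) {
    if (interrupted()) return
    statement()
  }
}

// Loop driver: stops once a flag is raised, then clears isBreak so that
// an enclosing loop is not terminated by the same break
function runLoop(test: () => boolean, body: Statement[], update: Statement): void {
  while (!interrupted() && test()) {
    runBlock(body)
    if (!interrupted()) update()
  }
  runtimeFlags.isBreak = false
}

// Roughly: for (let counter = 0; counter < 10; counter++) { visit; if (counter === 3) break; reachEnd }
let counter = 0
const visited: number[] = []
const reachedEnd: number[] = []
runLoop(
  () => counter < 10,
  [
    () => { visited.push(counter) },
    () => { if (counter === 3) runtimeFlags.isBreak = true }, // the break statement
    () => { reachedEnd.push(counter) },                       // skipped once break has fired
  ],
  () => { counter++ }
)
console.log(visited, reachedEnd) // [ 0, 1, 2, 3 ] [ 0, 1, 2 ]
```

A return statement could work along the same lines by setting isReturn and storing the value for the caller to pick up, with the function call clearing the flag afterwards.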

Finally, I hope that this three-part series has given you some understanding of compiler principles and of some of the trickier language features of JavaScript. I also hope to keep bringing you high-quality content so that we can make progress together.

Personal technology trends

This article was first published on my blog.

