Instruction Analysis of Lua5.3 Virtual Machine (2) Assignment Instruction

Posted by wyrd33 on Sun, 02 Jun 2019 23:39:57 +0200

The Lua VM is register-based: every Lua chunk is compiled into a sequence of instructions that operate on a set of registers (the 8-bit A operand can address 256 of them). This is a bit like writing C extensions for Lua.
A C function usually fetches its arguments from the lua_State, stores them one by one in C local variables, and then operates on those values directly in C code.

These C local variables are analogous to Lua's registers, and the two do have something in common: C local variables live on the C stack, while Lua's registers live on Lua's data stack.

Assignment to Lua local variables is carried out by OP_MOVE, OP_LOADK, OP_LOADKX, OP_LOADBOOL, OP_LOADNIL, OP_GETUPVAL and OP_SETUPVAL.

The assigned value can come from three sources:

First: other registers, i.e. local variables. OP_MOVE can do this.

OP_MOVE A B R(A) := R(B)

OP_MOVE copies the value in register B to register A.
Because Lua's VM is register-based, most instructions operate on registers directly, with no need to push values onto and pop them off a data stack, so OP_MOVE is needed in only a few places.
The most direct use is copying one local variable to another:

TTcs-Mac-mini:OpCode ttc$ cat tOP_MOVE.lua 
local a
local b = a
TTcs-Mac-mini:OpCode ttc$ ./luac -l -l tOP_MOVE.lua

main <tOP_MOVE.lua:0,0> (3 instructions at 0x7f905ac039b0)
0+ params, 2 slots, 1 upvalue, 2 locals, 0 constants, 0 functions
    1   [1] LOADNIL     (iABC) [A]0 [ISK]0[B]0[ISK]0
    2   [2] MOVE        (iABC) [A]1 [ISK]0[B]0[ISK]0
    3   [2] RETURN      (iABC) [A]0 [ISK]0[B]1[ISK]0
constants (0) for 0x7f905ac039b0:
locals (2) for 0x7f905ac039b0:
    0   a(name)      2(startpc)     4(endpc)
    1   b(name)      3(startpc)     4(endpc)
upvalues (1) for 0x7f905ac039b0:
    0    _ENV(name)      1(instack)      0(idx)
TTcs-Mac-mini:OpCode ttc$ 

At compile time, Lua assigns each local variable to a fixed register. At run time, Lua manipulates the local variable through that register id; the variable's name serves no purpose other than debug information.

Here a is assigned to register 0 and b to register 1. MOVE assigns the value of a (register 0) to b (register 1). The other uses of MOVE are mostly cases that require values in particular registers, such as passing function arguments.
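
To make the point about register ids concrete, here is a minimal Python sketch that packs and unpacks an iABC instruction using the Lua 5.3 field layout (6-bit opcode, 8-bit A, 9-bit C, 9-bit B) and applies MOVE to a list standing in for the register file. The helper names encode_abc, decode_abc and exec_move are invented for this example; it is a model of the semantics, not the VM's actual code.

```python
# Field layout of a 32-bit Lua 5.3 iABC instruction (low bits first):
# opcode: 6 bits, A: 8 bits, C: 9 bits, B: 9 bits.
OP_MOVE = 0  # MOVE is the first opcode in the Lua 5.3 enumeration


def encode_abc(op, a, b, c):
    """Pack opcode and operands into one 32-bit instruction word."""
    return op | (a << 6) | (c << 14) | (b << 23)


def decode_abc(ins):
    """Unpack an instruction word into (op, a, b, c)."""
    op = ins & 0x3F
    a = (ins >> 6) & 0xFF
    c = (ins >> 14) & 0x1FF
    b = (ins >> 23) & 0x1FF
    return op, a, b, c


def exec_move(registers, ins):
    """R(A) := R(B); only register indices are involved, never names."""
    op, a, b, _ = decode_abc(ins)
    assert op == OP_MOVE
    registers[a] = registers[b]


# `local a; local b = a` with a in register 0 and b in register 1.
registers = [None, "stale"]          # register 0 holds nil after LOADNIL
move = encode_abc(OP_MOVE, 1, 0, 0)  # MOVE A=1 B=0
exec_move(registers, move)
```

The names a and b never appear in the instruction; only the indices 1 and 0 do.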

Second: constants. Values of type nil and boolean are short enough to be encoded in the instruction itself, so they can be loaded directly without first being read from the constant table.

OP_LOADBOOL A B C R(A) := (Bool)B; if (C) pc++

LOADBOOL loads the boolean value represented by B into register A. B uses 0 and 1 to represent false and true, respectively. C also represents a boolean value, and if C is 1, the next instruction is skipped.
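
The rule can be sketched in a few lines of Python. The function name exec_loadbool and the convention that pc is the index of the current instruction are assumptions made for this illustration only.

```python
def exec_loadbool(registers, pc, a, b, c):
    """R(A) := (Bool)B; if (C) pc++.

    `pc` is the index of the LOADBOOL instruction itself (a convention
    chosen for this sketch). Returns the index of the next instruction.
    """
    registers[a] = (b != 0)
    next_pc = pc + 1
    if c != 0:
        next_pc += 1  # skip the instruction that follows
    return next_pc


# `local a = true` compiles to LOADBOOL A=0 B=1 C=0.
registers = [None]
pc_after = exec_loadbool(registers, 0, 0, 1, 0)
```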

TTcs-Mac-mini:OpCode ttc$ cat tOP_LOADBOOL.lua 
local a = true


TTcs-Mac-mini:OpCode ttc$ ./luac -l -l tOP_LOADBOOL.lua

main <tOP_LOADBOOL.lua:0,0> (2 instructions at 0x7ff1754039d0)
0+ params, 2 slots, 1 upvalue, 1 local, 0 constants, 0 functions
    1   [1] LOADBOOL    (iABC) [A]0 [ISK]0[B]1[ISK]0[C]0
    2   [1] RETURN      (iABC) [A]0 [ISK]0[B]1[ISK]0
constants (0) for 0x7ff1754039d0:
locals (1) for 0x7ff1754039d0:
    0   a(name)      2(startpc)     3(endpc)
upvalues (1) for 0x7ff1754039d0:
    0    _ENV(name)      1(instack)      0(idx)
TTcs-Mac-mini:OpCode ttc$ 


TTcs-Mac-mini:OpCode ttc$ cat tOP_LOADBOOL_2.lua 

local b = 1 < 2

TTcs-Mac-mini:OpCode ttc$ ./luac -l -l tOP_LOADBOOL_2.lua

main <tOP_LOADBOOL_2.lua:0,0> (5 instructions at 0x7fda7d4039d0)
0+ params, 2 slots, 1 upvalue, 1 local, 2 constants, 0 functions
    1   [2] LT          (iABC) [A]1 [ISK]256[B]-1[ISK]256[C]-2  ; 1 2
    2   [2] JMP         (iAsBx) [A]0 [sBx]1 ; to 4
    3   [2] LOADBOOL    (iABC) [A]0 [ISK]0[B]0[ISK]0[C]1
    4   [2] LOADBOOL    (iABC) [A]0 [ISK]0[B]1[ISK]0[C]0
    5   [2] RETURN      (iABC) [A]0 [ISK]0[B]1[ISK]0
constants (2) for 0x7fda7d4039d0:
    1(idx)  1
    2(idx)  2
locals (1) for 0x7fda7d4039d0:
    0   b(name)      5(startpc)     6(endpc)
upvalues (1) for 0x7fda7d4039d0:
    0    _ENV(name)      1(instack)      0(idx)
TTcs-Mac-mini:OpCode ttc$ 

As you can see, LT and JMP instructions are generated. LT does not itself produce a boolean value; it works together with JMP to branch on true or false. If the comparison is true, execution continues with the JMP, which jumps to instruction 4 and assigns true to b. Otherwise the JMP is skipped, instruction 3 assigns false to b, and because its C is 1 it also skips instruction 4.
In effect, the generated code is equivalent to the following source.

 TTcs-Mac-mini:OpCode ttc$ cat tOP_LOADBOOL_3.lua 
local a;
if 1 < 2 then
    a = true;
else
    a = false;
end
TTcs-Mac-mini:OpCode ttc$ 
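
The control flow of the five-instruction listing for `local b = 1 < 2` can also be traced with a toy interpreter. This is a hedged sketch: run_compare is a made-up name, the instructions are written as tuples instead of encoded words, and the two compared values are passed in directly rather than read from the constant table.

```python
def run_compare(lhs, rhs):
    """Trace the 5-instruction listing for `local b = lhs < rhs`.

    Instructions are numbered from 1, as in the luac listing.
    Returns (value of register 0, list of executed instruction numbers).
    """
    code = {
        1: ("LT", 1, lhs, rhs),      # if ((lhs < rhs) ~= A) then pc++
        2: ("JMP", 0, 1),            # pc += sBx, i.e. go to instruction 4
        3: ("LOADBOOL", 0, 0, 1),    # R(0) := false; skip next
        4: ("LOADBOOL", 0, 1, 0),    # R(0) := true
        5: ("RETURN",),
    }
    registers = [None]
    trace = []
    pc = 1
    while True:
        ins = code[pc]
        trace.append(pc)
        pc += 1  # pc now points at the following instruction
        op = ins[0]
        if op == "LT":
            _, a, x, y = ins
            if (x < y) != bool(a):
                pc += 1
        elif op == "JMP":
            pc += ins[2]
        elif op == "LOADBOOL":
            _, a, b, c = ins
            registers[a] = bool(b)
            if c:
                pc += 1
        elif op == "RETURN":
            return registers[0], trace


result_true = run_compare(1, 2)
result_false = run_compare(2, 1)
```

With 1 < 2 the trace visits instructions 1, 2, 4, 5; with the operands swapped it visits 1, 3, 5.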

OP_LOADNIL A B  R(A), R(A+1), ..., R(A+B) := nil

LOADNIL assigns nil to the range of registers from R(A) to R(A+B). Covering a range with one instruction mainly optimizes cases like the following:

TTcs-Mac-mini:OpCode ttc$ cat tOP_LOADNIL_2.lua 
local a,b,c,d,e,f

TTcs-Mac-mini:OpCode ttc$ ./luac -l -l tOP_LOADNIL_2.lua 

main <tOP_LOADNIL_2.lua:0,0> (2 instructions at 0x7ff816c039d0)
0+ params, 6 slots, 1 upvalue, 6 locals, 0 constants, 0 functions
    1   [1] LOADNIL     (iABC) [A]0 [ISK]0[B]5[ISK]0
    2   [1] RETURN      (iABC) [A]0 [ISK]0[B]1[ISK]0
constants (0) for 0x7ff816c039d0:
locals (6) for 0x7ff816c039d0:
    0   a(name)      2(startpc)     3(endpc)
    1   b(name)      2(startpc)     3(endpc)
    2   c(name)      2(startpc)     3(endpc)
    3   d(name)      2(startpc)     3(endpc)
    4   e(name)      2(startpc)     3(endpc)
    5   f(name)      2(startpc)     3(endpc)
upvalues (1) for 0x7ff816c039d0:
    0    _ENV(name)      1(instack)      0(idx)
TTcs-Mac-mini:OpCode ttc$ 

For consecutive local variables, a single LOADNIL instruction is enough. By contrast, when the nil locals are not adjacent, as below, each one gets its own LOADNIL:

TTcs-Mac-mini:OpCode ttc$ cat tOP_LOADNIL.lua 

local a;
local b = 10;
local c;
TTcs-Mac-mini:OpCode ttc$ ./luac -l -l tOP_LOADNIL.lua 

main <tOP_LOADNIL.lua:0,0> (4 instructions at 0x7f8e3bc039c0)
0+ params, 3 slots, 1 upvalue, 3 locals, 1 constant, 0 functions
    1   [2] LOADNIL     (iABC) [A]0 [ISK]0[B]0[ISK]0
    2   [3] LOADK       (iABx) [A]1 [K]-1   ; 10
    3   [4] LOADNIL     (iABC) [A]2 [ISK]0[B]0[ISK]0
    4   [4] RETURN      (iABC) [A]0 [ISK]0[B]1[ISK]0
constants (1) for 0x7f8e3bc039c0:
    1(idx)  10
locals (3) for 0x7f8e3bc039c0:
    0   a(name)      2(startpc)     5(endpc)
    1   b(name)      3(startpc)     5(endpc)
    2   c(name)      4(startpc)     5(endpc)
upvalues (1) for 0x7f8e3bc039c0:
    0    _ENV(name)      1(instack)      0(idx)
TTcs-Mac-mini:OpCode ttc$ 
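
Both LOADNIL listings can be replayed with a short Python model. The name exec_loadnil is hypothetical, and Python's None stands in for nil.

```python
def exec_loadnil(registers, a, b):
    """R(A), R(A+1), ..., R(A+B) := nil  (B + 1 registers in total)."""
    for i in range(a, a + b + 1):
        registers[i] = None


# `local a,b,c,d,e,f` -> one instruction: LOADNIL A=0 B=5
regs6 = ["junk"] * 6
exec_loadnil(regs6, 0, 5)

# `local a; local b = 10; local c` -> LOADNIL 0 0, LOADK 1 (10), LOADNIL 2 0
regs3 = ["junk"] * 3
exec_loadnil(regs3, 0, 0)
regs3[1] = 10            # stands in for LOADK
exec_loadnil(regs3, 2, 0)
```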

OP_LOADK A Bx R(A) := Kst(Bx)

Values such as numbers and strings cannot be encoded directly in an instruction. Instead, each function prototype (Proto) keeps a constant table, and instructions refer to its entries by index.

TTcs-Mac-mini:OpCode ttc$ cat tOP_LOADK.lua 
local a = 1
local b = "TTc"
TTcs-Mac-mini:OpCode ttc$ ./luac -l -l tOP_LOADK.lua

main <tOP_LOADK.lua:0,0> (3 instructions at 0x7fce38c039b0)
0+ params, 2 slots, 1 upvalue, 2 locals, 2 constants, 0 functions
    1   [1] LOADK       (iABx) [A]0 [K]-1   ; 1
    2   [2] LOADK       (iABx) [A]1 [K]-2   ; "TTc"
    3   [2] RETURN      (iABC) [A]0 [ISK]0[B]1[ISK]0
constants (2) for 0x7fce38c039b0:
    1(idx)  1
    2(idx)  "TTc"
locals (2) for 0x7fce38c039b0:
    0   a(name)      2(startpc)     4(endpc)
    1   b(name)      3(startpc)     4(endpc)
upvalues (1) for 0x7fce38c039b0:
    0    _ENV(name)      1(instack)      0(idx)
TTcs-Mac-mini:OpCode ttc$

LOADK loads the constant at index Bx of the constant table into register A. If the constant index is too large for the 18-bit Bx field (2^18 or more), a LOADKX instruction is generated instead of LOADK, immediately followed by an EXTRAARG instruction whose Ax field stores the index.

OP_LOADKX A  R(A) := Kst(extra arg)
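
The choice between LOADK and LOADKX comes down to one comparison against the largest value Bx can hold. A minimal sketch, assuming 0-based constant indices and using an invented function name code_constant_load; the guard against indices larger than Ax is added for this sketch only.

```python
MAXARG_BX = (1 << 18) - 1   # largest index that fits the 18-bit Bx field
MAXARG_AX = (1 << 26) - 1   # largest index that fits the 26-bit Ax field


def code_constant_load(reg, k):
    """Return the instruction(s) that load constant index k into register reg."""
    if k <= MAXARG_BX:
        return [("LOADK", reg, k)]
    if k > MAXARG_AX:
        # Guard added for this sketch; not taken from the Lua sources.
        raise ValueError("constant index does not fit in Ax")
    # LOADKX carries only the register; the index travels in EXTRAARG.
    return [("LOADKX", reg), ("EXTRAARG", k)]


small = code_constant_load(0, 0)
edge = code_constant_load(0, MAXARG_BX)
large = code_constant_load(1, MAXARG_BX + 1)
```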

Third: data that is neither a constant nor held in a register, that is, upvalues and values stored in tables.

GETUPVAL and SETUPVAL read and write an upvalue of the current function.
GETTABUP and SETTABUP read and write entries of a table referred to by an upvalue; the table operand (B for GETTABUP, A for SETTABUP) is an upvalue index.
GETTABLE and SETTABLE operate on a table held in a register rather than in an upvalue; the table operand (B for GETTABLE, A for SETTABLE) is a register index.
At compile time, when a variable a is accessed, its kind is determined in the following order:
1. a is the local variable of the current function
2. a is the local variable of the outer function, so a is the upvalue of the current function.
3. a is a global variable

A local variable lives in a register of the current function, so any instruction can access it directly by its register id. For upvalues, Lua has dedicated instructions to get and set them.

A global variable is now just a field of the special upvalue _ENV. In practice a great deal of Lua code refers to variables in _ENV directly, so giving this access pattern its own opcodes keeps the bytecode compact and improves performance.
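
The lookup order (local, then upvalue, then global via _ENV) can be sketched as follows. The functions classify and access_opcode and the scopes representation are simplifications invented for this example; the real compiler tracks active variables per block, not per function.

```python
def classify(name, scopes):
    """Classify `name` as seen from the innermost function.

    `scopes` lists the local-variable names of each enclosing function,
    innermost first (a simplification made for this sketch).
    """
    if name in scopes[0]:
        return "local"
    for outer in scopes[1:]:
        if name in outer:
            return "upvalue"
    # Not found anywhere: compiled as a field access on the upvalue _ENV.
    return "global"


def access_opcode(name, scopes):
    """Opcode used to read the variable into another register."""
    kind = classify(name, scopes)
    return {"local": "MOVE", "upvalue": "GETUPVAL", "global": "GETTABUP"}[kind]


# Function f from tOP_GETUPVAL.lua: f has local `l`, the main chunk has `u`.
scopes = [{"l"}, {"u"}]
kinds = [classify(n, scopes) for n in ("l", "u", "g")]
```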

OP_GETUPVAL A B R(A) := UpValue[B]

TTcs-Mac-mini:OpCode ttc$ cat tOP_GETUPVAL.lua 
local u = 100
function f()
    local l = u
end
TTcs-Mac-mini:OpCode ttc$ ./luac -l -l tOP_GETUPVAL.lua

main <tOP_GETUPVAL.lua:0,0> (4 instructions at 0x7ffc0a4039d0)
0+ params, 2 slots, 1 upvalue, 1 local, 2 constants, 1 function
    1   [1] LOADK       (iABx) [A]0 [K]-1   ; 100
    2   [4] CLOSURE     (iABx) [A]1 [U]0    ; 0x7ffc0a403b90
    3   [2] SETTABUP    (iABC) [A]0 [ISK]256[B]-2[ISK]0[C]1 ; _ENV "f"
    4   [4] RETURN      (iABC) [A]0 [ISK]0[B]1[ISK]0
constants (2) for 0x7ffc0a4039d0:
    1(idx)  100
    2(idx)  "f"
locals (1) for 0x7ffc0a4039d0:
    0   u(name)      2(startpc)     5(endpc)
upvalues (1) for 0x7ffc0a4039d0:
    0    _ENV(name)      1(instack)      0(idx)

function <tOP_GETUPVAL.lua:2,4> (2 instructions at 0x7ffc0a403b90)
0 params, 2 slots, 1 upvalue, 1 local, 0 constants, 0 functions
    1   [3] GETUPVAL    (iABC) [A]0 [ISK]0[B]0[ISK]0    ; u
    2   [4] RETURN      (iABC) [A]0 [ISK]0[B]1[ISK]0
constants (0) for 0x7ffc0a403b90:
locals (1) for 0x7ffc0a403b90:
    0   l(name)      2(startpc)     3(endpc)
upvalues (1) for 0x7ffc0a403b90:
    0    u(name)     1(instack)      0(idx)
TTcs-Mac-mini:OpCode ttc$ 

GETUPVAL loads the value of the upvalue indexed by B into register A.

Here the value of upvalue 0 (operand B, the variable u) is assigned to register 0 (operand A, the local variable l).
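
The (instack, idx) columns in the upvalue tables describe where a closure finds each upvalue when it is created: instack = 1 means register idx of the enclosing function, instack = 0 means upvalue idx of the enclosing function. The following Python sketch models that together with GETUPVAL. The Cell class, make_closure and exec_getupval are hypothetical names, and modeling every enclosing register as a shared box is a simplification of how the VM handles captured stack slots.

```python
class Cell:
    """A shared box holding one captured variable (a model of an upvalue)."""

    def __init__(self, value):
        self.value = value


def make_closure(upvalue_descs, enclosing_cells, enclosing_upvalues):
    """Build a closure's upvalue list from (name, instack, idx) descriptors.

    instack = 1: capture register idx of the enclosing function.
    instack = 0: reuse upvalue idx of the enclosing function.
    """
    upvalues = []
    for _name, instack, idx in upvalue_descs:
        if instack:
            upvalues.append(enclosing_cells[idx])
        else:
            upvalues.append(enclosing_upvalues[idx])
    return upvalues


def exec_getupval(registers, upvalues, a, b):
    """R(A) := UpValue[B]"""
    registers[a] = upvalues[b].value


# Main chunk of tOP_GETUPVAL.lua: register 0 holds u = 100, upvalue 0 is _ENV.
main_registers = [Cell(100)]
main_upvalues = [Cell({})]            # _ENV modeled as an empty dict
f_upvalues = make_closure([("u", 1, 0)], main_registers, main_upvalues)

f_registers = [None]
exec_getupval(f_registers, f_upvalues, 0, 0)   # GETUPVAL A=0 B=0
```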

OP_SETUPVAL A B UpValue[B] := R(A)

TTcs-Mac-mini:OpCode ttc$ cat tOP_SETUPVAL.lua 
local u = 19
function f()
    local l
    u = 111
end
TTcs-Mac-mini:OpCode ttc$ ./luac -l -l tOP_SETUPVAL.lua

main <tOP_SETUPVAL.lua:0,0> (4 instructions at 0x7fcfa04039d0)
0+ params, 2 slots, 1 upvalue, 1 local, 2 constants, 1 function
    1   [1] LOADK       (iABx) [A]0 [K]-1   ; 19
    2   [5] CLOSURE     (iABx) [A]1 [U]0    ; 0x7fcfa0403b90
    3   [2] SETTABUP    (iABC) [A]0 [ISK]256[B]-2[ISK]0[C]1 ; _ENV "f"
    4   [5] RETURN      (iABC) [A]0 [ISK]0[B]1[ISK]0
constants (2) for 0x7fcfa04039d0:
    1(idx)  19
    2(idx)  "f"
locals (1) for 0x7fcfa04039d0:
    0   u(name)      2(startpc)     5(endpc)
upvalues (1) for 0x7fcfa04039d0:
    0    _ENV(name)      1(instack)      0(idx)

function <tOP_SETUPVAL.lua:2,5> (4 instructions at 0x7fcfa0403b90)
0 params, 2 slots, 1 upvalue, 1 local, 1 constant, 0 functions
    1   [3] LOADNIL     (iABC) [A]0 [ISK]0[B]0[ISK]0
    2   [4] LOADK       (iABx) [A]1 [K]-1   ; 111
    3   [4] SETUPVAL    (iABC) [A]1 [ISK]0[B]0[ISK]0    ; u
    4   [5] RETURN      (iABC) [A]0 [ISK]0[B]1[ISK]0
constants (1) for 0x7fcfa0403b90:
    1(idx)  111
locals (1) for 0x7fcfa0403b90:
    0   l(name)      2(startpc)     5(endpc)
upvalues (1) for 0x7fcfa0403b90:
    0    u(name)     1(instack)      0(idx)
TTcs-Mac-mini:OpCode ttc$

111 is a constant and lives in the constant table. Lua has no instruction that moves a value directly from a constant to an upvalue, so LOADK first loads 111 into a temporary register (register 1), and SETUPVAL then assigns the value of that register (operand A) to upvalue 0 (operand B), which is u.
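
The two-step path through a temporary register can be replayed in Python. This is only a sketch: run_f is a made-up name, a one-element list stands in for the shared upvalue, and the constant table is indexed from 0 here whereas the listing prints it from 1.

```python
def run_f(upvalues, constants):
    """Execute f's body: LOADNIL 0 0; LOADK 1 K(0); SETUPVAL 1 0."""
    registers = [None, None]
    registers[0] = None              # LOADNIL  A=0 B=0 -> local l
    registers[1] = constants[0]      # LOADK    A=1     -> temporary := 111
    upvalues[0][0] = registers[1]    # SETUPVAL A=1 B=0 -> UpValue[0] := R(1)
    return registers


u_box = [19]        # one-element list as a shared, mutable box for u
final_registers = run_f([u_box], [111])
```

After the call, the box shared with the main chunk holds 111, while register 1 of f was only a staging area.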

OP_GETTABUP A B C R(A) := UpValue[B][RK(C)]

GETTABUP treats the upvalue indexed by B as a table, uses the register or constant denoted by C as the key, and loads the corresponding value into register A.

 TTcs-Mac-mini:OpCode ttc$ cat tOP_GETTABUP.lua 

g = 222
function f()
    local a = g
end
TTcs-Mac-mini:OpCode ttc$ ./luac -l -l tOP_GETTABUP.lua

main <tOP_GETTABUP.lua:0,0> (4 instructions at 0x7f8cf0c039d0)
0+ params, 2 slots, 1 upvalue, 0 locals, 3 constants, 1 function
    1   [2] SETTABUP    (iABC) [A]0 [ISK]256[B]-1[ISK]256[C]-2  ; _ENV "g" 222
    2   [5] CLOSURE     (iABx) [A]0 [U]0    ; 0x7f8cf0c03c50
    3   [3] SETTABUP    (iABC) [A]0 [ISK]256[B]-3[ISK]0[C]0 ; _ENV "f"
    4   [5] RETURN      (iABC) [A]0 [ISK]0[B]1[ISK]0
constants (3) for 0x7f8cf0c039d0:
    1(idx)  "g"
    2(idx)  222
    3(idx)  "f"
locals (0) for 0x7f8cf0c039d0:
upvalues (1) for 0x7f8cf0c039d0:
    0    _ENV(name)      1(instack)      0(idx)

function <tOP_GETTABUP.lua:3,5> (2 instructions at 0x7f8cf0c03c50)
0 params, 2 slots, 1 upvalue, 1 local, 1 constant, 0 functions
    1   [4] GETTABUP    (iABC) [A]0 [ISK]0[B]0[ISK]256[C]-1 ; _ENV "g"
    2   [5] RETURN      (iABC) [A]0 [ISK]0[B]1[ISK]0
constants (1) for 0x7f8cf0c03c50:
    1(idx)  "g"
locals (1) for 0x7f8cf0c03c50:
    0   a(name)      2(startpc)     3(endpc)
upvalues (1) for 0x7f8cf0c03c50:
    0    _ENV(name)      0(instack)      0(idx)
TTcs-Mac-mini:OpCode ttc$

Operand A is register 0, the local variable a.
Operand B has ISK = 0 and is upvalue index 0, the table _ENV.
Operand C has ISK = 256, so it refers to a constant: constant 1, the string "g".
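
The ISK value in the listing appears to be the operand masked with 256, the bit that Lua 5.3 uses to mark an RK operand as a constant rather than a register. A hedged Python model of RK decoding and GETTABUP follows; rk and exec_gettabup are illustrative names, _ENV is modeled as a dict, and constants are indexed from 0 (the listing prints them from 1).

```python
BITRK = 256  # bit 8 of a 9-bit B/C operand marks "constant, not register"


def rk(operand, registers, constants):
    """Resolve an RK operand: a constant if BITRK is set, else a register."""
    if operand & BITRK:
        return constants[operand & ~BITRK]
    return registers[operand]


def exec_gettabup(registers, constants, upvalues, a, b, c):
    """R(A) := UpValue[B][RK(C)]"""
    table = upvalues[b]
    registers[a] = table.get(rk(c, registers, constants))


# Function f of tOP_GETTABUP.lua: constant 0 is "g", upvalue 0 is _ENV.
env = {"g": 222}
registers = [None]
constants = ["g"]
exec_gettabup(registers, constants, [env], 0, 0, BITRK | 0)
```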

OP_SETTABUP A B C UpValue[A][RK(B)] := RK(C)

SETTABUP treats the upvalue indexed by A as a table and stores the value of the register or constant denoted by C under the key given by the register or constant denoted by B.

TTcs-Mac-mini:OpCode ttc$ cat tOP_SETTABUP.lua 
function f()
    g = 1111
end
TTcs-Mac-mini:OpCode ttc$ ./luac -l -l tOP_SETTABUP.lua

main <tOP_SETTABUP.lua:0,0> (3 instructions at 0x7fa64fc039d0)
0+ params, 2 slots, 1 upvalue, 0 locals, 1 constant, 1 function
    1   [3] CLOSURE     (iABx) [A]0 [U]0    ; 0x7fa64fc03b80
    2   [1] SETTABUP    (iABC) [A]0 [ISK]256[B]-1[ISK]0[C]0 ; _ENV "f"
    3   [3] RETURN      (iABC) [A]0 [ISK]0[B]1[ISK]0
constants (1) for 0x7fa64fc039d0:
    1(idx)  "f"
locals (0) for 0x7fa64fc039d0:
upvalues (1) for 0x7fa64fc039d0:
    0    _ENV(name)      1(instack)      0(idx)

function <tOP_SETTABUP.lua:1,3> (2 instructions at 0x7fa64fc03b80)
0 params, 2 slots, 1 upvalue, 0 locals, 2 constants, 0 functions
    1   [2] SETTABUP    (iABC) [A]0 [ISK]256[B]-1[ISK]256[C]-2  ; _ENV "g" 1111
    2   [3] RETURN      (iABC) [A]0 [ISK]0[B]1[ISK]0
constants (2) for 0x7fa64fc03b80:
    1(idx)  "g"
    2(idx)  1111
locals (0) for 0x7fa64fc03b80:
upvalues (1) for 0x7fa64fc03b80:
    0    _ENV(name)      0(instack)      0(idx)
TTcs-Mac-mini:OpCode ttc$

Operand A is upvalue index 0, the _ENV table.
Operand B has ISK = 256, so it is a constant: constant 1, the string "g".
Operand C has ISK = 256, so it is also a constant: constant 2, the value 1111.
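
Both SETTABUP instructions in the listing can be modeled in a few lines of Python. The helper names rk and exec_settabup are assumptions for this sketch, _ENV is a plain dict, constants are indexed from 0, and a placeholder object stands in for the closure that CLOSURE would create.

```python
BITRK = 256  # flag bit that marks an RK operand as a constant index


def rk(operand, registers, constants):
    """Resolve an RK operand: a constant if BITRK is set, else a register."""
    if operand & BITRK:
        return constants[operand & ~BITRK]
    return registers[operand]


def exec_settabup(registers, constants, upvalues, a, b, c):
    """UpValue[A][RK(B)] := RK(C)"""
    key = rk(b, registers, constants)
    value = rk(c, registers, constants)
    upvalues[a][key] = value


env = {}
# `g = 1111` inside f: SETTABUP A=0 B=K(0) C=K(1), both operands constants.
exec_settabup([], ["g", 1111], [env], 0, BITRK | 0, BITRK | 1)
# `function f() ... end` in the main chunk: SETTABUP A=0 B=K(0) C=R(0).
closure = object()   # placeholder for the closure created by CLOSURE
exec_settabup([closure], ["f"], [env], 0, BITRK | 0, 0)
```

The second call shows the mixed case: the key is a constant, while the value comes from register 0.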

OP_GETTABLE A B C R(A) := R(B)[RK(C)]

GETTABLE looks up the key denoted by C in the table held in register B and stores the result in register A.

TTcs-Mac-mini:OpCode ttc$ cat tOP_GETTABLE.lua 
local t = {}
t.x=1
local b = t.x
TTcs-Mac-mini:OpCode ttc$ ./luac -l -l tOP_GETTABLE.lua

main <tOP_GETTABLE.lua:0,0> (4 instructions at 0x7fb3a3c039d0)
0+ params, 2 slots, 1 upvalue, 2 locals, 2 constants, 0 functions
    1   [1] NEWTABLE    (iABC) [A]0 [ISK]0[B]0[ISK]0[C]0
    2   [2] SETTABLE    (iABC) [A]0 [ISK]256[B]-1[ISK]256[C]-2  ; "x" 1
    3   [3] GETTABLE    (iABC) [A]1 [ISK]0[B]0[ISK]256[C]-1 ; "x"
    4   [3] RETURN      (iABC) [A]0 [ISK]0[B]1[ISK]0
constants (2) for 0x7fb3a3c039d0:
    1(idx)  "x"
    2(idx)  1
locals (2) for 0x7fb3a3c039d0:
    0   t(name)      2(startpc)     5(endpc)
    1   b(name)      4(startpc)     5(endpc)
upvalues (1) for 0x7fb3a3c039d0:
    0    _ENV(name)      1(instack)      0(idx)
TTcs-Mac-mini:OpCode ttc$

Operand A is register 1, the local variable b.
Operand B has ISK = 0 and is register 0, the table t.
Operand C has ISK = 256, so it is a constant: constant 1, the string "x".

OP_SETTABLE A B C R(A)[RK(B)] := RK(C)

SETTABLE sets the entry with key B in the table held in register A to the value denoted by C.

In instruction 2 of the tOP_GETTABLE.lua listing above, operand A is register 0, the table t.
Operand B has ISK = 256, so it is a constant: constant 1, the string "x".
Operand C has ISK = 256, so it is a constant: constant 2, the value 1.
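
The first three instructions of the tOP_GETTABLE.lua listing can be replayed as a short Python trace. This is a sketch under assumptions: run_listing and rk are invented names, a dict models the Lua table, and constants are indexed from 0 rather than from 1 as printed.

```python
BITRK = 256  # flag bit that marks an RK operand as a constant index


def rk(operand, registers, constants):
    """Resolve an RK operand: a constant if BITRK is set, else a register."""
    if operand & BITRK:
        return constants[operand & ~BITRK]
    return registers[operand]


def run_listing():
    """Replay NEWTABLE, SETTABLE and GETTABLE for `local t = {}; t.x=1; local b = t.x`."""
    constants = ["x", 1]
    registers = [None, None]
    # NEWTABLE A=0 B=0 C=0: R(0) := {}
    registers[0] = {}
    # SETTABLE A=0 B=K(0) C=K(1): R(0)["x"] := 1
    key = rk(BITRK | 0, registers, constants)
    registers[0][key] = rk(BITRK | 1, registers, constants)
    # GETTABLE A=1 B=0 C=K(0): R(1) := R(0)["x"]
    registers[1] = registers[0].get(rk(BITRK | 0, registers, constants))
    return registers


final = run_listing()
```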

OP_NEWTABLE A B C R(A) := {} (size = B,C)

NEWTABLE creates a table object at register A. B and C are used to store the initial size of the table array part and hash part, respectively.
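
One detail worth knowing when reading B and C: Lua 5.3 does not store the sizes as plain counts but in a compact "floating point byte" form (five exponent bits, three mantissa bits), so that a 9-bit operand can express large sizes approximately. The Python function below mirrors the decoding step as it is written in the Lua 5.3 sources (luaO_fb2int); the Python name fb2int is chosen for this sketch.

```python
def fb2int(x):
    """Decode a "floating point byte" (eeeeexxx) into an integer size.

    Values below 8 are stored as-is; otherwise the value is
    (1xxx in binary) shifted left by (eeeee - 1).
    """
    if x < 8:
        return x
    return ((x & 7) + 8) << ((x >> 3) - 1)


decoded = [fb2int(v) for v in (0, 7, 8, 9, 16, 17)]
```

For small tables the operand equals the size, which is why the listings here show B and C as 0 for `{}`.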
