Chapter 26 (Concurrency: Introduction) of Operating Systems: Three Easy Pieces

Posted by kristian_gl on Mon, 06 Sep 2021 03:37:40 +0200

Introductory operating systems textbook: Operating Systems: Three Easy Pieces

Homework exercises: https://pages.cs.wisc.edu/~remzi/OSTEP/Homework/homework.html

Below is a translation of the README for this chapter's homework, kept here for easy reference later:

Welcome to this simulator. The idea is to become familiar with threads by observing how they interleave; the simulator, x86.py, will help you build this understanding.

The simulator mimics the execution of short assembly sequences by multiple threads. Note that the OS code that would run (for example, to perform a context switch) is not shown; all you see is the interleaving of user code.

The assembly code is based on x86 but somewhat simplified. This instruction set has four general-purpose registers (%ax, %bx, %cx, %dx), a program counter (PC), and a small set of instructions, which is enough for our purposes.

Here is a short example code that can be run:

.main
mov 2000, %ax   # get the value at the address
add $1, %ax     # increment it
mov %ax, 2000   # store it back
halt

This code is easy to understand: the first instruction, an x86 "mov", loads a value from address 2000 into the %ax register. In this subset of x86, addresses can take the following forms:

  2000           -> the number (2000) is the address
  (%cx)          -> the contents of the register (in parentheses) form the address
  1000(%dx)      -> the number + the contents of the register form the address
  10(%ax,%bx)    -> the number + reg1 + reg2 form the address
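
To make the four address forms concrete, here is a minimal Python sketch that computes the address each form names. The function name effective_address and the register values are made up for illustration; this is not code from x86.py.

```python
import re

def effective_address(operand, regs):
    """Return the address named by an operand written in one of the four forms."""
    text = operand.replace(" ", "")
    # Optional number, optionally followed by (%reg) or (%reg1,%reg2).
    m = re.fullmatch(r"(-?\d+)?(?:\(%(\w+)(?:,%(\w+))?\))?", text)
    if m is None or text == "":
        raise ValueError(f"not a memory operand: {operand!r}")
    offset, reg1, reg2 = m.groups()
    addr = int(offset) if offset else 0
    for reg in (reg1, reg2):
        if reg is not None:
            addr += regs[reg]
    return addr

# Hypothetical register contents, chosen only for this example.
regs = {"ax": 4, "bx": 8, "cx": 3000, "dx": 16}
print(effective_address("2000", regs))         # 2000
print(effective_address("(%cx)", regs))        # 3000
print(effective_address("1000(%dx)", regs))    # 1016
print(effective_address("10(%ax,%bx)", regs))  # 22
```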

To store a value, the "mov" instruction is used again, but this time with the arguments reversed, for example:

mov %ax, 2000

In the sequence above, the "add" instruction should be clear: it adds an immediate value (specified by $1) to the register given as the second argument (i.e., %ax = %ax + 1).

Therefore, we can understand the above code sequence: it loads the value at address 2000, then adds 1 to it, and then stores it back at address 2000.

The pseudo-instruction "halt" simply stops the thread.

Let's run the simulator and see how it works, assuming the above code sequence is in the file "simple-race.s".

HW-ThreadsIntro$ ./x86.py -p simple-race.s -t 1

       Thread 0
1000 mov 2000(%bx), %ax
1001 add $1, %ax
1002 mov %ax, 2000(%bx)
1003 halt

Here, the -p flag specifies the program and -t the number of threads. The interrupt interval (set with -i) is how often the scheduler wakes up and runs to switch to a different task. Because there is only one thread in this example, the interval does not matter.

The output is easy to read: the simulator prints the program counter (here from 1000 to 1003) and the instruction that gets executed. Note that we assume (unrealistically) that every instruction occupies a single byte in memory; in real x86, instructions are variable-sized and take up from one to several bytes.
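
Because every instruction is assumed to occupy one address, laying out a program is simple counting. The sketch below is an illustration only, not the simulator's actual loader: load_program is a hypothetical helper, the starting address 1000 is taken from the trace above, and treating lines that begin with "." as labels for the next instruction is an assumption.

```python
def load_program(lines, load_addr=1000):
    """Assign one address per instruction; a '.label' line names the next instruction."""
    labels, code = {}, {}
    pc = load_addr
    for line in lines:
        text = line.split("#")[0].strip()  # drop comments and surrounding whitespace
        if not text:
            continue
        if text.startswith("."):
            labels[text] = pc              # assumed: labels take up no space
        else:
            code[pc] = text
            pc += 1                        # every instruction is one "byte"
    return labels, code

labels, code = load_program([
    ".main",
    "mov 2000, %ax   # get the value at the address",
    "add $1, %ax     # increment it",
    "mov %ax, 2000   # store it back",
    "halt",
])
for pc, text in code.items():
    print(pc, text)
```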

We can use more detailed tracing to better understand how the machine state changes during execution.

HW-ThreadsIntro$ ./x86.py -p simple-race.s -t 1 -M 2000 -R ax,bx -c

 2000      ax    bx          Thread 0
    0       0     0
    0       0     0   1000 mov 2000(%bx), %ax
    0       1     0   1001 add $1, %ax
    1       1     0   1002 mov %ax, 2000(%bx)
    1       1     0   1003 halt

The -M flag traces memory locations (a comma-separated list traces several, such as 2000,3000); the -R flag traces the values in specific registers.

The values on the left show the memory/register contents after the instruction on the right has executed. For example, after the "add" instruction, %ax has been incremented to 1; after the second "mov" instruction (at PC=1002), the memory contents at address 2000 have been incremented as well.
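
The trace can be reproduced by hand. The following sketch (plain Python, not part of the simulator; trace_simple_race is a name invented here) applies the three instructions to a dictionary standing in for memory and records the state after each one, mirroring the memory and %ax columns above.

```python
def trace_simple_race(initial=0):
    """Return (memory[2000], ax, instruction) as seen after each instruction executes."""
    memory = {2000: initial}
    ax = 0
    rows = []

    ax = memory[2000]                 # mov 2000, %ax  (load)
    rows.append((memory[2000], ax, "mov 2000, %ax"))

    ax = ax + 1                       # add $1, %ax    (increment the register only)
    rows.append((memory[2000], ax, "add $1, %ax"))

    memory[2000] = ax                 # mov %ax, 2000  (store back)
    rows.append((memory[2000], ax, "mov %ax, 2000"))
    return rows

for mem, ax, instr in trace_simple_race():
    print(f"{mem:5d} {ax:7d}   {instr}")
```

Note that memory only changes on the last step; between the load and the store, the new value lives solely in the register.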

There are a few more instructions you need to understand. Here is a code fragment containing a loop:

.main
.top
sub  $1,%dx
test $0,%dx     
jgte .top         
halt

A few things are introduced here. The first is the "test" instruction. It takes two arguments and compares them; it then sets implicit "condition codes" (something like 1-bit registers) that subsequent instructions can act on.

The other new instruction is the jump (in this case "jgte", which stands for "jump if greater than or equal to"). It jumps if the second value in the test is greater than or equal to the first.

One last point: for this code to really work, %dx must be initialized to 1 or greater.
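
Here is a small sketch of what "test" computes, assuming (as the description of "jgte" suggests) that each condition code records how the second argument compares with the first. The names set_condition_codes and jgte_taken and the dictionary layout are illustrative, not the simulator's internals.

```python
def set_condition_codes(first, second):
    """Model of 'test first, second': compare the second argument against the first."""
    return {
        ">=": second >= first,
        ">":  second > first,
        "<=": second <= first,
        "<":  second < first,
        "!=": second != first,
        "==": second == first,
    }

def jgte_taken(codes):
    """jgte jumps when the '>=' condition code is set."""
    return codes[">="]

for dx in (2, 0, -1):
    codes = set_condition_codes(0, dx)   # test $0,%dx
    flags = [int(v) for v in codes.values()]
    print(dx, flags, "jump" if jgte_taken(codes) else "fall through")
```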

So we run the program like this:

HW-ThreadsIntro$ ./x86.py -p loop.s -t 1 -a dx=3 -R dx -C -c

   dx   >= >  <= <  != ==        Thread 0
    3   0  0  0  0  0  0
    2   0  0  0  0  0  0  1000 sub  $1,%dx
    2   1  1  0  0  1  0  1001 test $0,%dx
    2   1  1  0  0  1  0  1002 jgte .top
    1   1  1  0  0  1  0  1000 sub  $1,%dx
    1   1  1  0  0  1  0  1001 test $0,%dx
    1   1  1  0  0  1  0  1002 jgte .top
    0   1  1  0  0  1  0  1000 sub  $1,%dx
    0   1  0  1  0  0  1  1001 test $0,%dx
    0   1  0  1  0  0  1  1002 jgte .top
   -1   1  0  1  0  0  1  1000 sub  $1,%dx
   -1   0  0  1  1  1  0  1001 test $0,%dx
   -1   0  0  1  1  1  0  1002 jgte .top
   -1   0  0  1  1  1  0  1003 halt

"- R dx" flag tracks the value of% dx; "- The "C" flag tracks the value of the condition code set by the test instruction. Finally, the "- a dx=3" flag sets the% dx register to the starting value of 3.

Through tracing, you can intuitively see that the value of the instruction "sub" gradually decreases%dx, and finally end the cycle by judging the conditions.
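
The same loop in Python, as a rough sketch: the hypothetical run_loop below decrements, compares against zero, and repeats while the result is greater than or equal to zero, which is what the sub/test/jgte sequence does.

```python
def run_loop(dx):
    """Return the value of dx after each 'sub' until the jgte falls through."""
    values = []
    while True:
        dx -= 1                # sub  $1,%dx
        values.append(dx)
        if not (dx >= 0):      # test $0,%dx ; jgte .top is not taken
            break
    return values

print(run_loop(3))  # [2, 1, 0, -1]
```

Because the comparison is "greater than or equal", the body runs one more time than the starting value: four passes for dx=3.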

Now for a more interesting example: a race condition between multiple threads. Let's first look at the code:

.main
.top
# critical section
mov 2000, %ax       # get the value at the address
add $1, %ax         # increment it
mov %ax, 2000       # store it back

# see if we're still looping
sub  $1, %bx
test $0, %bx
jgt .top

halt

This code has a critical section that loads the value of a variable (at address 2000), adds 1 to it, and then stores it back.

The code after it simply decrements a loop counter (in %bx), tests whether it is greater than zero, and if so, jumps back to the critical section at the top.

HW-ThreadsIntro$ ./x86.py -p looping-race-nolock.s -t 2 -a bx=1 -M 2000 -c

 2000          Thread 0                Thread 1
    0
    0   1000 mov 2000, %ax
    0   1001 add $1, %ax
    1   1002 mov %ax, 2000
    1   1003 sub  $1, %bx
    1   1004 test $0, %bx
    1   1005 jgt .top
    1   1006 halt
    1   ----- Halt;Switch -----  ----- Halt;Switch -----
    1                            1000 mov 2000, %ax
    1                            1001 add $1, %ax
    2                            1002 mov %ax, 2000
    2                            1003 sub  $1, %bx
    2                            1004 test $0, %bx
    2                            1005 jgt .top
    2                            1006 halt

You can see that each thread runs the loop once, and each run updates the shared variable at address 2000, so the final result is 2.

The "Halt;Switch" line is inserted whenever one thread halts and another thread must run. One last example: run the same program as above, but with a smaller interrupt interval.

HW-ThreadsIntro$ ./x86.py -p looping-race-nolock.s -t 2 -a bx=1 -M 2000 -i 2

 2000          Thread 0                Thread 1
    ?
    ?   1000 mov 2000, %ax
    ?   1001 add $1, %ax
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?                            1000 mov 2000, %ax
    ?                            1001 add $1, %ax
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?   1002 mov %ax, 2000
    ?   1003 sub  $1, %bx
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?                            1002 mov %ax, 2000
    ?                            1003 sub  $1, %bx
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?   1004 test $0, %bx
    ?   1005 jgt .top
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?                            1004 test $0, %bx
    ?                            1005 jgt .top
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?   1006 halt
    ?   ----- Halt;Switch -----  ----- Halt;Switch -----
    ?                            1006 halt

As you can see, each thread is interrupted every 2 instructions, as specified by the "-i 2" flag. What is the value of memory[2000] throughout this run? What should it be?
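
One way to check your answer is to model the run in a few lines of Python. This is a simplified sketch under stated assumptions, not the real x86.py: the scheduler is assumed to be round-robin, the instruction count is assumed to restart after every switch, a halt is assumed to trigger an immediate switch, and the names PROGRAM and simulate are invented here. Each thread has private registers; only memory is shared.

```python
# The seven instructions of the looping program, one per address.
PROGRAM = ["load", "add", "store", "sub", "test", "jgt", "halt"]

def simulate(interval, bx=1, num_threads=2):
    """Run PROGRAM on every thread, switching threads every `interval` instructions."""
    memory = {2000: 0}                       # shared by all threads
    threads = [{"pc": 0, "ax": 0, "bx": bx, "gt": False, "done": False}
               for _ in range(num_threads)]  # private registers per thread
    current, since_switch = 0, 0
    while not all(t["done"] for t in threads):
        t = threads[current]
        op = PROGRAM[t["pc"]]
        t["pc"] += 1
        if op == "load":
            t["ax"] = memory[2000]           # mov 2000, %ax
        elif op == "add":
            t["ax"] += 1                     # add $1, %ax
        elif op == "store":
            memory[2000] = t["ax"]           # mov %ax, 2000
        elif op == "sub":
            t["bx"] -= 1                     # sub $1, %bx
        elif op == "test":
            t["gt"] = t["bx"] > 0            # test $0, %bx
        elif op == "jgt" and t["gt"]:
            t["pc"] = 0                      # jgt .top
        elif op == "halt":
            t["done"] = True
        since_switch += 1
        if t["done"] or since_switch == interval:
            since_switch = 0
            # Round-robin: pick the next thread that has not halted.
            for step in range(1, num_threads + 1):
                candidate = (current + step) % num_threads
                if not threads[candidate]["done"]:
                    current = candidate
                    break
    return memory[2000]

print("interrupt every 2 instructions:", simulate(2))
print("no interrupt before halt:      ", simulate(100))
```

Comparing the two printed values shows whether an update was lost when the switch landed between a thread's load and its store.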

Now some more detail on what this program can simulate. The complete register set is %ax, %bx, %cx, %dx, and the PC. In this version, there is no support for a stack, and there are no call and return instructions.

The complete set of simulated instructions is as follows:

mov immediate, register     # moves immediate value to register
mov memory, register        # loads from memory into register
mov register, register      # moves value from one register to other
mov register, memory        # stores register contents in memory
mov immediate, memory       # stores immediate value in memory

add immediate, register     # register  = register  + immediate
add register1, register2    # register2 = register2 + register1
sub immediate, register     # register  = register  - immediate
sub register1, register2    # register2 = register2 - register1

test immediate, register    # compare immediate and register (set condition codes)
test register, immediate    # same but register and immediate
test register, register     # same but register and register

jne                         # jump if test'd values are not equal
je                          #                       ... equal
jlt                         #     ... second is less than first
jlte                        #               ... less than or equal
jgt                         #            ... is greater than
jgte                        #               ... greater than or equal

xchg register, memory       # atomic exchange: 
                            #   put value of register into memory
                            #   return old contents of memory into reg
                            # do both things atomically

nop                         # no op

Notes: 
- 'immediate' is something of the form $number
- 'memory' is of the form 'number' or '(reg)' or 'number(reg)' or 
   'number(reg,reg)' (as described above)
- 'register' is one of %ax, %bx, %cx, %dx
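
The three operand kinds in the notes can be told apart by their first characters. The sketch below is one hedged way to do that with regular expressions; classify is a hypothetical helper that recognizes only the forms listed in the notes, and it says nothing about how x86.py itself parses operands.

```python
import re

def classify(operand):
    """Return 'immediate', 'register', or 'memory' for an operand string."""
    text = operand.replace(" ", "")
    if re.fullmatch(r"\$\d+", text):
        return "immediate"                       # $number
    if re.fullmatch(r"%(ax|bx|cx|dx)", text):
        return "register"                        # one of the four registers
    if re.fullmatch(r"\d+|(\d+)?\(%\w+(,%\w+)?\)", text):
        return "memory"                          # number, (reg), number(reg), number(reg,reg)
    raise ValueError(f"unrecognized operand: {operand!r}")

for operand in ("$1", "%ax", "2000", "(%cx)", "1000(%dx)", "10(%ax,%bx)"):
    print(operand, "->", classify(operand))
```

With this, "mov $1, %ax" and "mov 2000, %ax" can be distinguished as an immediate move and a load even though both use the same mnemonic.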

Finally, here is the complete set of simulator options, which you can list with the -h flag:

HW-ThreadsIntro$ ./x86.py -h
Usage: x86.py [options]

Options:
  -h, --help            show this help message and exit
  -s SEED, --seed=SEED  the random seed
  -t NUMTHREADS, --threads=NUMTHREADS
                        number of threads
  -p PROGFILE, --program=PROGFILE
                        source program (in .s)
  -i INTFREQ, --interrupt=INTFREQ
                        interrupt frequency
  -r, --randints        if interrupts are random
  -a ARGV, --argv=ARGV  comma-separated per-thread args (e.g., ax=1,ax=2 sets
                        thread 0 ax reg to 1 and thread 1 ax reg to 2);
                        specify multiple regs per thread via colon-separated
                        list (e.g., ax=1:bx=2,cx=3 sets thread 0 ax and bx and
                        just cx for thread 1)
  -L LOADADDR, --loadaddr=LOADADDR
                        address where to load code
  -m MEMSIZE, --memsize=MEMSIZE
                        size of address space (KB)
  -M MEMTRACE, --memtrace=MEMTRACE
                        comma-separated list of addrs to trace (e.g.,
                        20000,20001)
  -R REGTRACE, --regtrace=REGTRACE
                        comma-separated list of regs to trace (e.g.,
                        ax,bx,cx,dx)
  -C, --cctrace         should we trace condition codes
  -S, --printstats      print some extra stats
  -v, --verbose         print some extra info
  -c, --compute         compute answers for me

Most are self-explanatory. Using -r turns on a random interrupter (with intervals from 1 to the intfreq given by -i), which can make the homework problems more interesting.

-L specifies the location where the code is loaded in the address space.

-m specifies the size of the address space in KB.

-S prints some extra statistics.

-c is not really used (unlike in most simulators in the book); use the trace or condition-code flags instead.
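
The -a format in the help text (commas between threads, colons between registers of one thread) is easy to misread, so here is a short sketch that splits such a string the way the help text describes. parse_argv is a name made up for this example and is not taken from x86.py.

```python
def parse_argv(argv):
    """Split an -a string into one {register: value} dict per thread."""
    per_thread = []
    for thread_spec in argv.split(","):          # commas separate threads
        regs = {}
        for assignment in thread_spec.split(":"):  # colons separate registers
            name, value = assignment.split("=")
            regs[name] = int(value)
        per_thread.append(regs)
    return per_thread

print(parse_argv("ax=1,ax=2"))       # [{'ax': 1}, {'ax': 2}]
print(parse_argv("ax=1:bx=2,cx=3"))  # [{'ax': 1, 'bx': 2}, {'cx': 3}]
```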

Topics: Python Operating System Concurrent Programming