Von Neumann Computer CPU Simulator (Dual-Core Version) -- BUPT 19 / 20 / 21 cohorts, Introduction to Computing course project

Posted by Unknown User on Thu, 03 Feb 2022 00:28:10 +0100

1, Course design requirements

This project builds on the previously designed single-core version by adding a second core, that is, by using two threads to carry out the specified ticket-grabbing operation. The ticket-grabbing logic itself is already implemented by the instructions in the given files; you only need to extend the CPU with a second core and add multithreading support.

Each core has its own registers, while the 32 KB memory is shared between the cores.

* The instructions are read from the files "dict1.dic" and "dict2.dic", not entered manually. Keyboard input is read only when an input instruction from the instruction set is executed.
* For input and output examples, see "Von Neumann structural operation" (extraction code: BUPT), or visit my GitHub.

2, Instruction set

Category	Instruction	Description
Halt instruction	00000000 00000000 0000000000000000	Stop program execution.
Data transfer instructions	00000001 00010000 0000000000000000	Transfer an immediate to register 1.
	00000001 00010101 0000000000000000	Transfer the contents of the memory unit (2 bytes) pointed to by the address in register 5 to register 1.
	00000001 01010001 0000000000000000	Transfer the contents of register 1 to the memory unit (2 bytes) pointed to by the address in register 5.
Arithmetic operation instructions	00000010 00010000 0000000000000000	Add the number in register 1 to an immediate number and save the result to register 1.
	00000010 00010101 0000000000000000	Add the number in register 1 to the number stored in the memory unit (2 bytes) pointed to by the address in register 5, and save the result to register 1.
	00000011 00010000 0000000000000000	Subtract an immediate number from the number in register 1 and save the result to register 1.
	00000011 00010101 0000000000000000	Subtract the number stored in the memory unit (2 bytes) pointed to by the address in register 5 from the number in register 1, and save the result to register 1.
	00000100 00010000 0000000000000000	Multiply the number in register 1 by an immediate number and save the result to register 1.
	00000100 00010101 0000000000000000	Multiply the number in register 1 by the number stored in the memory unit (2 bytes) pointed to by the address in register 5, and save the result to register 1.
	00000101 00010000 0000000000000000	Divide the number in register 1 by an immediate number (integer division as in C), and save the result to register 1.
	00000101 00010101 0000000000000000	Divide the number in register 1 by the number stored in the memory unit (2 bytes) pointed to by the address in register 5 (integer division as in C), and save the result to register 1.
Logical operation instructions	00000110 00010000 0000000000000000	Apply logical AND to the number in register 1 and an immediate number, and save the result to register 1. (If the result is true, save 1, otherwise save 0.)
	00000110 00010101 0000000000000000	Apply logical AND to the number in register 1 and the number stored in the memory unit (2 bytes) pointed to by the address in register 5, and save the result to register 1. (If the result is true, save 1, otherwise save 0.)
	00000111 00010000 0000000000000000	Apply logical OR to the number in register 1 and an immediate number, and save the result to register 1. (If the result is true, save 1, otherwise save 0.)
	00000111 00010101 0000000000000000	Apply logical OR to the number in register 1 and the number stored in the memory unit (2 bytes) pointed to by the address in register 5, and save the result to register 1. (If the result is true, save 1, otherwise save 0.)
	00001000 00010000 0000000000000000	Apply logical NOT to the number in register 1 and save the result to register 1. (If the result is true, save 1, otherwise save 0.)
	00001000 00000101 0000000000000000	Apply logical NOT to the number stored in the memory unit (2 bytes) pointed to by the address in register 5, and save the result back to that same memory unit. (If the result is true, save 1, otherwise save 0.)
Compare instructions	00001001 00010000 0000000000000000	Compare the number in register 1 with an immediate number. If the two numbers are equal, the flag register is set to 0. If register 1 is greater, the flag register is set to 1. If register 1 is smaller, the flag register is set to -1.
	00001001 00010101 0000000000000000	Compare the number in register 1 with the number stored in the memory unit (2 bytes) pointed to by the address in register 5. If the two numbers are equal, the flag register is set to 0. If register 1 is greater, the flag register is set to 1. If register 1 is smaller, the flag register is set to -1.
Jump instructions	00001010 00000000 0000000000000000	Unconditional jump: execution continues at the program counter plus an immediate number. That is, the program counter is modified.
	00001010 00000001 0000000000000000	If the value in the flag register is 0, execution continues at the program counter plus an immediate number. That is, the program counter is modified.
	00001010 00000010 0000000000000000	If the value in the flag register is 1, execution continues at the program counter plus an immediate number. That is, the program counter is modified.
	00001010 00000011 0000000000000000	If the value in the flag register is -1, execution continues at the program counter plus an immediate number. That is, the program counter is modified.
Input and output instructions	00001011 00010000 0000000000000000	Read an integer from the input port and save it in register 1. That is, read an integer from the keyboard into register 1.
	00001100 00010000 0000000000000000	Output the number in register 1 to the output port. That is, the number in register 1 is printed to the display as an integer, followed by a newline character.
Multicore instructions	00001101 00000000 0000000000000000	The immediate is a memory address. Request a mutex that locks the memory specified by the immediate.
	00001110 00000000 0000000000000000	The immediate is a memory address. Release the mutex that locks the memory specified by the immediate. Corresponds to the previous instruction.
	00001111 00000000 0000000000000000	Sleep for the number of milliseconds given by the immediate.
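
The layout in the table can be read as four fields: an 8-bit operation code, two 4-bit register numbers, and a 16-bit immediate. The following is a minimal sketch of that decoding, assuming the four bytes are stored most significant byte first and that the immediate is a signed two's-complement value (which the reference code's use of short suggests). The names Instr and decode are hypothetical and are not part of the course material.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of one 32-bit instruction (hypothetical helper type). */
struct Instr {
    uint8_t opcode;  /* first byte: operation code, 0..15               */
    uint8_t dest;    /* high 4 bits of second byte: first register      */
    uint8_t src;     /* low 4 bits of second byte: second register      */
    int16_t imm;     /* last two bytes: immediate, assumed to be signed */
};

/* Decode four consecutive bytes, most significant byte first. */
static struct Instr decode(const uint8_t bytes[4]) {
    struct Instr in;
    assert(bytes != 0);
    in.opcode = bytes[0];
    in.dest   = (uint8_t)(bytes[1] >> 4);
    in.src    = (uint8_t)(bytes[1] & 0x0F);
    in.imm    = (int16_t)(uint16_t)((bytes[2] << 8) | bytes[3]);
    return in;
}
```

For example, the bytes 00000001 00010101 00000000 00000000 decode to operation code 1, first register 1, second register 5 and immediate 0, which matches the "memory pointed to by register 5 to register 1" row of the table.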

3, Thinking and analysis

Main problems

How should the previously designed data structures be modified to support multithreading?
How should multithreading be implemented?

Modifying the data structures

The course now allows global variables, so you can use more of them to avoid passing many parameters around.
To keep the cores from interfering with each other, the registers must be private to each core, while the memory is shared globally.

A simple approach is to duplicate the existing registers: a scalar variable becomes a one-dimensional array, and a one-dimensional array becomes a two-dimensional array. For example, ax[9] becomes ax[2][9], and the core id selects which set of registers to use.
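
As a rough illustration of this banking idea, the sketch below keeps one row of registers per core and selects the row by core id. The accessor names set_reg and get_reg are hypothetical; the reference code simply indexes ax[id][n] directly.

```c
#include <assert.h>

#define CORES 2
#define REGS  9   /* index 0 is left unused so registers are numbered 1..8 */

static short ax[CORES][REGS];   /* general registers, one row per core */

/* Write register `reg` of core `id`; the other core's row is untouched. */
static void set_reg(int id, int reg, short value) {
    assert(id >= 0 && id < CORES && reg >= 1 && reg < REGS);
    ax[id][reg] = value;
}

/* Read register `reg` of core `id`. */
static short get_reg(int id, int reg) {
    assert(id >= 0 && id < CORES && reg >= 1 && reg < REGS);
    return ax[id][reg];
}
```

Writing register 1 of core 0 leaves register 1 of core 1 at its initial value of 0, which is the isolation the text asks for.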

The memory needs no changes. The course specifies where the code segment of each core is located; the two segments must not overlap, otherwise errors will occur.
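
The shared memory layout can be sketched as follows. The constants come from the reference code later in this article (code segments at 0 and 256, data segment at 16384). The sketch assumes that the program counter is kept as an offset inside each core's code segment and that a 2-byte memory unit is stored high byte first; fetch_addr, read16 and write16 are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE   (32 * 1024)
#define DATA_START 16384                      /* start of the data segment    */
static const int CODE_START[2] = { 0, 256 };  /* code segment start per core  */

static uint8_t mem[MEM_SIZE];                 /* one array shared by both cores */

/* Absolute address of the instruction at offset `pc` for core `id`. */
static int fetch_addr(int id, unsigned pc) {
    assert(id == 0 || id == 1);
    return CODE_START[id] + (int)pc;
}

/* Read a 2-byte memory unit, high byte first. */
static int16_t read16(int addr) {
    assert(addr >= 0 && addr + 1 < MEM_SIZE);
    return (int16_t)(uint16_t)((mem[addr] << 8) | mem[addr + 1]);
}

/* Write a 2-byte memory unit, high byte first. */
static void write16(int addr, int16_t value) {
    assert(addr >= 0 && addr + 1 < MEM_SIZE);
    mem[addr]     = (uint8_t)((uint16_t)value >> 8);
    mem[addr + 1] = (uint8_t)((uint16_t)value & 0xFF);
}
```

With this layout, offset 4 means address 4 for core 1 and address 260 for core 2, so the two programs never read each other's instructions while still sharing the data segment.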

Multithreading implementation

For beginners, multithreading is a complex concept, and in large projects thread synchronization takes a great deal of effort.
Here I assume that you have read the PPT in the attachment, or already understand the relevant concepts and have some experience, so I only focus on the important points.

Running multiple threads

Compared with the multithreading facilities of C++, multithreading in C is more cumbersome, but either way you need to provide a thread entry function. In essence, multithreading means running this function in several threads at the same time. Because the entry function takes only one parameter, only the CPU id is passed here. If you want to pass several parameters, you can pack them into a structure.

A thread is generally controlled through a handle, which plays a role similar to a variable name for the thread.

	HANDLE hThread1, hThread2;	//Declare two handles
	//func1 and func2 must have the form: unsigned __stdcall func(void* arg)
	hThread1 = (HANDLE)_beginthreadex(NULL, 0, func1, (void*)(intptr_t)0, 0, NULL);
	//Create thread 1. The function called is func1 and the argument carries id=0 (intptr_t needs <stdint.h>)
	hThread2 = (HANDLE)_beginthreadex(NULL, 0, func2, (void*)(intptr_t)1, 0, NULL);
	//Create thread 2. The function called is func2 and the argument carries id=1
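
Passing several parameters through a structure could look roughly like the sketch below. It only shows the entry function and the parameter bundle; on Windows the function would additionally be declared unsigned __stdcall, and the address of the structure would be passed as the fourth argument of _beginthreadex. The type CoreArgs, its fields and the function core_entry are hypothetical names, and the body is a stand-in for the real simulation loop.

```c
#include <assert.h>

/* Hypothetical parameter bundle; the field names are illustrative only. */
struct CoreArgs {
    int id;         /* which core this thread simulates          */
    int codeStart;  /* where that core's code segment begins     */
    int firstAddr;  /* filled in by the thread before it returns */
};

/* Entry function with the single pointer parameter a thread API expects. */
static unsigned core_entry(void *arg) {
    struct CoreArgs *a = (struct CoreArgs *)arg;
    assert(a != 0);
    /* Stand-in for real work: record the address of the first instruction. */
    a->firstAddr = a->codeStart + 0;
    return 0;
}
```

The structure must stay alive until the thread has finished with it, so it is usually a global or a variable in main() that outlives the wait on the thread handle.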

A multithreaded program usually runs as follows:
Main thread starts -> multiple threads are created -> the threads run in parallel -> one thread ends first -> the other thread ends -> main thread ends
If the main thread continues without waiting for all the threads to finish, the statements in the main function may terminate the program before the other threads have ended. This is not what we want, so the main thread must wait for both threads to end before it continues.

	WaitForSingleObject(hThread1, INFINITE);	//Wait until thread 1 ends
	CloseHandle(hThread1);						//Close thread 1 handle
	WaitForSingleObject(hThread2, INFINITE);	//Wait until thread 2 ends
	CloseHandle(hThread2);						//Close thread 2 handle
	mainFunction();								//Continue with the rest of the main function

Thread synchronization (preventing thread conflicts)

Imagine a piece of data stored in the globally shared memory that several threads can read and write at the same time. If every operation on it is atomic, that is, it cannot be subdivided at the assembly level and completes in a single step, then the operations of the threads happen in a well-defined order and no conflict can occur.

However, if the operation is not atomic, imagine the following scenario:

Thread 1 reads the memory data (100) into a register and decrements it by 1. At this point the updated number (99) has not yet been written back from the register to the original memory location
Thread 2 also reads the data (100), because the value in memory has not changed yet
Thread 1 writes the value (99) in its register back to memory
Thread 2 decrements the value in its register by 1 and writes the updated number (99) back to memory

The final result is that the number has been modified twice by two threads, but its value has decreased by only 1. If this data represents the number of remaining tickets, two people have successfully grabbed the same ticket, which is unacceptable.
Therefore, we must ensure thread synchronization, that is, avoid thread conflicts.
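
The four steps above can be replayed deterministically, without real threads, to see the lost update. This is only a sketch: each "thread" is modelled as a private register plus separate load, decrement and store steps, and all names are hypothetical.

```c
#include <assert.h>

static short tickets;          /* the shared memory cell */

struct Core { short reg; };    /* one private register per thread */

static void load(struct Core *c)        { assert(c != 0); c->reg = tickets; }
static void decrement(struct Core *c)   { c->reg = (short)(c->reg - 1); }
static void store(const struct Core *c) { tickets = c->reg; }

/* Replays the interleaving from the text; returns the final ticket count. */
static short interleaved(void) {
    struct Core t1, t2;
    tickets = 100;
    load(&t1); decrement(&t1);   /* thread 1 holds 99, memory still 100 */
    load(&t2);                   /* thread 2 also reads 100             */
    store(&t1);                  /* thread 1 writes 99                  */
    decrement(&t2); store(&t2);  /* thread 2 writes 99 again            */
    return tickets;
}

/* Same work with each thread's three steps kept together, as a lock would enforce. */
static short serialized(void) {
    struct Core t1, t2;
    tickets = 100;
    load(&t1); decrement(&t1); store(&t1);
    load(&t2); decrement(&t2); store(&t2);
    return tickets;
}
```

The interleaved order ends with 99 tickets although two were sold, while the serialized order ends with 98. A lock exists to force the second ordering.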

From the analysis above, thread conflicts arise because multiple threads read and write the same global memory at the same time and their operations are not atomic: each operation compiles to several instructions, and the threads execute at different speeds.

A thread lock resolves this conflict. A lock can be implemented either by calling a library function directly or by simulating one yourself. Both are discussed below.

Using locks

Method 1: call the library

<windows.h> provides a mutex, which is a kind of lock. It is created and used as follows:

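HANDLE hMutex;	//Declare the handle in the global area in advance
...
void init() {	//Call once from main() before the threads are created
	hMutex = CreateMutex(NULL, FALSE, NULL);	//Create one unnamed mutex, initially unowned
}
void func() {
	//All threads must wait on the same mutex; creating a new one on every call would protect nothing
	WaitForSingleObject(hMutex, INFINITE);		//Wait until the mutex is free, then take ownership
	dosomething();			//Specific operation
	ReleaseMutex(hMutex);	//Release mutex
}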

If the lock is taken inside a function, then when thread 1 calls the function first, it owns the lock, and thread 2, entering later, waits until the lock becomes available.

Method 2: simulation

If a lock is held for a long time, if its ownership may span several functions, or if you simply want to lock a part of memory for an extended period, you can simulate the lock with an array-based lock structure. When a core acquires ownership, it sets the read-write permission of the other core to 0, that is, no access. When it releases ownership, it restores the other core's permission.
A core that wants to read or write a piece of memory must first check whether it has read-write permission. If not, it keeps waiting; if so, it accesses the memory normally.

bool memLock[2][32 * 1024];			//Global variable, simulated memory lock: 1 = core may read/write, 0 = it may not
...
void initLock() {	//Globals start at 0, so grant every permission once before the threads start
	memset(memLock, 1, sizeof(memLock));	//needs <string.h>
}
void lock(int id, int pos) {	//Lock
	while(memLock[id][pos] == 0) Sleep(10);		//No permission: the other core holds the lock, so keep waiting
	memLock[1 - id][pos] = 0;	//Revoke the other core's permission (note: check and update are two separate steps, not atomic)
}
void unlock(int id, int pos) {	//Release lock
	memLock[1 - id][pos] = 1;	//Return the permission directly without waiting
}
void func(int id, int pos) {	//Normal read / write operation on address pos
	while(memLock[id][pos] == 0) Sleep(10);		//No read-write permission, keep waiting
	dosomething();		//Permission obtained, carry out the operation. To give the permission back, call unlock()
}
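
In the simulated lock, checking the permission and changing it are two separate steps, so in principle both cores could pass the check at the same moment. One way to close that gap, shown here only as a sketch and not as the method the course expects, is to record the owner of each address in a C11 atomic variable and to acquire it with a compare-and-swap. The names owner, try_lock, release_lock and may_access are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MEM_SIZE (32 * 1024)

/* 0 = free, otherwise (core id + 1) of the owner. Static storage starts at 0. */
static atomic_int owner[MEM_SIZE];

/* Try to take the lock on `pos` for core `id`. The check and the update
   are one atomic compare-and-swap, so two cores cannot both succeed. */
static bool try_lock(int id, int pos) {
    int expected = 0;
    assert(pos >= 0 && pos < MEM_SIZE);
    return atomic_compare_exchange_strong(&owner[pos], &expected, id + 1);
}

/* Release the lock; succeeds only if core `id` really owns it. */
static bool release_lock(int id, int pos) {
    int expected = id + 1;
    assert(pos >= 0 && pos < MEM_SIZE);
    return atomic_compare_exchange_strong(&owner[pos], &expected, 0);
}

/* A core may access `pos` when the address is free or owned by itself. */
static bool may_access(int id, int pos) {
    int o = atomic_load(&owner[pos]);
    return o == 0 || o == id + 1;
}
```

A core that fails try_lock or may_access would sleep briefly and retry, in the same way the simulated lock above waits with Sleep(10).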

4, Reference code

You are encouraged to think and solve the problems independently. Do not copy the code directly, or your submission will fail the plagiarism check.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include <process.h>
#include <windows.h>

struct Byte {	//stored by bit
	unsigned char bit0 : 1, bit1 : 1, bit2 : 1, bit3 : 1,
		bit4 : 1, bit5 : 1, bit6 : 1, bit7 : 1;
};

union Memory {
	unsigned char data;
	struct Byte b;
};

union Register {
	short data;
	struct {
		struct Byte b0;
		struct Byte b1;
	}b;
};

const int START = 16384;			//dataSegment starts from 16384
const int CODESTART[2] = { 0,256 };	//beginning of codeSegment of core 1 and core 2 

union Memory memory[32 * 1024];		//32 KB memory, shared by both cores
bool memLock[2][32 * 1024];			//Memory lock: memLock[id][pos] == 1 means core id must wait before touching pos
unsigned int PC[2];					//ProgramCounter, one per core
union Register IR[2];				//InstructionRegister, stores only 2 bytes
union Register ax[2][9];			//GeneralRegisters, store only 2 bytes. (don't use ax[][0]).
int FR[2] = { 0 };					//FlagsRegister

HANDLE hThread1, hThread2, IOMutex;

void ReadToMemory(FILE* fp, int begin);
unsigned __stdcall process(int id);
void move1(int dest, int src, int id);
void move2(int dest, int src, int id);
void cal(int dest, int src, int mode, int id);
void AND(int dest, int src, int id);
void OR(int dest, int src, int id);
void NOT(int dest, int mode, int id);
void cmp(int dest, int src, int id);
void show(int id);
void checkLock(int pos, int id);

int main() {
	
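	FILE* fp1 = fopen("dict1.dic", "r"), * fp2 = fopen("dict2.dic", "r");
	if (fp1 == NULL || fp2 == NULL) {	//stop early if an instruction file cannot be opened
		printf("cannot open dict1.dic or dict2.dic\n");
		return 1;
	}
	ReadToMemory(fp1, CODESTART[0]);
	ReadToMemory(fp2, CODESTART[1]);
	fclose(fp1);
	fclose(fp2);
	memory[16385].data = 100;	//low byte of the 2-byte unit at START, so that unit now holds 100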
	
	IOMutex = CreateMutex(NULL, FALSE, NULL);
	hThread1 = (HANDLE)_beginthreadex(NULL, 0, process, 0, 0, NULL);
	hThread2 = (HANDLE)_beginthreadex(NULL, 0, process, 1, 0, NULL);
	WaitForSingleObject(hThread1, INFINITE);
	CloseHandle(hThread1);
	WaitForSingleObject(hThread2, INFINITE);
	CloseHandle(hThread2);
	
	printf("\ncodeSegment :\n");
	for (int i = 0; i < 16; i++) {
		for (int j = 0; j < 8; j++) {
			if (j) putchar(' ');
			printf("%d", (memory[i * 32 + j * 4].data << 24) | (memory[i * 32 + j * 4 + 1].data << 16) |
				(memory[i * 32 + j * 4 + 2].data << 8) | memory[i * 32 + j * 4 + 3].data);
		}
		putchar('\n');
	}
	printf("\ndataSegment :\n");
	for (int i = 0; i < 16; i++) {
		for (int j = 0; j < 16; j++) {
			if (j) putchar(' ');
			printf("%hd", (memory[START + i * 32 + j * 2].data << 8) | memory[START + i * 32 + j * 2 + 1].data);
		}
		putchar('\n');
	}

	return 0;
}


void move1(int dest, int src, int id) {		//immed/memory->ax
	checkLock(src, id);
	short data = (memory[src].data << 8) | memory[src + 1].data;
	ax[id][dest].data = data;
}
void move2(int dest, int src, int id) {		//ax->memory
	checkLock(dest, id);
	memory[dest].data = ax[id][src].data >> 8;
	memory[dest + 1].data = ax[id][src].data & 255;
}
void cal(int dest, int src, int mode, int id) {		//calculate, mode: 0->add, 1->sub, 2->mul, 3->div
	checkLock(src, id);
	short data = (memory[src].data << 8) | memory[src + 1].data;
	if (mode == 0) ax[id][dest].data += data;
	else if (mode == 1) ax[id][dest].data -= data;
	else if (mode == 2) ax[id][dest].data *= data;
	else if (mode == 3) ax[id][dest].data /= data;
}
void AND(int dest, int src, int id) {	//logic and
	checkLock(src, id);
	short data = (memory[src].data << 8) | memory[src + 1].data;
	ax[id][dest].data = (ax[id][dest].data && data);	//logical AND (&&), not bitwise &: e.g. 1 AND 2 must give 1
}
void OR(int dest, int src, int id) {	//logic or
	checkLock(src, id);
	short data = (memory[src].data << 8) | memory[src + 1].data;
	(ax[id][dest].data | data) ? (ax[id][dest].data = 1) : (ax[id][dest].data = 0);
}
void NOT(int dest, int mode, int id) {		//logic not
	checkLock(dest, id);
	if (mode == 0) ax[id][dest].data = !ax[id][dest].data;
	else {
		if (memory[dest].data || memory[dest + 1].data) memory[dest].data = memory[dest + 1].data = 0;
		else memory[dest + 1].data = 1;
	}
}
void cmp(int dest, int src, int id) {		//compare
	checkLock(src, id);
	short data = (memory[src].data << 8) | memory[src + 1].data;
	if (ax[id][dest].data == data) FR[id] = 0;
	else if (ax[id][dest].data > data) FR[id] = 1;
	else FR[id] = -1;
}

void show(int id) {		//show you all the information
	WaitForSingleObject(IOMutex, INFINITE);
	printf("id = %d\nip = %hd\nflag = %d\nir = %hd\n", id + 1, PC[id], FR[id], IR[id].data);
	printf("ax1 = %hd ax2 = %hd ax3 = %hd ax4 = %hd\n", ax[id][1].data, ax[id][2].data, ax[id][3].data, ax[id][4].data);
	printf("ax5 = %hd ax6 = %hd ax7 = %hd ax8 = %hd\n", ax[id][5].data, ax[id][6].data, ax[id][7].data, ax[id][8].data);
	ReleaseMutex(IOMutex);
}

void ReadToMemory(FILE* fp, int begin) {	//read data and store in memory
	unsigned int cnt = begin;
	while (1) {
		char line[33] = { 0 };
		if (fscanf(fp, "%32s", line) != EOF) {	//read at most 32 characters so line[33] cannot overflow
			if (line[0] < '0' || line[0] > '1') break;	//unavailable data
			for (int i = 0; i < 4; i++) {
				unsigned char _data = 0;
				for (int j = 0; j < 8; j++) _data = _data * 2 + line[i * 8 + j] - '0';
				memory[cnt++].data = _data;
			}
		}
		else break;
	}
}

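unsigned __stdcall process(int id) {	//core loop: fetch, decode and execute instructions for core id
	while (1) {
		//PC[id] is an offset inside this core's code segment, so add CODESTART[id] when fetching
		IR[id].data = (memory[PC[id] + CODESTART[id]].data << 8) | memory[PC[id] + CODESTART[id] + 1].data;
		int cmd = IR[id].data >> 8;		//high byte: operation code
		int from = IR[id].data & 15, to = (IR[id].data & 240) >> 4;	//low / high 4 bits: second / first register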
		bool noJump = 1;
		if (cmd == 0) {		//shut down
			PC[id] += 4;
			show(id);
			break;
		}
		if (cmd == 1) {		//move
			if (from == 0) move1(to, PC[id] + 2 + CODESTART[id], id);	//immed->ax[id]
			else if (from >= 5) move1(to, ax[id][from].data, id);	//memory->ax[id]
			else move2(ax[id][to].data, from, id);	//ax[id]->memory
		}
		else if (cmd == 2) {	//add
			from == 0 ? cal(to, PC[id] + 2 + CODESTART[id], 0, id) : cal(to, ax[id][from].data, 0, id);
		}
		else if (cmd == 3) {	//subtract
			from == 0 ? cal(to, PC[id] + 2 + CODESTART[id], 1, id) : cal(to, ax[id][from].data, 1, id);
		}
		else if (cmd == 4) {	//multiply
			from == 0 ? cal(to, PC[id] + 2 + CODESTART[id], 2, id) : cal(to, ax[id][from].data, 2, id);
		}
		else if (cmd == 5) {	//divide
			from == 0 ? cal(to, PC[id] + 2 + CODESTART[id], 3, id) : cal(to, ax[id][from].data, 3, id);
		}
		else if (cmd == 6) {	//logic and
			from == 0 ? AND(to, PC[id] + 2 + CODESTART[id], id) : AND(to, ax[id][from].data, id);
		}
		else if (cmd == 7) {	//logic or
			from == 0 ? OR(to, PC[id] + 2 + CODESTART[id], id) : OR(to, ax[id][from].data, id);
		}
		else if (cmd == 8) {	//logic not
			from == 0 ? NOT(to, 0, id) : NOT(ax[id][from].data, 1, id);
		}
		else if (cmd == 9) {	//compare
			from == 0 ? cmp(to, PC[id] + 2 + CODESTART[id], id) : cmp(to, ax[id][from].data, id);
		}
		else if (cmd == 10) {	//jump
			short data = (memory[PC[id] + 2 + CODESTART[id]].data << 8) | memory[PC[id] + 3 + CODESTART[id]].data;
			if (from == 0 || from == 1 && FR[id] == 0 || from == 2 && FR[id] == 1 || from == 3 && FR[id] == -1) {
				PC[id] += data;
				noJump = 0;
				show(id);
			}
		}
		else if (cmd == 11) {	//input
			WaitForSingleObject(IOMutex, INFINITE);
			printf("in:\n");
			ReleaseMutex(IOMutex);
			scanf("%hd", &ax[id][to].data);
		}
		else if (cmd == 12) {	//output
			WaitForSingleObject(IOMutex, INFINITE);
			printf("id = %d    out: %hd\n", id + 1, ax[id][to].data);
			ReleaseMutex(IOMutex);
		}
		else if (cmd == 13) {	//lock
			short pos = (memory[PC[id] + 2 + CODESTART[id]].data << 8) + memory[PC[id] + 3 + CODESTART[id]].data;
			checkLock(pos, id);
			id == 0 ? (memLock[1][pos] = 1) : (memLock[0][pos] = 1);
		}
		else if (cmd == 14) {	//release
			short pos = (memory[PC[id] + 2 + CODESTART[id]].data << 8) + memory[PC[id] + 3 + CODESTART[id]].data;
			id == 0 ? (memLock[1][pos] = 0) : (memLock[0][pos] = 0);
		}
		else if (cmd == 15) {	//sleep
			short sleepTime = (memory[PC[id] + 2 + CODESTART[id]].data << 8) + memory[PC[id] + 3 + CODESTART[id]].data;
			Sleep(sleepTime);
		}

		if (noJump) {
			PC[id] += 4;
			show(id);
		}
	}
	_endthreadex(0);
	return 0;
}

void checkLock(int pos, int id) {
	while (memLock[id][pos]) Sleep(10);
}

5, Tools

Instruction generation and execution tool for the BUPT 2019 cohort CPU simulator course project

Note 1: thanks to the senior student from the 2019 cohort who made this tool; it is a great help for debugging. When generating code, make sure it follows the rules of the instruction set; for example, whether an operand goes in the high or the low bits usually matters. Also, the test samples that come with the tool include some situations that the instruction set does not describe. The code I wrote follows only the requirements of the instruction set, so although it passes the OJ, it does not handle those situations.

Note 2: the execute function in the tool no longer works, because the link appears to be dead.

6, Extended thinking

About sleep in multithreading

Thread sleep is of great significance in multithreaded programming. Because it touches on operating system principles, I will only make a few points here.

The instruction set specifies a sleep instruction whose immediate is a duration in milliseconds; in the sample it is 2 ms. The implementation is very simple: just call the Sleep() function, whose unit is milliseconds.

But why sleep at all? Testing shows that if the sleep instruction is removed from the sample, thread 1 takes all the tickets and thread 2 cannot grab a single one, because the instructions execute too quickly. However, if you let a thread sleep for 2 ms, does it really sleep for exactly 2 ms? What is the control mechanism behind it?

Further tests show that when thread 1 sells 50 tickets, the total sleep time should in theory be 100 ms, but in practice it is several times longer. Changing the sleep time to any constant between 2 and 20 gives almost the same actual sleep time. This is probably related to how the operating system schedules threads: it may not honor very short sleeps precisely (on Windows, Sleep() is limited by the system timer resolution, which is typically around 15.6 ms by default). In other words, even after sleeping for only 2 ms, the thread still has to compete with the other threads on the computer for the CPU when it wakes up, and may not be scheduled immediately.

So this 2 ms sleep is, to a large extent, not a sleep of exactly 2 ms. Rather, it is a way of voluntarily giving up the CPU to other threads, so that the other thread in this program also gets a chance to grab tickets.
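
The gap between the requested and the actual sleep time can be measured directly. The sketch below is a POSIX version using nanosleep() and clock_gettime(), not the Windows Sleep() call used in this article, so the numbers it produces will differ from those on Windows; it only illustrates the measurement. The function name slept_us is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Sleep for `ms` milliseconds and return how long the call actually
   took, in microseconds, measured with a monotonic clock. */
static long long slept_us(long ms) {
    struct timespec req = { .tv_sec = 0, .tv_nsec = ms * 1000000L };
    struct timespec t0, t1;
    long long ns;
    assert(ms >= 0 && ms < 1000);   /* keeps tv_nsec below one second */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    nanosleep(&req, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
       + (t1.tv_nsec - t0.tv_nsec);
    return ns / 1000;
}
```

Calling slept_us(2) repeatedly and printing the results shows how far the real delay drifts above the requested 2000 microseconds; the request is a lower bound, not an exact duration.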

Thinking further: during ticket grabbing, thread 1 and thread 2 start one after the other and execute the same instructions. In theory they should take turns to acquire the memory lock, grab a ticket, release the lock and sleep, so the results should alternate: after thread 1 grabs a ticket, thread 2 grabs the next. But is that really the case? In practice, the same thread sometimes grabs several tickets in succession.

This is related not only to the fact that the ticket-grabbing operation is not truly atomic, but also to sleep. While one thread sleeps, the other may complete several ticket-grabbing operations. Why? Because the sleep time is not exactly 2 ms! Suppose that after thread 1 grabs a ticket it takes 18 ms from going to sleep until it runs again, while thread 2 sleeps twice in a total of only 16 ms. Thread 2 can then grab two tickets in a row.

Implementation in C++

As mentioned in the single-core version, the object-oriented programming that C++ provides would let us build two CPU objects, each with its own data members, that do not interfere with each other and share the global memory. In addition, <thread> from the C++11 standard library makes multithreading easy, and <mutex> offers a convenient mutex. These are excellent tools, and it is a pity that the course restrictions do not allow them.
