[In-Depth Understanding of Computer Systems] Chapter 2

Posted by Kev0121 on Sun, 23 Jan 2022 18:53:32 +0100

This article was first published on CSDN and synced to cnblogs (Blog Park).


2.1 Information storage

To convert hexadecimal to binary, expand each hexadecimal digit into 4 binary digits.
That is, each hex digit in \([0\text{-}9a\text{-}f]_{16}\) corresponds to one 4-bit pattern in \([0000, 1111]_2\); for example, \(a_{16} = 1010_2\).
Every computer has a word size \(w\). On a machine with a \(w\)-bit word, the virtual address range is \([0, 2^w-1]\),
so a program can access at most \(2^w\) bytes.
In the ongoing large-scale migration from 32-bit to 64-bit machines, most 64-bit machines remain backward compatible with 32-bit programs. A program prog.c compiled with linux> gcc -m32 prog.c runs correctly on both 32-bit and 64-bit machines, while one compiled with linux> gcc -m64 prog.c runs only on 64-bit machines. So whether a program is 32-bit or 64-bit depends on how it was compiled, not on the machine it runs on.
How information is stored: take int a = 4666; as an example. a occupies 4 bytes; 4666 is 0001 0010 0011 1010 in binary (plus 16 leading zero bits in the full 32-bit int), i.e. 0x0000123a in hexadecimal.
From least to most significant, its bytes are 0x3a, 0x12, 0x00 and 0x00, and they are stored at consecutive memory addresses. If the least significant byte (0x3a) is stored at the highest address, the machine is big-endian. Conversely, if the least significant byte is stored at the lowest address, the machine is little-endian.
As for which machines (e.g. 32-bit Linux, Windows) are big-endian or little-endian,
see the reference blog.
How to print the actual byte-level storage of a number:

```
/*
author: solego
*/
#include <cstdio>

typedef unsigned char* byte_pointer;

// Print the bytes of an object one per line, from the lowest address to the highest.
void show_bytes(byte_pointer start, int len) {
	for (int i = 0; i < len; ++i) {
		printf("%.2x\n", start[i]);
	}
	printf("\n");
}

int main()
{
	int x = -10;
	/*
		expect: 0xfffffff6
			x(sign magnitude)   = 10000000 00000000 00000000 00001010
			x(one's complement) = 11111111 11111111 11111111 11110101
			x(two's complement) = 11111111 11111111 11111111 11110110
		   10(two's complement) = 00000000 00000000 00000000 00001010
	*/
	show_bytes((byte_pointer) &x, sizeof(x));
	return 0;
}

result:
f6
ff
ff
ff
```

The output shows that the two's-complement representation is what is stored. The loop walks from the lowest address to the highest, and the least significant byte (f6) is printed first, so the Windows machine this was run on is little-endian. Here an unsigned char is \(1\) byte, which covers exactly the range \([0, 255]\).

Among the bit operations, logical and arithmetic shifts deserve a closer look.

| Data type | Shift direction | Shift mode |
| --- | --- | --- |
| Unsigned | Left | Logical shift, fill with 0 |
| Signed | Left | Logical shift, fill with 0 (for left shifts, logical and arithmetic shifts are identical) |
| Unsigned | Right | Logical shift, fill with 0 |
| Signed | Right | Arithmetic shift: fill with copies of the sign bit (1 for negative numbers, 0 otherwise) |

Left shifts are always logical, and right shifts of unsigned numbers are logical as well.
Right shifts of signed numbers are arithmetic: a negative number is filled with 1s from the left and a non-negative number with 0s, so the result is still the correct two's-complement value (\(x \gg k = \lfloor x / 2^k \rfloor\)). Strictly speaking, the C standard leaves right shifts of negative signed values implementation-defined, but virtually all compiler/machine combinations use arithmetic shift.

2.2 Integer representation

Signed integers in a computer are stored in two's-complement form (unsigned integers are plain binary).
For sign-magnitude, ones' complement and two's complement, see the recommended reference.

Treat the bits of a number as a vector \(\vec{x} = [x_{w-1}, \dots, x_1, x_0]\), each component being one binary digit. For a vector of \(w\) bits:

For unsigned numbers (and non-negative signed numbers): \(\sum_{i=0}^{w-1} x_i 2^i\)
For signed (two's-complement) numbers, the highest bit has weight \(-2^{w-1}\), that is \(-x_{w-1}2^{w-1}+\sum_{i=0}^{w-2}x_i 2^i\). For a non-negative number \(x_{w-1}=0\), so this agrees with the unsigned formula.

This explains why the 8-bit pattern \(1000\ 0000\) means \(-2^7\) rather than \(-0\).

Take \(8\) bits as an example.
For signed numbers:
- Non-negative numbers: the sign bit is \(0\), patterns \([0000\ 0000, 0111\ 1111]\), i.e. \([0, 2^7-1]\)
- Negative numbers: the sign bit is \(1\), patterns \([1000\ 0000, 1111\ 1111]\), i.e. \([-2^7, -1]\)
Conversion between signed and unsigned numbers of the same width:
```
short int a = -12345;
unsigned short b = (unsigned short) a;
printf("a = %d, b = %u\n", a, b);

result:
a = -12345, b = 53191
```
In the above example, both short and unsigned short are 2 bytes, i.e. 16 bits
```
-12345: 1100 1111 1100 0111
 53191: 1100 1111 1100 0111
```
The two bit patterns are identical; the conversion only changes how those bits are interpreted.
For a signed number the highest bit is the sign bit, with weight \(-2^{15}\); for an unsigned number it is an ordinary bit, with weight \(2^{15}\).

Mapping a \(w\)-bit signed number to an unsigned number:
If the sign bit of the signed number is \(0\), the value is unchanged.
If the sign bit is \(1\), it becomes an ordinary bit when the number is interpreted as unsigned.
In decimal terms:

\[ SToU(x)=\left\{ \begin{aligned} x,x\geq 0\\ x+2^w,x<0\\ \end{aligned} \right. \]

This is because the sign bit of a negative number weighs \(-2^{w-1}\), while read as unsigned it weighs \(2^{w-1}\), a difference of \(2^w\). When a non-negative number is converted, its sign bit is \(0\), so the conversion has no effect.

Convert from unsigned to signed:

\[ UToS(x)=\left\{ \begin{aligned} x,x\leq 2^{w-1}-1\\ x-2^w,x\geq 2^{w-1}\\ \end{aligned} \right. \]

When a signed number and an unsigned number of the same width appear in one operation (in C, e.g. an int and an unsigned int), the signed number is implicitly converted to unsigned before the operation is performed.

```
int a = -1;
unsigned int b = 0;
if (a < b) printf("a < b\n");
else printf("a > b\n");

result:
a > b
```
Here the signed int a is converted to unsigned int, so it becomes \(2^{32}-1\) (with a 32-bit int), which is greater than b.

Converting between integer types of different widths
- From fewer bytes to more bytes: the new bits become the high-order bits, and the original bits stay as the low-order bits.
  Unsigned numbers use zero extension: the new bits are filled with 0.
  Signed numbers use sign extension: the new bits are filled with copies of the sign bit.
  
  Either way, the conversion must not change the value represented.
```
int a = -1;
long long b = a;
a: 1111 1111 1111 1111 1111 1111 1111 1111
b: 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111
```
  We need to show that these two bit patterns represent the same value.
  Consider a simple case first:
```
a: -1 in 4 bits, b: -1 in 5 bits, c: -1 in 6 bits
a: 1111 = -2^3 + 2^2 + 2^1 + 2^0
b: 1 1111 = (-2^4 + 2^3) + 2^2 + 2^1 + 2^0 = -2^3 + 2^2 + 2^1 + 2^0
c: 11 1111 = (-2^5 + 2^4 + 2^3) + 2^2 + 2^1 + 2^0 = -2^3 + 2^2 + 2^1 + 2^0

Each added sign bit contributes -2^(k+1) + 2^k = -2^k, which restores the old sign weight,
therefore c = b = a, and the same argument holds when extending to any wider size.
For non-negative numbers the added bits are 0, which obviously does not change the value.
```
- From more bytes to fewer bytes: keep the low-order bits and simply drop the high-order bits, which may change the value.
  For unsigned numbers, dropping the high bits is the same as taking the value modulo \(2^k\), where \(k\) is the new width.
  For signed numbers, first read the bits as unsigned, drop the high bits, then read the remaining bits as signed.
```
  long long b = -(1LL << 32);   // -2^32; left-shifting a negative value (-1ll << 32) is undefined before C++20
  cout << b << "\n";
  
  int a = (int)b;
  cout << a << "\n";
  
  result:
  -4294967296
  0
```

2.3 Integer arithmetic: addition (and subtraction)

Unsigned addition, with \(0 \leq x < 2^w\) and \(0 \leq y < 2^w\):

\[ x+y=\left\{ \begin{aligned} x+y,x+y< 2^{w}\\ x+y-2^w,x+y\geq 2^w\\ \end{aligned} \right. \]

A sum greater than or equal to \(2^w\) means overflow. The overflowing part is discarded and only bits \([0, w)\) are kept, which is equivalent to taking the sum modulo \(2^w\). Since \(x+y\) lies in \([0, 2^{w+1}-2]\), it can have at most one extra bit, at position \(w\), so subtracting \(2^w\) once is enough.

Detecting overflow

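	bool add_ok(unsigned x, unsigned y) {
		unsigned sum = x + y;   // wraps modulo 2^w on overflow
		return sum >= x;        // no overflow iff the sum did not wrap below x
	}
	
	Proof:
	Without overflow, mathematically:
	x + y >= x and x + y >= y, so sum >= x.
	When overflow occurs:
	sum = x + y - 2^w
	because y < 2^w,
	y - 2^w < 0,
	therefore sum = x + y - 2^w < x.
	
	The same argument applies with y in place of x.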

Signed addition can overflow in two ways: positive overflow, when the sum of two positive numbers is too large to represent, and negative overflow, when the sum of two negative numbers is too small to represent. With
\(-2^{w-1}\leq x\leq 2^{w-1}-1, -2^{w-1}\leq y\leq 2^{w-1}-1\):

\[ x+y=\left\{ \begin{aligned} x+y-2^w,x+y\geq 2^{w-1}\\ x+y,-2^{w-1}\leq x+y< 2^{w-1}\\ x+y+2^w,x+y< -2^{w-1}\\ \end{aligned} \right. \]

	Positive overflow:
	127 + 1 in 8 bits:
	0111 1111
	0000 0001
	----------
	1000 0000    read as signed: -128 = 128 - 2^8
	
	Negative overflow:
	-128 + (-1) in 8 bits:
	   1000 0000
	   1111 1111
	----------
	(1)0111 1111    the carry (1) is discarded: 127 = -129 + 2^8
	
	In either case:
	we can read both operands as unsigned numbers and add them as unsigned,
	then convert the truncated result back to a signed number.
	
	The one subtlety is negative overflow: the raw sum has (w+1) bits. Read as a
	(w+1)-bit signed number, the carry bit has weight -2^w. Discarding it
	therefore adds 2^w, which is the x + y + 2^w case of the formula.

Additive inverse
For an integer \(x\), the value \(x'\) with \(x+x'=0\) (in \(w\)-bit arithmetic) is called the additive inverse of \(x\), or its negation.
- Additive inverse of unsigned numbers
  \(0\leq x< 2^w,0\leq x'<2^w\)
  Here the inverse relies on overflow: for \(x > 0\), \(x + x'\) must equal \(2^w\), i.e. bit \(w\) is \(1\) and all lower bits are \(0\), which truncates to \(0\). The special case is \(x=0\), where \(x'=0\).
  Otherwise \(x'=2^w-x\).

\[ x'=\left\{ \begin{aligned} x,x=0\\ 2^w-x,x>0\\ \end{aligned} \right. \]

Additive inverse of signed numbers
When \(x > -2^{w-1}\), the inverse is \(x'=-x\).
When \(x=-2^{w-1}\), the inverse is \(x'=x\): \(-x = 2^{w-1}\) is not representable, and \(x + x = -2^w\) overflows to \(0\).

\[ x'=\left\{ \begin{aligned} x,x=-2^{w-1}\\ -x,x>-2^{w-1}\\ \end{aligned} \right. \]

2.3 Multiplication and division

Multiplication
Multiplying two \(w\)-bit numbers gives a result of up to \(2w\) bits. So first extend both operands to \(2w\) bits (zero extension for unsigned numbers, sign extension for signed numbers), then multiply, and finally keep the low \(w\) bits.

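For example, take 5 [101] and 3 [011] as unsigned 3-bit numbers.
Zero extension gives 5 [000101] and 3 [000011].
Then multiply:
```
         000101
       x 000011
   ------------
         000101
        000101
       000000
      000000
     000000
    000000
   ------------
   00000|001111
```
The low \(2w = 6\) bits (right of the |) are 001111 = 15 = 5 × 3; keeping the low \(w = 3\) bits gives 111 = 7.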

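Another example: -3 [101] and 3 [011] as signed 3-bit numbers.
Sign extension gives -3 [111101] and 3 [000011]:
```
         111101
       x 000011
   ------------
         111101
        111101
       000000
      000000
     000000
    000000
   ------------
   00010|110111
```
The low 6 bits are 110111 = -9 = -3 × 3; keeping the low 3 bits gives 111 = -1. Both examples end in the same 3-bit pattern 111: truncated unsigned and two's-complement multiplication produce identical bits, read as 7 and -1 respectively.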
