[Computer Systems: A Programmer's Perspective] Chapter 2

Posted by Kev0121 on Sun, 23 Jan 2022 18:53:32 +0100

This article was first published on CSDN and synchronized to Cnblogs (Blog Park)


2.1 Information storage

  • Hexadecimal to binary: convert each hexadecimal digit to 4 binary bits.
    That is, each digit in \([0123456789abcdef]_{16}\) corresponds to a value in \([0000, 1111]_2\)
  • Each computer has a word size. For a machine with a word size of \(w\) bits, the virtual address range is \([0, 2^w-1]\)
    A program can access at most \(2^w\) bytes
  • In the current large-scale migration from 32-bit machines to 64-bit machines, most 64-bit machines are backward compatible with 32-bit programs. When the program prog.c is compiled with the command linux> gcc -m32 prog.c, the resulting program runs correctly on both 32-bit and 64-bit machines. When prog.c is compiled with the command linux> gcc -m64 prog.c, the resulting program runs only on 64-bit machines. What makes a program 32-bit or 64-bit is how it is compiled, not the type of machine it runs on.
  • How information is stored: take int a = 4666; as an example. The size of a is 4 bytes, its hexadecimal representation is 0x0000123a, and its low 16 bits in binary are 0001 0010 0011 1010.
    From least significant to most significant, its bytes are 0x3a, 0x12, 0x00 and 0x00. These bytes are stored contiguously in memory. If the least significant byte 0x3a is stored at the highest memory address, the machine is big endian. Conversely, if the least significant byte 0x3a is stored at the lowest memory address, the machine is little endian.
    As for which machines are big endian and which are little endian (Linux 32 and Windows):
    Reference blog
  • How to print the byte-level representation of a number
/*
author: solego
*/
#include <cstdio>

typedef unsigned char* byte_pointer;

// Print len bytes starting at start, one per line, lowest address first.
void show_bytes(byte_pointer start, int len) {
	for (int i = 0; i < len; ++i) {
		printf("%x\n", start[i]);
	}
	printf("\n");
}

int main()
{
	int x = -10;
	/*
		expect: 0x ff ff ff f6
			x(sign magnitude)   = 10000000 00000000 00000000 00001010
			x(ones' complement) = 11111111 11111111 11111111 11110101
			x(two's complement) = 11111111 11111111 11111111 11110110
		   10(two's complement) = 00000000 00000000 00000000 00001010
	*/
	show_bytes((byte_pointer) &x, sizeof(x));
	return 0;
}

result:
f6
ff
ff
ff

The output shows that the value is stored in two's complement form. The loop walks from the low address to the high address and the least significant byte is printed first, so Windows uses little endian. Here unsigned char is \(1\) byte in size, which represents exactly \([0, 255]\).
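The byte-order rule above can also be written down as a small model. This is only a sketch: byte_at and host_is_little_endian are names made up for this note, and the model assumes a 4-byte value such as the a = 4666 (0x0000123a) example.

```cpp
#include <cstdint>
#include <cstring>

// Model of the memory layout: the byte that a 4-byte value holds at address
// offset `offset` (0 = lowest address) under the given byte order.
constexpr std::uint8_t byte_at(std::uint32_t value, int offset, bool little_endian) {
    // Little endian puts the least significant byte at the lowest address,
    // big endian puts it at the highest address.
    return static_cast<std::uint8_t>(
        value >> (8 * (little_endian ? offset : 3 - offset)));
}

// Runtime probe of the host (hypothetical helper): inspect the first byte
// in memory of the value 1.
inline bool host_is_little_endian() {
    const std::uint32_t one = 1;
    unsigned char first = 0;
    std::memcpy(&first, &one, 1);
    return first == 1;
}
```

With this model, 0x0000123a is laid out as 3a 12 00 00 on a little endian machine and as 00 00 12 3a on a big endian one.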

  • On bit operations, logical shifts and arithmetic shifts are worth discussing.
Data type         Shift direction   Shift mode
Unsigned number   Shift left        Logical shift, fill with 0
Signed number     Shift left        Logical shift, fill with 0
Unsigned number   Shift right       Logical shift, fill with 0
Signed number     Shift right       Arithmetic shift, fill with 1 for negative numbers and with 0 otherwise
Left shifts are always logical, and right shifts of unsigned numbers are logical.
Right shifts of signed numbers are arithmetic: a negative number is filled with 1 on the left and any other number is filled with 0 on the left, which is consistent with two's complement arithmetic. (In C this is what almost all compilers and machines do, although older standards leave the right shift of a negative value implementation-defined.)
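To make the difference between the two right shifts concrete, here is a minimal sketch that works on 32-bit patterns only, so it does not depend on how the compiler shifts signed values. The function names are made up for this note, and the shift amount k is assumed to satisfy 0 <= k <= 31.

```cpp
#include <cstdint>

// Logical right shift: the vacated high bits are filled with 0.
constexpr std::uint32_t shift_right_logical(std::uint32_t bits, int k) {
    return bits >> k;
}

// Arithmetic right shift built on top of the logical one: the vacated high
// bits are filled with copies of the original sign bit (bit 31).
constexpr std::uint32_t shift_right_arithmetic(std::uint32_t bits, int k) {
    return (k > 0 && (bits & 0x80000000u))
        ? (bits >> k) | ~(0xFFFFFFFFu >> k)   // negative: fill with 1
        : (bits >> k);                        // nonnegative: fill with 0
}
```

For -10 (0xfffffff6), the arithmetic shift by 1 gives 0xfffffffb, which is -5, while the logical shift gives 0x7ffffffb.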

2.2 Integer representations

  • Signed integers in a computer are stored in two's complement form
    Sign-magnitude, ones' complement and two's complement: Recommended reference

    Treat the two's complement encoding of a number as a vector in which each element is one binary digit. For a vector of \(w\) bits:

    For unsigned numbers and nonnegative signed numbers: \(\sum_{i=0}^{w-1}x_i2^i\),
    For negative numbers: the highest bit has weight \(-2^{w-1}\), that is \(x_{w-1}\times(-2^{w-1})+\sum_{i=0}^{w-2}x_i2^i\) (for a nonnegative signed number \(x_{w-1}=0\), so this formula gives the same value as the first one)

    This explains why \(1000\ 0000\) means \(-2^7\) rather than \(-0\).

    Take \(8\) bits as an example
    For signed numbers:

    • Nonnegative number: the sign bit is \(0\), \([0000\ 0000, 0111\ 1111]\), i.e. \([0, 2^7-1]\)
    • Negative number: the sign bit is \(1\), \([1000\ 0000, 1111\ 1111]\), i.e. \([-2^7, -1]\)
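The two weighted sums can be evaluated directly. The following is a sketch under stated assumptions: the function names are invented for this note, the bit vector is passed as the low w bits of a 32-bit pattern, and 1 <= w <= 32.

```cpp
#include <cstdint>

// Unsigned interpretation of the low w bits: the sum of x_i * 2^i.
constexpr std::int64_t bits_to_unsigned(std::uint32_t bits, int w) {
    std::int64_t value = 0;
    for (int i = 0; i < w; ++i) {
        if ((bits >> i) & 1u) value += std::int64_t{1} << i;
    }
    return value;
}

// Two's complement interpretation: the same sum, except that bit w-1
// carries the weight -2^(w-1).
constexpr std::int64_t bits_to_signed(std::uint32_t bits, int w) {
    std::int64_t value = 0;
    for (int i = 0; i < w - 1; ++i) {
        if ((bits >> i) & 1u) value += std::int64_t{1} << i;
    }
    if ((bits >> (w - 1)) & 1u) value -= std::int64_t{1} << (w - 1);
    return value;
}
```

For the 8-bit pattern 1000 0000, bits_to_signed gives -128 while bits_to_unsigned gives 128, matching the ranges listed above.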
  • Conversion between signed and unsigned numbers of the same width

    short int a = -12345;
    unsigned short b = (unsigned short) a;
    printf("a = %d, b = %u", a, b);
    
    result:
    a = -12345, b = 53191
    

    In the above example, both short and unsigned short are 2 bytes, i.e. 16 bits

    -12345: 1100 1111 1100 0111
     53191: 1100 1111 1100 0111
    

    The bit patterns of the two are identical. Only the way these bits are interpreted has changed
    For signed numbers, the highest bit is interpreted as the sign bit, with weight \(-2^{15}\), while for unsigned numbers the highest bit is interpreted as an ordinary bit, with weight \(2^{15}\).

    On the mapping from a \(w\)-bit signed number to a \(w\)-bit unsigned number
    If the sign bit of the signed number is \(0\), the interpretation remains unchanged
    If the sign bit of the signed number is \(1\), that bit is treated as an ordinary bit when interpreted as an unsigned number.
    In terms of decimal values:

\[ SToU(x)=\begin{cases} x, & x\geq 0\\ x+2^w, & x<0 \end{cases} \]

This is because the sign bit of a negative number has weight \(-2^{w-1}\). Reinterpreted as an ordinary bit it has weight \(2^{w-1}\), a difference of \(2^w\). When a nonnegative number is converted to an unsigned number, the sign bit is 0 and has no effect.

Convert from unsigned to signed:

\[ UToS(x)=\begin{cases} x, & x\leq 2^{w-1}-1\\ x-2^w, & x\geq 2^{w-1} \end{cases} \]
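The two piecewise formulas translate almost literally into code. This is a sketch, not the conversion the compiler performs internally: s_to_u and u_to_s are names chosen to mirror SToU and UToS, values are held in 64-bit integers so nothing overflows, and 1 <= w <= 32 is assumed.

```cpp
#include <cstdint>

// SToU: signed value x in [-2^(w-1), 2^(w-1) - 1] to the unsigned value
// that has the same w-bit pattern.
constexpr std::int64_t s_to_u(std::int64_t x, int w) {
    return x >= 0 ? x : x + (std::int64_t{1} << w);
}

// UToS: unsigned value x in [0, 2^w - 1] to the signed value that has
// the same w-bit pattern.
constexpr std::int64_t u_to_s(std::int64_t x, int w) {
    return x <= (std::int64_t{1} << (w - 1)) - 1 ? x : x - (std::int64_t{1} << w);
}
```

With w = 16, s_to_u(-12345, 16) is 53191 and u_to_s(53191, 16) is -12345, the same pair as in the short example.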

  • When an operation mixes a signed number and an unsigned number, the signed number is implicitly converted to an unsigned number before the operation is performed
    int a = -1;
    unsigned int b = 0;
    if(a < b) printf("a < b\n");
    else printf("a > b\n");
    
    result:
    a > b
    Here the signed int a is converted to unsigned int, i.e. a = 2^32 - 1
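One way to sidestep this pitfall is to compare by mathematical value instead of relying on the implicit conversion. The helper below is a hypothetical sketch written for this note. C++20 also offers std::cmp_less in <utility> for the same purpose.

```cpp
// Compare a signed and an unsigned value by their mathematical values.
constexpr bool less_by_value(int a, unsigned b) {
    // A negative a is smaller than every unsigned value. Otherwise the
    // conversion to unsigned keeps a's value, so a direct compare is safe.
    return a < 0 || static_cast<unsigned>(a) < b;
}
```

With a = -1 and b = 0, less_by_value(a, b) is true, whereas the plain a < b is false because a is first converted to unsigned.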
    
  • Conversion between integer types of different widths
    • From fewer bytes to more bytes: the added bits are the high bits and the original bits are the low bits
      Unsigned numbers use zero extension: the added bits are filled with 0
      Signed numbers use sign extension: the added bits are filled with copies of the sign bit

      Such a conversion must not change the value being represented.

      int a = -1;
      long long b = a;
      a: 1111 1111 1111 1111 1111 1111 1111 1111
      b: 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111
      

      It remains to show that these two bit patterns represent the same value.
      Consider a simple case:

      a is the 4-bit -1, b is the 5-bit -1, c is the 6-bit -1
      a: 1111 = -2^3 + 2^2 + 2^1 + 2^0
      b: 1 1111 = (-2^4 + 2^3) + 2^2 + 2^1 + 2^0 = -2^3 + 2^2 + 2^1 + 2^0
      c: 11 1111 = (-2^5 + 2^4 + 2^3) + 2^2 + 2^1 + 2^0 = -2^3 + 2^2 + 2^1 + 2^0
      
      Therefore c = b = a, and the same argument holds when extending to even more bits
      For nonnegative numbers, the extended sign bit is 0, which clearly does not affect the result.
      
    • From more bytes to fewer bytes: keep the low-order bits and drop the high-order bits, which may change the value
      For unsigned numbers, the high-order bits are simply dropped
      For signed numbers, first interpret the bits as an unsigned number, drop the high-order bits, and then interpret the result as a signed number

      long long b = -(1ll << 32);
      cout << b << "\n";
      
      int a = (int)b;
      cout << a << "\n";
      
      result:
      -4294967296
      0
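Both directions can be sketched at the level of bit patterns. The helpers below are assumptions made for this note: patterns are held in a 64-bit unsigned integer, w is the narrower width, and 1 <= w <= 64.

```cpp
#include <cstdint>

// Mask with the low w bits set.
constexpr std::uint64_t low_mask(int w) {
    return w >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << w) - 1;
}

// Sign extension from w bits: keep the low w bits and fill every higher
// bit with a copy of bit w-1.
constexpr std::uint64_t sign_extend(std::uint64_t bits, int w) {
    return ((bits >> (w - 1)) & 1u)
        ? (bits | ~low_mask(w))    // sign bit 1: fill with 1
        : (bits & low_mask(w));    // sign bit 0: fill with 0
}

// Truncation to w bits: drop everything above bit w-1.
constexpr std::uint64_t truncate_to(std::uint64_t bits, int w) {
    return bits & low_mask(w);
}
```

Sign-extending the 32-bit pattern of -1 gives sixty-four 1 bits, as in the a and b example, and truncating the pattern of -4294967296 to 32 bits gives 0.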
    

2.3 Integer arithmetic: addition (subtraction)

  • Unsigned addition, with \(0\leq x<2^w, 0\leq y<2^w\). Writing \(x+^u_w y\) for the \(w\)-bit result:

\[ x+^u_w y=\begin{cases} x+y, & x+y<2^w\\ x+y-2^w, & x+y\geq 2^w \end{cases} \]

A true sum greater than or equal to \(2^w\) is an overflow. The overflowing bits are discarded and only bits \([0,w)\) are kept, which is equivalent to reducing modulo \(2^w\). Because \(x+y\) lies in \([0,2^{w+1}-2]\), it can set at most one extra bit, bit \(w\) with weight \(2^w\), so discarding that bit subtracts \(2^w\).

Detecting overflow

	// Returns true if x + y does not overflow
	bool add_ok(unsigned x, unsigned y) {
		unsigned sum = x + y;
		return sum >= x;
	}
	
	Proof:
	Without overflow:
	sum = x + y, so sum >= x and sum >= y
	When overflow occurs:
	sum = x + y - 2^w
	because y < 2^w
	therefore y - 2^w < 0
	therefore sum = x + y - 2^w < x
	
	By symmetry, the same holds for y.
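To try the formula and the check with a small word size, here is a sketch for an arbitrary width. The names uadd and uadd_ok are made up for this note, the operands are assumed to lie in [0, 2^w - 1], and 1 <= w <= 32.

```cpp
#include <cstdint>

// w-bit unsigned sum: the true sum reduced modulo 2^w.
constexpr std::uint64_t uadd(std::uint64_t x, std::uint64_t y, int w) {
    const std::uint64_t modulus = std::uint64_t{1} << w;  // 2^w
    const std::uint64_t sum = x + y;                       // true sum
    return sum < modulus ? sum : sum - modulus;
}

// Same test as add_ok: the w-bit result is >= x exactly when no overflow occurred.
constexpr bool uadd_ok(std::uint64_t x, std::uint64_t y, int w) {
    return uadd(x, y, w) >= x;
}
```

With w = 4, 9 + 12 wraps to 5, which is smaller than 9, so the check reports an overflow. 9 + 6 = 15 fits and the check passes.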
  • Signed addition has two kinds of overflow: positive overflow, where the sum of two positive numbers overflows, and negative overflow, where the sum of two negative numbers overflows
    \(-2^{w-1}\leq x\leq 2^{w-1}-1, -2^{w-1}\leq y\leq 2^{w-1}-1\). Writing \(x+^t_w y\) for the \(w\)-bit result:

\[ x+^t_w y=\begin{cases} x+y-2^w, & x+y\geq 2^{w-1}\\ x+y, & -2^{w-1}\leq x+y<2^{w-1}\\ x+y+2^w, & x+y<-2^{w-1} \end{cases} \]

	For positive overflow:
	Two positive numbers, 127 and 1, in binary:
	   0111 1111
	   0000 0001
	------------
	   1000 0000   (= -128 = 128 - 2^8)
	
	For negative overflow:
	Two negative numbers, -128 and -1, in binary:
	   1000 0000
	   1111 1111
	------------
	(1)0111 1111   (= 127 = -129 + 2^8)
	
	Either way:
	We can interpret the operands as unsigned numbers and perform unsigned addition
	Once the result is obtained, the unsigned number is converted back to a signed number
	
	The only subtlety is negative overflow: the true sum needs (w+1) bits, and as a (w+1)-bit signed number its highest bit has weight -2^w.
	Discarding that bit removes -2^w from the value, so the w-bit result equals the true sum plus 2^w
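The three cases of the signed formula can be checked with a short sketch. tadd is a name invented for this note, both operands are assumed to lie in [-2^(w-1), 2^(w-1) - 1], and 1 <= w <= 32 so that a 64-bit integer holds the true sum exactly.

```cpp
#include <cstdint>

// w-bit two's complement sum of x and y.
constexpr std::int64_t tadd(std::int64_t x, std::int64_t y, int w) {
    const std::int64_t half = std::int64_t{1} << (w - 1);  // 2^(w-1)
    const std::int64_t sum = x + y;                         // true sum
    if (sum >= half) return sum - 2 * half;   // positive overflow: subtract 2^w
    if (sum < -half) return sum + 2 * half;   // negative overflow: add 2^w
    return sum;                               // no overflow
}
```

With w = 8, tadd(127, 1, 8) is -128 and tadd(-128, -1, 8) is 127, the same results as the two binary additions above.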
  • Additive inverse
    For an integer \(x\), the \(x'\) that satisfies \(x+x'=0\) is the additive inverse of \(x\), also called its opposite

    • The additive inverse of unsigned numbers
      \(0\leq x<2^w, 0\leq x'<2^w\)
      The inverse relies on overflow: \(x+x'=2^w\), whose bit \(w\) is \(1\) and whose lower bits are all \(0\), so the \(w\)-bit result is \(0\). As a special case, when \(x=0\), \(x'=0\)
      Otherwise \(x'=2^w-x\)

\[ x'=\begin{cases} x, & x=0\\ 2^w-x, & x>0 \end{cases} \]

    • The additive inverse of signed numbers
      When \(x>-2^{w-1}\), the corresponding \(x'=-x\)
      When \(x=-2^{w-1}\), the corresponding \(x'=x\), because \(-2^{w-1}+(-2^{w-1})=-2^w\), which overflows to \(0\)

\[ x'=\begin{cases} x, & x=-2^{w-1}\\ -x, & x>-2^{w-1} \end{cases} \]
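Both inverse rules can be observed with 32-bit unsigned arithmetic, which wraps modulo 2^32. This is only a sketch: unsigned_negate is a name made up for this note, and it assumes std::uint32_t is the same type as unsigned int, which holds on common platforms.

```cpp
#include <cstdint>

// 32-bit unsigned additive inverse: 0 for x = 0, otherwise 2^32 - x.
constexpr std::uint32_t unsigned_negate(std::uint32_t x) {
    return static_cast<std::uint32_t>(0u - x);
}
```

unsigned_negate(1) is 0xffffffff, and adding any x to its inverse wraps to 0. Applied to the bit pattern 0x80000000, which is \(-2^{31}\) when read as a signed number, it returns 0x80000000 again, matching the rule that the smallest signed value is its own inverse.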

2.3 Integer arithmetic: multiplication and division

  • Multiplication
    First extend the operands. The product of two \(w\)-bit numbers can need up to \(2w\) bits, so extend both operands to \(2w\) bits (zero extension for unsigned numbers, sign extension for signed numbers), then multiply, and finally keep only the low \(w\) bits
    For example, unsigned 5[101] and 3[011]
    After zero extension: 5[000101] and 3[000011]
    Then multiply:
              000101
              000011
        ------------
              000101
             000101
            000000
           000000
          000000
         000000
        ------------
        00000|001111
    The low 6 bits 001111 are 15, and the low 3 bits 111 are the final 3-bit result (15 mod 8 = 7)

    Another example: signed -3[101] and 3[011]
    After sign extension: -3[111101] and 3[000011]
              111101
              000011
        ------------
              111101
             111101
            000000
           000000
          000000
         000000
        ------------
        00010|110111
    The bits to the left of | are discarded. The low 6 bits 110111 are -9, and the low 3 bits 111 are the same as in the unsigned example
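The two worked examples can be reproduced by multiplying exactly and then masking. mul_low_bits is a hypothetical helper for this note, with 1 <= w <= 32 and operands small enough that the exact product fits in 64 bits.

```cpp
#include <cstdint>

// Low w bits of the exact product x * y, returned as a bit pattern.
constexpr std::uint32_t mul_low_bits(std::int64_t x, std::int64_t y, int w) {
    return static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(x * y) & ((std::uint64_t{1} << w) - 1));
}
```

mul_low_bits(5, 3, 6) is 001111 and mul_low_bits(-3, 3, 6) is 110111. Truncated to 3 bits, both give 111, which shows that the low w bits do not depend on whether 101 is read as 5 or as -3.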
    

Topics: csapp