tcp stack implementation, tcp timer and sliding window

Posted by LDM2009 on Fri, 11 Feb 2022 12:07:35 +0100

To implement a user-space protocol stack, we must understand TCP: its 11 states, the sliding window, congestion control, timers and so on.

With a user-space protocol stack, the epoll provided by the kernel no longer works, so we need to implement a user-space epoll ourselves. Epoll depends on knowing when to fire its callback. The callback moves a node from the red-black tree to the ready queue; the details are explained in the principle of epoll. Once we understand the 11 states of TCP, we know when to invoke the callback.

TCP state transition diagram

The earlier article [POSIX and network protocol stack]( https://github.com/congchp/Linux-server/blob/main/posix api and network protocol stack md) introduced the TCP state transitions. Read it together with the TCP state transition diagram.

Where is the TCP state saved? It is stored in the TCB (TCP control block, also called the TCP PCB, protocol control block). The TCB contains the socket information, the sendbuffer and the recvbuffer, and it records every state from LISTEN to TIME_WAIT.

Implementation of user mode TCP protocol stack

The UDP protocol stack was implemented previously. The TCP protocol stack is implemented in a similar way, but it is much more complex than UDP.

TCP header definition

What is the initial value of seq num? What happens when it reaches the maximum value (2^32 - 1)? Will it overflow?

The initial value of seq num is random, and it then increases as data is sent. After reaching the maximum value it wraps around and continues from 0, so it never goes out of range.
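The wrap-around described above can be sketched with unsigned 32-bit arithmetic. This is only an illustration: the helper names `seq_add` and `seq_before` are hypothetical, and the signed-difference comparison is a common technique rather than something taken from this article's code.

```c
#include <stdint.h>

/* Advance a sequence number by len bytes.
 * uint32_t arithmetic wraps modulo 2^32, so no overflow check is needed. */
uint32_t seq_add(uint32_t seq, uint32_t len) {
	return seq + len;
}

/* Returns 1 if a comes before b in sequence space, even across the wrap.
 * Assumes the two values are less than 2^31 apart and that converting an
 * out-of-range value to int32_t wraps (true on common platforms). */
int seq_before(uint32_t a, uint32_t b) {
	return (int32_t)(a - b) < 0;
}
```

For example, `seq_add(0xFFFFFFFF, 1)` yields 0, and `seq_before(0xFFFFFF00, 0x100)` is true even though the first value is numerically larger.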

Is seq num counted in packets or in bytes?

It is counted in bytes.

How is the length of a TCP packet determined? Why does the TCP header have no packet length field?

Consecutive TCP packets carry sequence numbers, so the length of a packet can be calculated from them. The length can also be derived from the IP total length minus the IP and TCP header lengths.

ack num = seq num + packet length.

The header length is 4 bits, the maximum value is 15, and the unit is 4 bytes, so the maximum TCP header is 15 * 4 = 60 bytes. If there is no option, the TCP header is 20 bytes, and the header length value is 5.
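As a small sketch of the arithmetic above: the header length occupies the upper 4 bits of the byte that the `tcphdr` struct below calls `hdrlen_resv`. The function names here are hypothetical, and the lengths are assumed to be already converted to host byte order.

```c
#include <stdint.h>

/* Upper 4 bits of hdrlen_resv = header length in 4-byte units. */
unsigned int tcp_header_bytes(uint8_t hdrlen_resv) {
	return (unsigned int)(hdrlen_resv >> 4) * 4;
}

/* Payload length = IP total length - IP header length - TCP header length.
 * Returns -1 if the lengths are inconsistent (malformed packet). */
int tcp_payload_bytes(uint16_t ip_total_len, unsigned int ip_hdr_bytes,
                      uint8_t hdrlen_resv) {
	int len = (int)ip_total_len - (int)ip_hdr_bytes
	        - (int)tcp_header_bytes(hdrlen_resv);
	return len < 0 ? -1 : len;
}
```

A header length value of 5 (`hdrlen_resv == 0x50`) gives 20 bytes, and the maximum value 15 (`0xF0`) gives 60 bytes.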

window size: the maximum amount of data the receiver can currently accept.

urgent pointer: if the URG bit is set to 1, it tells the peer where the urgent data, which should be processed immediately, ends.

struct tcphdr {

	unsigned short sport;
	unsigned short dport;

	unsigned int seqnum;
	unsigned int acknum;

	unsigned char hdrlen_resv;

	unsigned char flag; 

	unsigned short window;

	unsigned short checksum;
	unsigned short urgent_pointer;

	unsigned int options[0];
				  

};

Define TCP flag

#define TCP_CWR_FLAG		0x80
#define TCP_ECE_FLAG		0x40
#define TCP_URG_FLAG		0x20
#define TCP_ACK_FLAG		0x10
#define TCP_PSH_FLAG		0x08
#define TCP_RST_FLAG		0x04
#define TCP_SYN_FLAG		0x02
#define TCP_FIN_FLAG		0x01

The last five flags are the most important.

ACK: used for acknowledgment.

PSH: tells the peer to hand the data to the application promptly. It can be set during data transmission.

RST: if the received ack num, seq num or window size is illegal, or the data is wrong, an RST is returned to the peer. An RST is also sent if the first handshake has been sent three times and the second handshake from the peer still has not arrived after the timeout.

SYN: used only when the connection is being established, to tell the peer the seq num, that is, the sequence number of the first packet sent.

FIN: terminates the connection.
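The flags are tested with bit masks. The sketch below repeats three of the macros defined above so that it compiles on its own; the helper names `is_first_handshake` and `is_second_handshake` are hypothetical.

```c
#define TCP_ACK_FLAG		0x10
#define TCP_SYN_FLAG		0x02
#define TCP_FIN_FLAG		0x01

/* First handshake: SYN set, ACK clear. */
int is_first_handshake(unsigned char flag) {
	return (flag & (TCP_SYN_FLAG | TCP_ACK_FLAG)) == TCP_SYN_FLAG;
}

/* Second handshake: both SYN and ACK set. */
int is_second_handshake(unsigned char flag) {
	return (flag & (TCP_SYN_FLAG | TCP_ACK_FLAG))
	    == (TCP_SYN_FLAG | TCP_ACK_FLAG);
}
```

Comparing against the combined mask, rather than testing SYN alone, keeps a SYN+ACK reply from being mistaken for a new connection request.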

Define TCP packets

struct tcppkt {

	struct ethhdr eh; // 14
	struct iphdr ip;  // 20 
	struct tcphdr tcp; // 20, without options

	unsigned char data[0];

};

Define TCP status

typedef enum _tcp_status {

	TCP_STATUS_CLOSED,
	TCP_STATUS_LISTEN,
	TCP_STATUS_SYN_REVD,
	TCP_STATUS_SYN_SENT,
	TCP_STATUS_ESTABLISHED,
	TCP_STATUS_FIN_WAIT_1,
	TCP_STATUS_FIN_WAIT_2,
	TCP_STATUS_CLOSING,
	TCP_STATUS_TIME_WAIT,

	TCP_STATUS_CLOSE_WAIT,
	TCP_STATUS_LAST_ACK,

} tcp_status;

Define TCB

struct ntcb {

	unsigned int sip;
	unsigned int dip;
	unsigned short sport;
	unsigned short dport;

	unsigned char smac[ETH_ADDR_LENGTH];
	unsigned char dmac[ETH_ADDR_LENGTH];

	unsigned char status;
	
};
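The main loop later looks up and creates TCBs without showing how. One simple possibility, sketched here under assumptions, is a fixed-size table searched by the four-tuple. The names `tcb_entry`, `tcb_lookup`, `tcb_alloc` and `MAX_TCB` are hypothetical, and a real stack would more likely use a hash table or a tree.

```c
#include <stddef.h>

#define MAX_TCB 16   /* hypothetical table size */

struct tcb_entry {
	unsigned int sip, dip;
	unsigned short sport, dport;
	unsigned char status;
	int used;
};

struct tcb_entry tcb_table[MAX_TCB];

/* Find the TCB that matches the four-tuple, or NULL if there is none. */
struct tcb_entry *tcb_lookup(unsigned int sip, unsigned int dip,
                             unsigned short sport, unsigned short dport) {
	for (int i = 0; i < MAX_TCB; i++) {
		struct tcb_entry *t = &tcb_table[i];
		if (t->used && t->sip == sip && t->dip == dip &&
		    t->sport == sport && t->dport == dport)
			return t;
	}
	return NULL;
}

/* Take a free slot for a new connection, or NULL if the table is full. */
struct tcb_entry *tcb_alloc(unsigned int sip, unsigned int dip,
                            unsigned short sport, unsigned short dport) {
	for (int i = 0; i < MAX_TCB; i++) {
		struct tcb_entry *t = &tcb_table[i];
		if (!t->used) {
			t->used = 1;
			t->sip = sip; t->dip = dip;
			t->sport = sport; t->dport = dport;
			t->status = 0;   /* caller sets the real state */
			return t;
		}
	}
	return NULL;
}
```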

Implement the TCP three-way handshake

Once the server handles the state transitions of the three-way handshake, a client can establish a connection with the server.

int main() {

	struct nm_pkthdr h;
	struct nm_desc *nmr = nm_open("netmap:eth0", NULL, 0, NULL);
	if (nmr == NULL) return -1;

	struct pollfd pfd = {0};
	pfd.fd = nmr->fd;
	pfd.events = POLLIN;

	while (1) {

		int ret = poll(&pfd, 1, -1);
		if (ret < 0) continue;

		if (pfd.revents & POLLIN) {

			unsigned char *stream = nm_nextpkt(nmr, &h);
			if (stream == NULL) continue;

			struct ethhdr *eh = (struct ethhdr *)stream;
			if (ntohs(eh->h_proto) == PROTO_IP) {

				// the ip header sits at the same offset in udp and tcp packets
				struct tcppkt *tcp = (struct tcppkt *)stream;

				if (tcp->ip.type == PROTO_TCP) {

					unsigned int sip = tcp->ip.sip;
					unsigned int dip = tcp->ip.dip;

					unsigned short sport = tcp->tcp.sport;
					unsigned short dport = tcp->tcp.dport;

					// search_tcb() and create_tcb() are not shown here. They are
					// assumed to take the four-tuple; search_tcb() returns the
					// connection's tcb, or the listening tcb for a new peer
					struct ntcb *tcb = search_tcb(sip, dip, sport, dport);
					if (tcb == NULL) continue;

					if (tcb->status == TCP_STATUS_LISTEN) {

						if (tcp->tcp.flag & TCP_SYN_FLAG) { // first handshake

							struct ntcb *client_tcb = create_tcb(sip, dip, sport, dport);
							if (client_tcb == NULL) continue;

							client_tcb->status = TCP_STATUS_SYN_REVD;

							// Swap sip, sport and smac with dip, dport and dmac
							// send syn, ack pkt
							// seqnum = own initial seq num, acknum = received seqnum + 1

						}

					} else if (tcb->status == TCP_STATUS_SYN_REVD) {

						if (tcp->tcp.flag & TCP_ACK_FLAG) { // third handshake

							tcb->status = TCP_STATUS_ESTABLISHED;

						}

					}

				}

			}

		}

	}

}
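The "seqnum, ack" step of the second handshake can be sketched as follows. The struct and function names are hypothetical, and all values are assumed to be in host byte order (a real sender converts them with `htonl` before writing them into the header).

```c
#include <stdint.h>

struct synack_fields {
	uint32_t seqnum;
	uint32_t acknum;
	uint8_t  flag;
};

/* Build the numbers for the SYN+ACK reply.
 * client_seq: seq num carried by the client's SYN.
 * server_isn: the server's own (random) initial seq num. */
struct synack_fields build_synack(uint32_t client_seq, uint32_t server_isn) {
	struct synack_fields f;
	f.seqnum = server_isn;
	f.acknum = client_seq + 1;   /* a SYN consumes one sequence number */
	f.flag   = 0x02 | 0x10;      /* SYN | ACK */
	return f;
}

/* The third handshake is valid only if it acknowledges server_isn + 1. */
int third_handshake_ok(uint32_t acknum, uint32_t server_isn) {
	return acknum == server_isn + 1;
}
```

Checking the ack num in the third handshake, rather than only the ACK flag, rejects stray ACK packets that do not belong to this handshake.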

Data transmission process

MSS (Maximum Segment Size) is an option defined by the TCP protocol. It is negotiated when the TCP connection is established and gives the maximum data length that each segment can carry.

MTU (Maximum Transmission Unit) is a limit imposed by the data link layer.

Client to server

  1. Send a 1M file
  2. sendbuffer = 2k
  3. mss = 512
  4. mtu = 1500

while (1) {

    poll(&pfd, 1, -1);            // wait until fd is ready
    send(fd, buffer, 1024, 0);    // hand 1k to the sendbuffer each time

}

Suppose the client's sendbuffer = 2k and mss = 512, so the data is sent in 4 packets.

Can the client send all four packets at once?

Not necessarily. It depends on the receive window size of the server. Suppose the window size is 1024 and the client sends two packets of 512 bytes each. If the server application does not read the data, the window size in the returned ack packet will be 0, and the remaining 1k in the client's sendbuffer cannot be sent. The client has to wait until the server has processed the data before sending more.
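The limit described above can be written as a small calculation. This is a sketch with hypothetical function names; it assumes `mss` is greater than 0 and ignores congestion control.

```c
#include <stdint.h>

/* Bytes that may be sent right now: limited by the unsent data in the
 * sendbuffer and by the room left in the peer's advertised window. */
uint32_t sendable_bytes(uint32_t unsent, uint32_t peer_window,
                        uint32_t in_flight) {
	uint32_t room = peer_window > in_flight ? peer_window - in_flight : 0;
	return unsent < room ? unsent : room;
}

/* Number of segments needed to carry nbytes, rounded up. */
uint32_t segments_for(uint32_t nbytes, uint32_t mss) {
	return (nbytes + mss - 1) / mss;
}
```

With the numbers from the example, `sendable_bytes(2048, 1024, 0)` is 1024, which is `segments_for(1024, 512) == 2` packets; once the window drops to 0, nothing more can be sent.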

Waiting for an ack after every single packet is too inefficient. We need to be able to send multiple packets before the acks come back, and slow start is how the sender works out how many.

Slow start process

Send 1 * mss for the first time

Send 2 * mss for the second time

Send 4 * mss for the third time

In short, slow start doubles the amount sent each round: 1 * mss, 2 * mss, 4 * mss and so on.
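The doubling can be sketched as a loop. The function name and the `limit` parameter are hypothetical; `limit` stands in for whatever cap the stack applies before it stops doubling, and it is assumed to be small enough that doubling does not overflow 32 bits.

```c
#include <stdint.h>

/* Amount (in bytes) that may be sent after `rounds` fully acknowledged
 * rounds of slow start, starting from 1 * mss and never exceeding limit. */
uint32_t slow_start_bytes(uint32_t mss, unsigned int rounds, uint32_t limit) {
	uint32_t cwnd = mss;
	for (unsigned int i = 0; i < rounds && cwnd < limit; i++) {
		cwnd *= 2;                       /* double once per round */
		if (cwnd > limit) cwnd = limit;  /* clamp to the cap */
	}
	return cwnd;
}
```

With mss = 512, rounds 0, 1 and 2 give 512, 1024 and 2048 bytes, matching the 1 * mss, 2 * mss, 4 * mss sequence.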

How to judge whether the packets sent exceed the network load?

By checking for timeouts. How is a timeout judged, then?

Congestion avoidance addresses the following situation: as the client sends more and more packets to the server, the amount of data on the network grows until the network becomes congested and the server can no longer receive the data correctly.

A timeout is judged against the rtt (round trip time), the time a data packet takes to travel to the peer and back.

When a device enters a weak network environment such as an elevator, rtt suddenly becomes larger. This is called jitter.

Current rtt calculation method

rtt = 0.1 * rtt(new) + 0.9 * rtt(old), which smooths out the jitter.

The smoothed rtt is used to judge whether the current transmission has timed out. Once a timeout occurs, the network load is considered exceeded and the number of packets sent at a time needs to be reduced.
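The formula above translates directly into code. In this sketch the function names are hypothetical, and the `factor` by which the smoothed rtt is multiplied to get the timeout is an assumption, not something stated in this article.

```c
/* rtt = 0.1 * rtt(new) + 0.9 * rtt(old) */
double smooth_rtt(double old_rtt, double new_rtt) {
	return 0.1 * new_rtt + 0.9 * old_rtt;
}

/* A packet is treated as timed out when no ack has arrived after
 * factor * smoothed rtt (factor is a hypothetical tuning value). */
int rtt_timed_out(double elapsed_ms, double smoothed_rtt, double factor) {
	return elapsed_ms > smoothed_rtt * factor;
}
```

A single sample of 200 ms against an old rtt of 100 ms only moves the estimate to about 110 ms, which is why a brief spike does not immediately trigger timeouts.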

If the window size of the server is 0, it has no space to receive and the client can no longer send. When the server has finished processing data and the window is no longer 0, how does the client know that the server has receiving space again?

  1. The server actively tells the client. The drawback: what if the notification packet is lost in the network?
  2. The client queries periodically. TCP does this: when the peer's window is 0, the client sends probe packets at regular intervals, driven by the probe timer.

Periodic querying by the client is the better choice.
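A minimal sketch of such a probe timer follows, assuming a millisecond clock supplied by the caller and a fixed probe interval. The struct and function names are hypothetical, and real stacks typically lengthen the interval after each probe.

```c
#include <stdint.h>

struct probe_timer {
	int      armed;
	uint64_t next_probe_ms;
	uint64_t interval_ms;
};

/* Call when an ack arrives: arm on a zero window, disarm otherwise. */
void probe_on_ack(struct probe_timer *t, uint16_t peer_window,
                  uint64_t now_ms) {
	if (peer_window == 0) {
		if (!t->armed) {
			t->armed = 1;
			t->next_probe_ms = now_ms + t->interval_ms;
		}
	} else {
		t->armed = 0;
	}
}

/* Call periodically: returns 1 when a window probe should be sent now,
 * and schedules the next one. */
int probe_due(struct probe_timer *t, uint64_t now_ms) {
	if (!t->armed || now_ms < t->next_probe_ms) return 0;
	t->next_probe_ms = now_ms + t->interval_ms;
	return 1;
}
```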

Sliding window

The window size is advertised in bytes, but the data inside it is sent in segments of up to one mss.

On the receiving side, two pointers are maintained. One points to the position up to which acknowledgments have been sent, and the other points to the maximum position allowed to be received. The distance between the two pointers is the window size.

An ack means that the data before it has been received and can be processed by calling recv. Data that has not been acknowledged yet is left alone for now: it has not been put in order, so recv cannot be called to process it.
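The two pointers can be sketched as a small struct. The names `rcv_nxt` and `rcv_max` are hypothetical, and this illustration only accepts in-order segments and simply ignores everything else.

```c
#include <stdint.h>

struct recv_window {
	uint32_t rcv_nxt;   /* everything before this has been acknowledged */
	uint32_t rcv_max;   /* first position that may NOT be received yet */
};

/* The distance between the two pointers is the window size. */
uint32_t window_size(const struct recv_window *w) {
	return w->rcv_max - w->rcv_nxt;
}

/* Accept an in-order segment that fits in the window.
 * Returns the ack num to send back (unchanged if the segment is ignored). */
uint32_t window_receive(struct recv_window *w, uint32_t seq, uint32_t len) {
	if (seq == w->rcv_nxt && len <= window_size(w))
		w->rcv_nxt += len;   /* left pointer slides, window shrinks */
	return w->rcv_nxt;
}

/* The application called recv() and freed n bytes: right pointer slides. */
void window_consume(struct recv_window *w, uint32_t n) {
	w->rcv_max += n;
}
```

The window shrinks as data is acknowledged and only grows again when the application reads, which is how an idle application drives the advertised window down to 0.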

Relationship between window size and recvbuff?

window size and recvbuff are related, but they are two different concepts.

It appears that window size = recvbuff / 2, but no authoritative documentation could be found for this.

Timer

There are five timers: the retransmission timer, the probe timer (persist timer), the keepalive timer, the TIME_WAIT timer and the delayed ack timer.

Retransmission timer: after sending a packet, the sender starts the retransmission timer. If the ack packet arrives within the specified time, the timer is cancelled; if the timeout, which is derived from the RTT, expires first, the packet is retransmitted.
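The start, cancel and expire steps of the retransmission timer can be sketched as below. The names are hypothetical, the clock is a millisecond counter supplied by the caller, and how `timeout_ms` is derived from the RTT is left to the caller.

```c
#include <stdint.h>

struct rto_timer {
	int      running;
	uint64_t deadline_ms;
};

/* Start the timer when a packet is sent. */
void rto_start(struct rto_timer *t, uint64_t now_ms, uint64_t timeout_ms) {
	t->running = 1;
	t->deadline_ms = now_ms + timeout_ms;
}

/* Cancel the timer when the matching ack arrives in time. */
void rto_cancel(struct rto_timer *t) {
	t->running = 0;
}

/* Returns 1 if the packet must be retransmitted now. */
int rto_expired(const struct rto_timer *t, uint64_t now_ms) {
	return t->running && now_ms >= t->deadline_ms;
}
```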

Probe timer: started when the peer's window size is 0.

TCP already has keepalive. Why does the application layer still send heartbeat packets?

TCP keepalive is also a heartbeat packet. When it times out, the protocol stack reclaims the TCB on its own, and the application layer is not aware of it. An application-layer heartbeat is more controllable.

TIME_WAIT timer: the TIME_WAIT state lasts 2 MSL, which guards against the last ack of the four-way close being lost.

Delayed ack timer: after receiving a TCP packet, the receiver starts a 200ms timer. If more data arrives, the timer is reset, and the ack is sent when the timer expires.
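The behaviour described above, where each new packet resets the 200ms timer and the ack goes out on expiry, can be sketched like this. The names are hypothetical and the caller supplies a millisecond clock.

```c
#include <stdint.h>

#define DELAY_ACK_MS 200

struct delack_timer {
	int      pending;
	uint64_t deadline_ms;
};

/* Call when data arrives: start the timer, or reset it if already running. */
void delack_on_data(struct delack_timer *t, uint64_t now_ms) {
	t->pending = 1;
	t->deadline_ms = now_ms + DELAY_ACK_MS;
}

/* Call periodically: returns 1 exactly once when the ack should be sent. */
int delack_poll(struct delack_timer *t, uint64_t now_ms) {
	if (!t->pending || now_ms < t->deadline_ms) return 0;
	t->pending = 0;
	return 1;
}
```

Data arriving at 0 ms and again at 150 ms produces a single ack at 350 ms, so one ack covers both packets.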


Topics: Linux Back-end server Network Protocol TCP/IP