CSAPP Chapter 11: Network I/O

Posted by rotto on Fri, 25 Feb 2022 13:39:23 +0100

1. Network architecture

This chapter covers the basic client-server programming model and how to write client-server programs that use services provided by the Internet. At the end, all the concepts are combined to develop a small but fully functional web server. Network applications are everywhere today, and interestingly they are all based on the same basic programming model, have a similar overall logical structure, and rely on the same programming interface. Most network applications follow the client-server model: one server process and one or more client processes (note that clients and servers are processes, not machines or hosts). The server manages a resource and provides a service to clients by manipulating that resource; clients send requests to the server. A transaction has four steps: request, operation, response, and processing.

The client process sends a request to the server process. The server process obtains the required resource and sends a response to the client. After receiving the response, the client processes it, for example by displaying it to the user. To the host, the network is just another I/O device that serves as a source and sink for data. Network-related processing goes through the network adapter. The hardware organization is shown in the figure (lower right corner):

2. Network

By scope and position in the hierarchy, networks can be divided into three kinds:

SAN - System Area Network
LAN - Local Area Network
WAN - Wide Area Network

2.1 Lowest level - Ethernet segment

An Ethernet segment consists of some cables and a small box called a hub. Several hosts are connected through the hub, usually within a room or a floor of a building, as shown in the figure:

An Ethernet segment consists of a group of hosts, each connected to a hub by a network cable (twisted pair)
Each Ethernet adapter has a unique 48-bit MAC address, and hosts send bits to other hosts in chunks called frames
A hub copies every bit it receives on one port to all the other ports, so every host can see all the data (note the security implications)

2.2 Next level

2.2.1 Bridged Ethernet segment

A bridged Ethernet typically covers a larger area, such as a floor or a building, and connects multiple Ethernet segments through bridges. A bridge learns which hosts are reachable from which port and selectively copies data between ports only when necessary.

To simplify the picture, all the hubs, bridges, and wires of a LAN can be abstracted as a collection of hosts attached to a single wire, as shown in the following figure:

2.2.2 internets

Multiple incompatible LANs can be physically connected by routers. The resulting network is called an internet (note the lowercase i)

The logical structure of an internet is:

Ad hoc interconnection of networks
There is no particular topology
Routers and links can differ greatly in capacity
Packets are sent from source to destination by hopping across networks
A router forms the connection between different networks
Different packets may take different routes

2.3 network protocol

When data travels across different LANs and WANs, the rules it follows are called a protocol. A protocol is a set of rules that governs how hosts and routers cooperate when transferring data from network to network, smoothing out the differences between networks. A protocol must provide two things:

Naming mechanism: defines a uniform format for host addresses, and each host and router is assigned at least one internet address that uniquely identifies it
Delivery mechanism: defines a standard transfer unit, the packet, which consists of a header and a payload. The header contains the packet size and the source and destination addresses; the payload contains the data to be transmitted

The data transfer is shown in the figure below, where PH = internet packet header and FH = LAN frame header

Eight basic steps for host A to send data to host B:

The client on A makes a system call that copies the data from the client's virtual address space into a kernel buffer
The protocol software on A creates a LAN1 frame by prepending a packet header and a LAN1 frame header to the data. The packet header is addressed to B, and the LAN1 frame header is addressed to the router. The frame is then passed to the adapter. Note that the payload of the LAN1 frame is an internet packet, whose payload is the actual user data. This kind of encapsulation is one of the fundamental insights of internetworking
The LAN1 adapter copies the frame to the network
When the frame reaches the router, the router's LAN1 adapter reads it from the wire and passes it to the protocol software
The router fetches the destination address from the packet header and uses it as an index into a routing table to determine where to forward the packet, which in this case is LAN2. The router then strips off the old LAN1 frame header, prepends a new LAN2 frame header addressed to B, and passes the resulting frame to the adapter
The router's LAN2 adapter copies the frame to the network
When the frame reaches B, its adapter reads the frame from the wire and passes it to the protocol software
The protocol software on B strips off the packet header and frame header. When the server makes a system call to read the data, the protocol software finally copies it into the server's virtual address space
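The encapsulation in steps 2 and 5 can be sketched in a few lines of C. This is only a toy model, not a real protocol stack: the header layouts, the field names, and the helpers encapsulate and reframe are all hypothetical and exist just to show that the router swaps the frame header while the packet header and payload pass through unchanged.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified headers; real Ethernet and IP headers carry more fields. */
struct frame_hdr  { uint8_t dst_mac[6]; uint8_t src_mac[6]; };
struct packet_hdr { uint32_t src_ip; uint32_t dst_ip; uint16_t payload_len; };

/* Step 2: lay out [frame header][packet header][payload] in out.
 * Returns the total frame length, or 0 if out is too small. */
size_t encapsulate(uint8_t *out, size_t cap,
                   const struct frame_hdr *fh, const struct packet_hdr *ph,
                   const void *payload, size_t len)
{
    size_t total = sizeof *fh + sizeof *ph + len;
    if (total > cap)
        return 0;
    memcpy(out, fh, sizeof *fh);
    memcpy(out + sizeof *fh, ph, sizeof *ph);
    memcpy(out + sizeof *fh + sizeof *ph, payload, len);
    return total;
}

/* Step 5: the router replaces only the frame header.
 * The packet header and the payload behind it are left untouched. */
void reframe(uint8_t *frame, const struct frame_hdr *new_fh)
{
    memcpy(frame, new_fh, sizeof *new_fh);
}
```

In this model, host A would call encapsulate with a frame header addressed to the router, and the router would call reframe with a frame header addressed to B.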

2.4 TCP/IP protocol

The global IP Internet is the most famous example of an internet. It is based on the TCP/IP protocol family:

IP (Internet Protocol): provides the basic naming scheme and unreliable host-to-host delivery of packets (datagrams)
UDP (User Datagram Protocol): uses IP to provide unreliable datagram delivery between processes
TCP (Transmission Control Protocol): uses IP to provide reliable byte streams between processes over connections

Applications access these protocols through a mix of Unix I/O and sockets interface functions. From a programmer's point of view:

Each host has a 32-bit IP address, for example 23.235.46.133
A set of IP addresses is mapped to a set of identifiers called Internet domain names
Processes on different hosts can exchange data over connections

2.4.1 IP address

An IP address is stored in an IP address struct, always in network byte order (big-endian)

// Internet address structure
struct in_addr {
    uint32_t s_addr;    // network byte order (big-endian)
};

For readability, IP addresses are usually written in dotted-decimal form, in which each byte is given as a decimal value: 0x8002C2F2 = 128.2.194.242. The conversion can be done with the getaddrinfo and getnameinfo functions
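For plain IPv4 addresses, the same conversion can also be done with the standard inet_pton and inet_ntop functions. The sketch below is one way to check the 0x8002C2F2 example; the helper names dotted_to_host and host_to_dotted are made up for illustration. Note the ntohl/htonl calls: s_addr is in network byte order, so it has to be converted before it can be compared with a host-order constant.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Dotted-decimal string -> 32-bit address in host byte order.
 * Returns 0 on success, -1 if the string is not a valid IPv4 address. */
int dotted_to_host(const char *dotted, uint32_t *host_order)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, dotted, &addr) != 1)
        return -1;
    *host_order = ntohl(addr.s_addr);   /* network (big-endian) -> host order */
    return 0;
}

/* 32-bit address in host byte order -> dotted-decimal string.
 * Returns 0 on success, -1 if buf is too small. */
int host_to_dotted(uint32_t host_order, char *buf, size_t len)
{
    struct in_addr addr;
    addr.s_addr = htonl(host_order);    /* host order -> network order */
    return inet_ntop(AF_INET, &addr, buf, (socklen_t)len) ? 0 : -1;
}
```

A buffer of INET_ADDRSTRLEN bytes is large enough for any dotted-decimal IPv4 string.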

2.4.2 Internet host domain name

The key concept here is the Domain Name System (DNS), a distributed database that maps between domain names and IP addresses. Programmers can think of the DNS database as a collection of millions of host entries. Each host has a locally defined domain name, localhost, which always maps to the loopback address 127.0.0.1

$ nslookup www.twitter.com
Server:     8.8.8.8
Address:    8.8.8.8#53
Non-authoritative answer:
www.twitter.com canonical name = twitter.com.
Name:   twitter.com
Address: 199.16.156.6
Name:   twitter.com
Address: 199.16.156.198
Name:   twitter.com
Address: 199.16.156.230
Name:   twitter.com
Address: 199.16.156.70

Use the hostname command to determine the real domain name of the local host. The mapping is flexible: one or more domain names can map to the same IP address, and multiple domain names can map to the same set of multiple IP addresses (as twitter.com does above)

2.4.3 Internet connection

Clients and servers communicate by sending streams of bytes over connections. A connection is:

Point-to-point: connects a pair of processes
Full duplex: data can flow in both directions at the same time
Reliable: the stream of bytes sent by the source is eventually received by the destination in the same order it was sent

A socket is an endpoint of a connection, and a socket address is an IPaddress:port pair.

A port is a 16-bit integer that identifies a process; different ports are used to connect to different services:

Well-known port: associated with some service provided by a server (see /etc/services for the specific mappings)
echo server: 7/echo
ssh server: 22/ssh
email server: 25/smtp
web servers: 80/http

2.5 Socket interface

The sockets interface is a set of system-level functions used together with Unix I/O to build network applications. To the kernel, a socket is an endpoint of communication; to an application, a socket is a file descriptor that it can read from and write to. Clients and servers communicate by reading and writing the corresponding socket descriptors. The main difference between regular file I/O and socket I/O is how the program "opens" the socket descriptor. The running example is an echo service:

Server: accepts a connection request, then repeatedly echoes back each line of input it receives
Client: requests a connection to the server, then repeatedly reads a line from the terminal, sends it to the server, reads the response from the server, and prints it to the terminal

2.5.1 socket address structure

From the perspective of the Linux kernel, a socket is an endpoint of communication. From the perspective of a Linux program, a socket is an open file with a corresponding descriptor. The generic struct sockaddr is the parameter type of connect, bind, and accept. It is needed because C did not have generic (void *) pointers when the sockets interface was designed.

The IPv4-specific socket address is struct sockaddr_in (note: the _in suffix is short for internet, not input). For functions that take a socket address argument, a (struct sockaddr_in *) must be cast to (struct sockaddr *) before it is passed
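A small sketch of that cast, under a few assumptions: make_ipv4_addr and family_of are hypothetical helpers written for this note, and family_of merely stands in for a real sockets function such as connect or bind that takes the generic pointer type. The SA typedef is the same shorthand that csapp.h uses.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

typedef struct sockaddr SA;   /* generic socket address, same shorthand as csapp.h */

/* Fill an IPv4 socket address from a dotted-decimal string and a host-order port.
 * Returns 0 on success, -1 if ip is not a valid IPv4 address. */
int make_ipv4_addr(const char *ip, uint16_t port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_port = htons(port);              /* port is stored in network byte order */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}

/* Stand-in for connect/bind/accept: it only sees the generic type and
 * uses sa_family to learn which concrete structure is behind the pointer. */
int family_of(const SA *addr)
{
    return addr->sa_family;
}
```

A call then looks like family_of((SA *)&addr), which is exactly the cast that connect, bind, and accept require.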

2.5.2 socket function

int socket(int domain, int type, int protocol): clients and servers use the socket function to create a socket descriptor. The best practice is to let getaddrinfo generate the arguments automatically so that the code is protocol independent.

2.5.3 connect function

int connect(int clientfd, const struct sockaddr *addr, socklen_t addrlen): a client establishes a connection with a server by calling the connect function. connect attempts to establish an Internet connection with the server at socket address addr, where addrlen is sizeof(struct sockaddr_in). The function blocks until the connection succeeds or an error occurs. On success, the clientfd descriptor is ready for reading and writing, and the resulting connection is characterized by the socket pair (x:y, addr.sin_addr:addr.sin_port), where x is the client's IP address and y is the ephemeral port that uniquely identifies the client process on the client host. As with socket, it is best to use getaddrinfo to supply the arguments to connect.

2.5.4 bind function

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen): the bind function asks the kernel to associate the server's socket address in addr with the socket descriptor sockfd. Again, it is best to use getaddrinfo to supply the arguments to bind. The process can then read bytes that arrive on a connection whose endpoint is addr by reading from sockfd; similarly, writes to sockfd are transferred along a connection whose endpoint is addr

2.5.5 listen function

By default, the kernel assumes that a descriptor created by the socket function is an active socket, which lives on the client end of a connection. A server calls the listen function to tell the kernel that the descriptor will be used by a server instead.

int listen(int sockfd, int backlog): converts sockfd from an active socket to a listening socket that can accept connection requests from clients. The backlog argument is a hint about the number of outstanding connection requests that the kernel should queue up before it starts refusing requests.

2.5.6 accept function

The server calls the accept function to wait for a connection request from the client.

int accept(int listenfd, struct sockaddr *addr, socklen_t *addrlen): waits for a connection request from a client to arrive on the listening descriptor listenfd, fills in the client's socket address in addr, and returns a connected descriptor that can be used to communicate with the client using Unix I/O functions.

The listening descriptor is the endpoint for client connection requests; it is usually created once and exists for the lifetime of the server. The connected descriptor is the endpoint of the connection established between a client and the server; a new one is created each time the server accepts a connection request, and it exists only as long as the server is serving that client.

Note: it seems unnecessary and complicated to distinguish between listening descriptors and connected descriptors, but it makes it possible to establish concurrent servers and handle many client connections at the same time.
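To see the whole socket, bind, listen, connect, accept sequence in one place, here is a single-process sketch that plays both roles over the loopback interface. It rests on several assumptions: the function name loopback_roundtrip is made up, the address is hard-coded to IPv4 loopback for brevity (so it is not protocol independent), and it relies on the kernel completing the handshake for a queued connection before accept is called, which is how Linux behaves.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if one byte travels from the client end to the server end, -1 otherwise. */
int loopback_roundtrip(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int listenfd = -1, clientfd = -1, connfd = -1, rc = -1;
    char c = 0;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                    /* port 0: let the kernel pick a free port */

    /* Server side: create the socket, bind it, and turn it into a listening socket */
    if ((listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;
    if (bind(listenfd, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;
    if (listen(listenfd, 8) < 0) goto done;
    /* Ask the kernel which port it actually chose */
    if (getsockname(listenfd, (struct sockaddr *)&addr, &len) < 0) goto done;

    /* Client side: an active socket that connects to the listening address */
    if ((clientfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;
    if (connect(clientfd, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;

    /* Server side again: accept returns a new, connected descriptor */
    if ((connfd = accept(listenfd, NULL, NULL)) < 0) goto done;

    if (write(clientfd, "x", 1) != 1) goto done;
    if (read(connfd, &c, 1) != 1 || c != 'x') goto done;
    rc = 0;
done:
    if (connfd >= 0) close(connfd);
    if (clientfd >= 0) close(clientfd);
    if (listenfd >= 0) close(listenfd);
    return rc;
}
```

Note that listenfd and connfd are different descriptors: the listening descriptor stays open and could accept further clients, while connfd is closed as soon as this one client has been served.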

2.5.7 conversion of hosts and services

Linux provides some powerful functions getaddrinfo and getnameinfo to realize the conversion between binary socket address structure and string representation of host name, host address, service name and port number. When used with socket interfaces, these functions enable us to write network programs independent of any particular version of IP protocol.

2.5.7.1 getaddrinfo function

getaddrinfo is the modern way to convert string representations of host names, host addresses, ports, and service names into socket address structures, replacing the obsolete gethostbyname and getservbyname. Its advantages are that it is reentrant (so it can be used safely by threaded programs) and that it allows protocol-independent, portable code. The drawback is that it is somewhat complex; fortunately, a small number of usage patterns suffice in most cases.

Given host and service, getaddrinfo returns a result that points to a linked list of addrinfo structures, each of which points to a corresponding socket address structure and contains arguments for the sockets interface functions. The helper functions are freeaddrinfo and gai_strerror

Figure: linked list returned by getaddrinfo

Each addrinfo structure returned by getaddrinfo contains arguments that can be passed directly to the socket function, and also points to a socket address structure that can be passed directly to connect and bind

2.5.7.2 getnameinfo function

getnameinfo is the inverse of getaddrinfo: it converts a socket address structure into the corresponding host and service name strings, replacing the obsolete gethostbyaddr and getservbyport functions. It is also reentrant and protocol independent
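The two functions can be tried together in a round trip: strings to a socket address with getaddrinfo, then back to strings with getnameinfo. The following is a minimal sketch, not code from the book. The function name numeric_roundtrip is hypothetical, and the AI_NUMERICHOST and NI_NUMERICHOST flags are chosen here so that no DNS lookup is needed; only the first entry of the list is converted back.

```c
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Convert numeric host/port strings to a socket address and back again.
 * Returns 0 on success, or a nonzero getaddrinfo/getnameinfo error code
 * that can be turned into a message with gai_strerror. */
int numeric_roundtrip(const char *host, const char *port,
                      char *host_out, size_t hostlen,
                      char *port_out, size_t portlen)
{
    struct addrinfo hints, *listp;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;                       /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;                   /* connections only */
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;  /* no name or service lookups */

    if ((rc = getaddrinfo(host, port, &hints, &listp)) != 0)
        return rc;

    /* listp->ai_addr is the socket address structure; convert it back to strings */
    rc = getnameinfo(listp->ai_addr, listp->ai_addrlen,
                     host_out, (socklen_t)hostlen,
                     port_out, (socklen_t)portlen,
                     NI_NUMERICHOST | NI_NUMERICSERV);
    freeaddrinfo(listp);                               /* always free the list */
    return rc;
}
```

Dropping AI_NUMERICHOST would let the same function accept domain names, at the cost of a DNS query.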

3. Simple server implementation

3.1 Architecture Overview

The most important part of writing a server is getting the overall flow clear. The previous part introduced many concepts, especially getaddrinfo and getnameinfo, which are essential tools in building one. Referring to the flow chart above, the whole workflow has five steps:

Start the server

getaddrinfo: sets up the server's address information. The older style of code was protocol dependent; it is recommended to configure the socket from the parameters generated by getaddrinfo, which keeps the code protocol independent
socket: int socket(int domain, int type, int protocol); creates a socket descriptor, that is, the file descriptor used for reading and writing later. For example, int clientfd = socket(AF_INET, SOCK_STREAM, 0); where AF_INET indicates that 32-bit IPv4 addresses are being used and SOCK_STREAM indicates that the socket will be an endpoint of a connection
bind: int bind(int sockfd, SA *addr, socklen_t addrlen); asks the kernel to associate the socket address with the socket descriptor. The process can read bytes that arrive on the connection whose endpoint is addr by reading from descriptor sockfd. Similarly, writes to sockfd are transferred along the connection whose endpoint is addr. It is better to use the parameters generated by getaddrinfo as addr and addrlen
listen: int listen(int sockfd, int backlog); by default, a descriptor obtained from the socket function is an active socket (the client end of a connection). Calling listen tells the kernel that the socket is used by a server and converts sockfd from an active socket to a listening socket that can receive client requests. The backlog value indicates how many pending requests the kernel queues before it starts to reject new ones
accept: int accept(int listenfd, SA *addr, int *addrlen); starts waiting for client requests. It waits for a connection request to arrive on listenfd, writes the client's socket address to addr and its size to addrlen, and returns a connected descriptor that is used to transfer data (much like Unix I/O)

Start the client (set the server address and try to connect)

getaddrinfo: sets up the address information of the server to contact. See Figure 1 & 2 for details
socket: creates a socket descriptor, that is, the file descriptor used for reading and writing later
connect: int connect(int clientfd, SA *addr, socklen_t addrlen); called by the client to try to establish a connection with the server at socket address addr. If it succeeds, clientfd is ready for reading and writing, and the connection is described by the socket pair (x:y, addr.sin_addr:addr.sin_port), where x is the client's address, y is the client's ephemeral port, and the last two are the server's address and port. It is better to use the parameters generated by getaddrinfo as addr and addrlen

Exchange data (essentially a loop: when the client writes to the server it is sending a request, and when the server writes to the client it is sending a response)

[Client]rio_writen: writes data, which amounts to sending a request to the server
[Client]rio_readlineb: reads data, which amounts to receiving a response from the server
[Server]rio_readlineb: reads data, which amounts to receiving a request from the client
[Server]rio_writen: writes data, which amounts to sending a response to the client
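The rio functions come from the CSAPP support code (csapp.h). Their reason for existing is that write on a socket may transfer fewer bytes than requested, so a robust writer has to loop. The sketch below shows that idea with plain POSIX calls; write_all is a hypothetical name and this is not the book's rio_writen implementation, only the same general technique.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling write until all n bytes of buf have been written to fd.
 * Returns n on success, -1 on a write error other than EINTR. */
ssize_t write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal handler: retry */
            return -1;             /* real error, errno is set by write */
        }
        p += w;                    /* short count: advance past what was written */
        left -= (size_t)w;
    }
    return (ssize_t)n;
}
```

The same descriptor-based loop works for sockets, pipes, and regular files.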

Close client

[Client]close: close the connection

Disconnect the client (after detecting EOF from the client, the server closes its connection with that client)

[Server]rio_readlineb: keeps reading until it encounters EOF, which indicates that the client has closed the connection
[Server]close: close the connection with the client

Note: the concept of EOF is often confusing. First, there is no such thing as an EOF character. Rather, EOF is a condition detected by the kernel, and a program learns of it when read returns zero. For a disk file, EOF occurs when the current file position exceeds the file length. For a network connection, EOF occurs when a process closes its end of the connection and the process at the other end tries to read past the last byte in the stream
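The network case of EOF is easy to observe without a real network. The sketch below uses socketpair to get two connected stream sockets in one process; using a Unix-domain pair instead of a TCP connection is an assumption made to keep the example self-contained, and the function name read_after_peer_close is made up.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* One end writes 3 bytes and closes. The other end first reads the 3 bytes,
 * then reads again. Returns the result of that second read, or -1 on error. */
ssize_t read_after_peer_close(void)
{
    int sv[2];
    char buf[16];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (write(sv[0], "bye", 3) != 3) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[0]);                          /* the peer closes its end */

    n = read(sv[1], buf, sizeof buf);      /* data sent before the close is still delivered */
    if (n != 3) {
        close(sv[1]);
        return -1;
    }
    n = read(sv[1], buf, sizeof buf);      /* nothing left and peer closed: EOF, read returns 0 */
    close(sv[1]);
    return n;
}
```

No special byte ever arrives; the only signal is the zero return value of read.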

3.2 Client

open_clientfd establishes a connection with a server and is protocol independent

int open_clientfd(char *hostname, char *port) {
    int clientfd;
    struct addrinfo hints, *listp, *p;
    //Get a list of potential server address
    memset(&hints, 0, sizeof(struct addrinfo));
    hints.ai_socktype = SOCK_STREAM; // Open a connection
    hints.ai_flags = AI_NUMERICSERV; // using numeric port arguments
    hints.ai_flags |= AI_ADDRCONFIG; // Recommended for connections
    if (getaddrinfo(hostname, port, &hints, &listp) != 0)
        return -1; // Lookup failed, so listp is not valid
    // Walk the list for one that we can successfully connect to
    // If all of them fail, it will finally return failure (there may be multiple addresses)
    for (p = listp; p; p = p->ai_next) {
        // Create a socket descriptor
        // Here, the parameters obtained from getaddrinfo are used to realize protocol independence
        if ((clientfd = socket(p->ai_family, p->ai_socktype,
                               p->ai_protocol)) < 0)
            continue; // Socket failed, try the next
        // Connect to the server
        // Here, the parameters obtained from getaddrinfo are used to realize protocol independence
        if (connect(clientfd, p->ai_addr, p->ai_addrlen) != -1)
            break; // Success
        close(clientfd); // Connect failed, try another
    }
    // Clean up
    freeaddrinfo(listp);
    if (!p) // All connections failed
        return -1;
    else // The last connect succeeded
        return clientfd;
}

3.3 Server

open_listenfd creates a listening descriptor that is ready to receive connection requests from clients; it is also protocol independent

int open_listenfd(char *port){
    struct addrinfo hints, *listp, *p;
    int listenfd, optval=1;

    // Get a list of potential server addresses
    memset(&hints, 0, sizeof(struct addrinfo));
    hints.ai_socktype = SOCK_STREAM; // Accept connection
    hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; // on any IP address
    hints.ai_flags |= AI_NUMERICSERV; // using port number
    // The server only listens, so the host argument is NULL (with AI_PASSIVE this means the wildcard address)
    if (getaddrinfo(NULL, port, &hints, &listp) != 0)
        return -1; // Lookup failed, so listp is not valid
    // Walk the list for one that we can successfully bind to
    // If all of them fail, the function returns failure (there may be multiple addresses)
    for (p = listp; p; p = p->ai_next) {
        // Create a socket descriptor
        // Here, the parameters obtained from getaddrinfo are used to realize protocol independence
        if ((listenfd = socket(p->ai_family, p->ai_socktype,
                               p->ai_protocol)) < 0)
            continue; // Socket failed, try the next
        // Eliminates "Address already in use" error from bind
        setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR,
                   (const void *)&optval, sizeof(int));
        // Bind the descriptor to the address
        if (bind(listenfd, p->ai_addr, p->ai_addrlen) == 0)
            break; // Success
        close(listenfd); // Bind failed, try another
    }
    // Clean up
    freeaddrinfo(listp);
    if (!p) // No address worked
        return -1;
    // Make it a listening socket ready to accept connection requests
    if (listen(listenfd, LISTENQ) < 0) {
        close(listenfd);
        return -1;
    }
    return listenfd;
}

3.4 simple socket server instance

3.4.1 client

This client is very simple: it sends each line of text entered by the user to the server, then prints the response received from the server. See the comments in the code for details

// echoclient.c
#include "csapp.h"
int main (int argc, char **argv) {
    int clientfd;
    char *host, *port, buf[MAXLINE];
    rio_t rio;
    host = argv[1];
    port = argv[2];
    // Establish a connection (described in detail earlier)
    clientfd = Open_clientfd(host, port);
    Rio_readinitb(&rio, clientfd);
    while (Fgets(buf, MAXLINE, stdin) != NULL) {
        // Write, that is, send information to the server
        Rio_writen(clientfd, buf, strlen(buf));
        // Read, that is, receive information from the server
        Rio_readlineb(&rio, buf, MAXLINE);
        // Displays the information received from the server in the output
        Fputs(buf, stdout);
    }
    Close(clientfd);
    exit(0);
}

3.4.2 server

The server is just as simple: it receives each line sent by the client and sends the same data back. See the comments in the code for details.

// echoserveri.c
#include "csapp.h"
void echo(int connfd);
int main(int argc, char **argv){
    int listenfd, connfd;
    socklen_t clientlen;
    struct sockaddr_storage clientaddr; // Enough room for any addr
    char client_hostname[MAXLINE], client_port[MAXLINE];
    // Turn on the listening port. Be careful to turn it on only once
    listenfd = Open_listenfd(argv[1]);
    while (1) {
        // Specific size required
        clientlen = sizeof(struct sockaddr_storage); // Important!
        // Waiting for connection
        connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);
        // Get client related information
        Getnameinfo((SA *) &clientaddr, clientlen, client_hostname,
                     MAXLINE, client_port, MAXLINE, 0);
        printf("Connected to (%s, %s)\n", client_hostname, client_port);
        // Specific work completed by the server
        echo(connfd);
        Close(connfd);
    }
    exit(0);
}

void echo(int connfd) {
    size_t n;
    char buf[MAXLINE];
    rio_t rio;
    // Read the data transmitted from the client
    Rio_readinitb(&rio, connfd);
    while((n = Rio_readlineb(&rio, buf, MAXLINE)) != 0) {
        printf("server received %d bytes\n", (int)n);
        // Write back the information received from the client
        Rio_writen(connfd, buf, n);
    }
}

3.5 proxy

A proxy is an intermediary between a client and a server. To the client, the proxy looks like a server; to the server, the proxy looks like a client. Proxies are useful because they can perform functions as requests and responses pass through, such as caching, logging, anonymization, filtering, and transcoding

4. web server

4.1 basic knowledge of Web

Clients and servers communicate using the Hypertext Transfer Protocol (HTTP). The client establishes a TCP connection with the server and requests content; the server responds with the requested content; and the client and server then close the connection

4.2 web content

A web server returns content to clients. Content is a sequence of bytes with an associated MIME (Multipurpose Internet Mail Extensions) type, and each piece of content is identified by a URL. Example MIME types include HTML documents (text/html), unformatted plain text (text/plain), and binary images in GIF format (image/gif)

4.2.1 static and dynamic content

Static content: content stored in files and retrieved in response to an HTTP request, such as HTML files, images, videos, and JavaScript programs. The request identifies which file to return
Dynamic content: content produced on the fly in response to an HTTP request, generated by a program that the server executes on behalf of the client. The request identifies the file containing the executable code. Any URL can refer to either static or dynamic content

4.2.2 how to use URLs and clients and servers

URL (Uniform Resource Locator): a unique name for a piece of content, such as http://www.cmu.edu:80/index.html. The client uses the prefix http://www.cmu.edu:80 to determine the protocol (HTTP), the server (www.cmu.edu), and the port (80). The server uses the suffix /index.html to decide whether the request is for static or dynamic content (there is no hard rule, but by convention executables live in the cgi-bin directory). The leading / in the suffix denotes the home directory for the requested content, not the Linux root directory. The minimal suffix is "/", which the server expands to a configured default file name such as index.html
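The client-side split into host, port, and suffix can be sketched as follows. This is an illustration only: split_url is a hypothetical helper, it handles just the http://host[:port][/path] shape used in the example above, and the defaults (port 80, suffix "/") follow the description in the text rather than any particular browser.

```c
#include <stdio.h>
#include <string.h>

/* Split "http://host[:port][/path]" into host, port, and path strings.
 * Returns 0 on success, -1 if the URL is not http:// or a part does not fit. */
int split_url(const char *url, char *host, size_t hostcap,
              char *port, size_t portcap, char *path, size_t pathcap)
{
    const char *prefix = "http://";
    const char *p, *slash, *colon;
    size_t hostlen;

    if (strncmp(url, prefix, strlen(prefix)) != 0)
        return -1;
    p = url + strlen(prefix);

    /* The suffix starts at the first '/', or is empty if there is none */
    slash = strchr(p, '/');
    if (!slash)
        slash = p + strlen(p);

    /* Look for ':' only inside the host[:port] part */
    colon = memchr(p, ':', (size_t)(slash - p));
    hostlen = (size_t)((colon ? colon : slash) - p);
    if (hostlen == 0 || hostlen >= hostcap)
        return -1;
    memcpy(host, p, hostlen);
    host[hostlen] = '\0';

    if (colon) {
        size_t portlen = (size_t)(slash - colon - 1);
        if (portlen == 0 || portlen >= portcap)
            return -1;
        memcpy(port, colon + 1, portlen);
        port[portlen] = '\0';
    } else if (snprintf(port, portcap, "80") >= (int)portcap) {
        return -1;                               /* default HTTP port did not fit */
    }

    /* The minimal suffix is "/" */
    if (snprintf(path, pathcap, "%s", *slash ? slash : "/") >= (int)pathcap)
        return -1;
    return 0;
}
```

The host and port strings produced here are in the form that open_clientfd expects as its two arguments.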

4.2.3 HTTP REQUEST RESPONSE

4.2.4 HTTP example

4.2.5 HTTP(S) example

5. Tiny web server

Tiny is a small web server in 239 lines of C code. It serves both static and dynamic content, but it is not as complete or robust as a real web server. For example, it can be broken by malformed requests, such as lines terminated with \n instead of \r\n. Tiny accepts a connection from a client, reads the request from the client (through the connected socket), and parses the URI. If the URI contains "cgi-bin", it forks a child process to execute the program; otherwise it copies the static content to the output

Problems of dynamic content display:

How does the client pass program parameters to the server?

How does the server pass these parameters to child processes?

How does the server pass other request related information to the child process?

How does the server capture content generated by child processes?

5.1 CGI

The answer is CGI (Common Gateway Interface). Because the child processes are written to the CGI specification, they are usually called CGI programs. CGI defines a simple standard for transferring information between the client (browser), the server, and the child process. CGI is the original standard for generating dynamic content. It has largely been replaced by faster techniques, such as FastCGI, Apache modules, Java servlets, and Rails controllers, which avoid creating a new process for every request

The client passes arguments to the server in the URI. They can be encoded directly in a URL typed into the browser or in a URL in an HTML link: the argument list starts with ?, arguments are separated by &, and spaces are represented by + or %20; other special bytes have similar encodings. Note that the arguments of a POST request are passed in the request body rather than in the URI (the essence of programming is the visualization of data). The server in turn passes the arguments to the child process in the environment variable QUERY_STRING. The child process generates its output on stdout, and the server uses dup2 to redirect stdout to the connected socket, which is how the content generated by the child reaches the client. Note that only the CGI child process knows the content type and length, so it must generate those headers itself.
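On the receiving side, a CGI program usually has to undo that encoding before it can use the arguments. The following sketch decodes a query string in place, turning + into a space and %XX into the byte with that hexadecimal value. The name query_decode is hypothetical, and leaving malformed escapes (such as a trailing %) untouched is a choice made for this example, not something the CGI specification prescribes.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of a hexadecimal digit, or -1 if c is not one. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Decode s in place: '+' becomes ' ', "%XX" becomes the byte 0xXX.
 * Malformed escapes are copied through unchanged.
 * Returns the length of the decoded string. */
size_t query_decode(char *s)
{
    const char *in = s;
    char *out = s;

    while (*in) {
        if (*in == '+') {
            *out++ = ' ';
            in++;
        } else if (*in == '%' &&
                   hexval((unsigned char)in[1]) >= 0 &&
                   hexval((unsigned char)in[2]) >= 0) {
            *out++ = (char)(hexval((unsigned char)in[1]) * 16 +
                            hexval((unsigned char)in[2]));
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

A CGI program would typically apply this to a copy of getenv("QUERY_STRING"), after splitting the string on & so that an encoded & inside a value is not mistaken for a separator.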

5.2 code

5.2.1 main program

Tiny is an iterative server that listens for connection requests on the port passed on the command line. After opening a listening socket by calling open_listenfd, it executes the typical infinite server loop: repeatedly accept a connection request (Accept), perform a transaction (doit), and close its end of the connection (Close).

/*
 * tiny.c - A simple, iterative HTTP/1.0 Web server that uses the 
 *     GET method to serve static and dynamic content.
 */
#include "csapp.h"
void doit(int fd);
void read_requesthdrs(rio_t *rp);
int parse_uri(char *uri, char *filename, char *cgiargs);
void serve_static(int fd, char *filename, int filesize);
void get_filetype(char *filename, char *filetype);
void serve_dynamic(int fd, char *filename, char *cgiargs);
void clienterror(int fd, char *cause, char *errnum, 
         char *shortmsg, char *longmsg);
int main(int argc, char **argv) {
    int listenfd, connfd;
    char hostname[MAXLINE], port[MAXLINE];
    socklen_t clientlen;
    struct sockaddr_storage clientaddr;
    /* Check command line args */
    if (argc != 2) {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        exit(1);
    }
    listenfd = Open_listenfd(argv[1]);
    while (1) {
        clientlen = sizeof(clientaddr);
        connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);
        Getnameinfo((SA *) &clientaddr, clientlen, hostname, MAXLINE,
                    port, MAXLINE, 0);
        printf("Accepted connection from (%s, %s)\n", hostname, port);
        doit(connfd);    // Perform one HTTP transaction
        Close(connfd);   // Close our end of the connection
    }
}

5.2.2 doit function

doit handles one HTTP transaction. It first reads and parses the request line; Tiny supports only the GET method, so a request with any other method gets an error response. The URI is then parsed into a file name and a possibly empty CGI argument string, and the is_static flag is set to indicate whether the request is for static or dynamic content. If the file does not exist on disk, an error message is sent to the client immediately and doit returns. Finally, if the request is for static content, doit verifies that the file is a regular file with read permission and, if so, serves the static content (serve_static) to the client. Similarly, if the request is for dynamic content, it verifies that the file is executable and, if so, serves the dynamic content (serve_dynamic).

/*
 * doit - handle one HTTP request/response transaction
 */
void doit(int fd) {
    int is_static;
    struct stat sbuf;
    char buf[MAXLINE], method[MAXLINE], uri[MAXLINE], version[MAXLINE];
    char filename[MAXLINE], cgiargs[MAXLINE];
    rio_t rio;
    /* Read request line and headers */
    Rio_readinitb(&rio, fd);
    if (!Rio_readlineb(&rio, buf, MAXLINE))
        return;
    printf("%s", buf);
    sscanf(buf, "%s %s %s", method, uri, version);
    if (strcasecmp(method, "GET")) {
        clienterror(fd, method, "501", "Not Implemented",
                    "Tiny does not implement this method");
        return;
    }
    read_requesthdrs(&rio);
    /* Parse URI from GET request */
    is_static = parse_uri(uri, filename, cgiargs);
    if (stat(filename, &sbuf) < 0) {
        clienterror(fd, filename, "404", "Not found",
                    "Tiny couldn't find this file");
        return;
    }
    if (is_static) {
        /* Serve static content */
        if (!(S_ISREG(sbuf.st_mode)) || !(S_IRUSR & sbuf.st_mode)) {
            clienterror(fd, filename, "403", "Forbidden",
                        "Tiny couldn't read the file");
            return;
        }
        serve_static(fd, filename, sbuf.st_size);
    } else {
        /* Serve dynamic content */
        if (!(S_ISREG(sbuf.st_mode)) || !(S_IXUSR & sbuf.st_mode)) {
            clienterror(fd, filename, "403", "Forbidden",
                        "Tiny couldn't run the CGI program");
            return;
        }
        serve_dynamic(fd, filename, cgiargs);
    }
}

5.2.3 clienterror function

Tiny lacks many of the error-handling features of a real server. However, it does check for some obvious errors and reports them to the client. The clienterror function sends an HTTP response to the client with the appropriate status code and status message in the response line, and an HTML file in the response body that explains the error to the browser's user.

Recall that an HTTP response should indicate the size and type of the content in its body. Building the HTML content as a single string makes its size easy to determine. Note that the listing below instead writes the body piece by piece and sends only a Content-type header. Also note that all output goes through the robust Rio_writen function.

/*
 * clienterror - returns an error message to the client
 */
void clienterror(int fd, char *cause, char *errnum, 
         char *shortmsg, char *longmsg) {
    char buf[MAXLINE];
    /* Print the HTTP response headers */
    sprintf(buf, "HTTP/1.0 %s %s\r\n", errnum, shortmsg);
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "Content-type: text/html\r\n\r\n");
    Rio_writen(fd, buf, strlen(buf));
    /* Print the HTTP response body */
    sprintf(buf, "<html><title>Tiny Error</title>");
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "<body bgcolor=\"ffffff\">\r\n");
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "%s: %s\r\n", errnum, shortmsg);
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "<p>%s: %s\r\n", longmsg, cause);
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "<hr><em>The Tiny Web server</em>\r\n");
    Rio_writen(fd, buf, strlen(buf));
}
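
The idea of building the body as one string so that its size is known can be sketched as follows. This is an illustration under assumptions, not the listing above: format_error is a hypothetical name, the 1024-byte body buffer is an arbitrary choice, and the function formats into a caller-supplied buffer instead of writing to a socket so that it can be run on its own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not in Tiny): formats a complete error response,
 * including a Content-length header, into out[0..outsz).
 * Returns the response length, or -1 if a buffer is too small. */
int format_error(char *out, size_t outsz, const char *cause,
                 const char *errnum, const char *shortmsg,
                 const char *longmsg) {
    char body[1024];   /* assumed large enough for Tiny's short messages */
    int blen, n;

    assert(out != NULL);
    /* Build the whole HTML body first so its length is known */
    blen = snprintf(body, sizeof body,
                    "<html><title>Tiny Error</title>"
                    "<body bgcolor=\"ffffff\">\r\n"
                    "%s: %s\r\n"
                    "<p>%s: %s\r\n"
                    "<hr><em>The Tiny Web server</em>\r\n",
                    errnum, shortmsg, longmsg, cause);
    if (blen < 0 || (size_t)blen >= sizeof body)
        return -1;

    /* Headers, blank line, then the body */
    n = snprintf(out, outsz,
                 "HTTP/1.0 %s %s\r\n"
                 "Content-type: text/html\r\n"
                 "Content-length: %d\r\n\r\n%s",
                 errnum, shortmsg, blen, body);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    return n;
}
```

A caller would then send the result with a single Rio_writen(fd, out, n). Using snprintf rather than sprintf means an unusually long cause string is reported as an error instead of overflowing the buffer.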

5.2.4 read_requesthdrs function

Tiny does not use any of the information in the request headers. It simply calls the read_requesthdrs function to read and ignore them. Note that the empty line that terminates the request headers consists of a carriage return and line feed pair ("\r\n").

/*
 * read_requesthdrs - read HTTP request headers
 */
void read_requesthdrs(rio_t *rp) {
    char buf[MAXLINE];
    Rio_readlineb(rp, buf, MAXLINE);
    printf("%s", buf);
    /* Keep reading until the blank line that ends the headers */
    while (strcmp(buf, "\r\n")) {
        Rio_readlineb(rp, buf, MAXLINE);
        printf("%s", buf);
    }
    return;
}
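
The loop above only stops when it sees the blank line. As far as I can tell, if the client closes the connection before sending it, the read returns 0, buf keeps its old contents, and the loop never ends. The following sketch shows the same header-skipping logic with an end-of-input check. It is an assumption-laden stand-in: skip_headers is a hypothetical name, and it reads from a stdio stream with fgets instead of the Rio package so that it is self-contained.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for read_requesthdrs that reads from a stdio
 * stream. Consumes lines up to and including the blank "\r\n" line and
 * returns the number of header lines skipped. Also stops at end of
 * input, so a truncated request cannot make it loop forever. */
int skip_headers(FILE *in) {
    char buf[8192];
    int count = 0;

    assert(in != NULL);
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (strcmp(buf, "\r\n") == 0)
            break;                 /* blank line: end of headers */
        count++;
    }
    return count;
}
```

After the call, the stream is positioned at the first byte after the blank line, which is where a request body would start.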

5.2.5 parse_uri function

Tiny assumes that the home directory for static content is its current directory and that the home directory for executables is ./cgi-bin. Any URI that contains the string cgi-bin is assumed to denote a request for dynamic content. The default file name is ./home.html. The parse_uri function parses the URI into a file name and an optional CGI argument string. If the request is for static content, it clears the CGI argument string and converts the URI into a Linux relative pathname such as ./index.html. If the URI ends with "/", it appends the default file name. If the request is for dynamic content, it extracts any CGI arguments and converts the remaining portion of the URI into a Linux relative file name.

/*
 * parse_uri - parse URI into filename and CGI args
 *             return 0 if dynamic content, 1 if static
 */
int parse_uri(char *uri, char *filename, char *cgiargs) {
    char *ptr;

    if (!strstr(uri, "cgi-bin")) {
        /* Static content: no CGI args, path is relative to "." */
        strcpy(cgiargs, "");
        strcpy(filename, ".");
        strcat(filename, uri);
        if (uri[strlen(uri)-1] == '/')
            strcat(filename, "home.html");   /* append default file name */
        return 1;
    } else {
        /* Dynamic content: split the URI at '?' */
        ptr = index(uri, '?');
        if (ptr) {
            strcpy(cgiargs, ptr+1);
            *ptr = '\0';
        } else
            strcpy(cgiargs, "");
        strcpy(filename, ".");
        strcat(filename, uri);
        return 0;
    }
}
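
To see the mapping on concrete URIs, here is a bounded variant of the same logic. It is a sketch rather than the book's code: parse_uri_n is a hypothetical name, it takes explicit buffer sizes, it leaves the URI unmodified, and it returns -1 when a result does not fit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded variant of parse_uri (not the book's code).
 * Returns 1 for static content, 0 for dynamic content, and -1 if
 * filename or cgiargs is too small. Does not modify `uri`. */
int parse_uri_n(const char *uri, char *filename, size_t fsz,
                char *cgiargs, size_t asz) {
    const char *q;
    size_t pathlen;
    int n;

    assert(uri != NULL && filename != NULL && cgiargs != NULL);
    if (strstr(uri, "cgi-bin") == NULL) {
        /* Static content: empty args, "." + uri (+ default file name) */
        if (asz == 0)
            return -1;
        cgiargs[0] = '\0';
        pathlen = strlen(uri);
        n = snprintf(filename, fsz, ".%s%s", uri,
                     (pathlen > 0 && uri[pathlen - 1] == '/') ? "home.html" : "");
        return (n < 0 || (size_t)n >= fsz) ? -1 : 1;
    }

    /* Dynamic content: everything after '?' is the argument string */
    q = strchr(uri, '?');
    pathlen = q ? (size_t)(q - uri) : strlen(uri);
    n = snprintf(cgiargs, asz, "%s", q ? q + 1 : "");
    if (n < 0 || (size_t)n >= asz)
        return -1;
    n = snprintf(filename, fsz, ".%.*s", (int)pathlen, uri);
    return (n < 0 || (size_t)n >= fsz) ? -1 : 0;
}
```

With this sketch, "/" maps to the file ./home.html with empty arguments, and "/cgi-bin/adder?15000&213" maps to the file ./cgi-bin/adder with the argument string "15000&213".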

5.2.6 serve_static function

Tiny serves five common types of static content: HTML files, unformatted text files, and images encoded in GIF, PNG, and JPG formats. The serve_static function sends an HTTP response whose body contains the contents of a local file. It first determines the file type by inspecting the suffix in the file name, and then sends the response line and response headers to the client. Note that a blank line terminates the headers.

/*
 * serve_static - copy a file back to the client 
 */
void serve_static(int fd, char *filename, int filesize) {
    int srcfd;
    char *srcp, filetype[MAXLINE], buf[MAXBUF];

    /* Send response headers to client */
    get_filetype(filename, filetype);
    sprintf(buf, "HTTP/1.0 200 OK\r\n");
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "Server: Tiny Web Server\r\n");
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "Content-length: %d\r\n", filesize);
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "Content-type: %s\r\n\r\n", filetype);
    Rio_writen(fd, buf, strlen(buf));

    /* Send response body to client */
    srcfd = Open(filename, O_RDONLY, 0);
    srcp = Mmap(0, filesize, PROT_READ, MAP_PRIVATE, srcfd, 0);
    Close(srcfd);
    Rio_writen(fd, srcp, filesize);
    Munmap(srcp, filesize);
}

/*
 * get_filetype - derive file type from file name
 */
void get_filetype(char *filename, char *filetype) {
    if (strstr(filename, ".html"))
        strcpy(filetype, "text/html");
    else if (strstr(filename, ".gif"))
        strcpy(filetype, "image/gif");
    else if (strstr(filename, ".png"))
        strcpy(filetype, "image/png");
    else if (strstr(filename, ".jpg"))
        strcpy(filetype, "image/jpeg");
    else
        strcpy(filetype, "text/plain");
}
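
Because get_filetype uses strstr, it matches the extension anywhere in the name, so a file called notes.html.txt would be reported as text/html. A stricter variant that looks only at the final suffix might look like the sketch below. filetype_for is a hypothetical name, and returning a string constant instead of copying into a buffer is a design choice of this example, not something Tiny does.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical variant of get_filetype (not in Tiny): matches only the
 * extension after the last '.' in the last path component and returns
 * a pointer to a constant MIME type string. */
const char *filetype_for(const char *filename) {
    const char *slash, *dot;

    assert(filename != NULL);
    slash = strrchr(filename, '/');                    /* last path component */
    dot = strrchr(slash ? slash + 1 : filename, '.');  /* its last '.' */
    if (dot == NULL)
        return "text/plain";
    if (strcmp(dot, ".html") == 0)
        return "text/html";
    if (strcmp(dot, ".gif") == 0)
        return "image/gif";
    if (strcmp(dot, ".png") == 0)
        return "image/png";
    if (strcmp(dot, ".jpg") == 0)
        return "image/jpeg";
    return "text/plain";
}
```

The same five types are recognized, and anything else falls back to text/plain as in the listing.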

Next, serve_static sends the response body by copying the contents of the requested file to the connected descriptor fd. The code here is subtle and deserves careful study. It opens filename for reading and gets its descriptor srcfd. The Linux mmap function then maps the first filesize bytes of srcfd to a private, read-only area of virtual memory that starts at address srcp. Once the file is mapped, the descriptor is no longer needed, so it is closed; failing to do so would leak a descriptor on every request, which is potentially fatal for a long-running server. Rio_writen then performs the actual transfer to the client, copying the filesize bytes starting at srcp to fd. Finally, Munmap frees the mapped virtual memory area, which avoids a potentially fatal memory leak.
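
The open, mmap, close, write, munmap sequence can be written without the book's wrapper functions roughly as follows. This is a sketch under assumptions: send_file_mmap is a hypothetical name, errors are reported with a return value instead of terminating the process, and a plain write loop stands in for Rio_writen.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX mmap/open/write */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not in Tiny): maps the first `filesize` bytes of
 * `filename` and writes them to outfd. Returns 0 on success, -1 on error. */
int send_file_mmap(int outfd, const char *filename, size_t filesize) {
    int srcfd;
    char *srcp;
    size_t sent = 0;

    assert(filename != NULL);
    if (filesize == 0)
        return 0;                  /* mmap rejects zero-length mappings */
    srcfd = open(filename, O_RDONLY);
    if (srcfd < 0)
        return -1;
    srcp = mmap(NULL, filesize, PROT_READ, MAP_PRIVATE, srcfd, 0);
    close(srcfd);                  /* the mapping stays valid after close */
    if (srcp == MAP_FAILED)
        return -1;

    /* write may transfer fewer bytes than requested, so loop */
    while (sent < filesize) {
        ssize_t n = write(outfd, srcp + sent, filesize - sent);
        if (n < 0 && errno == EINTR)
            continue;              /* interrupted by a signal: retry */
        if (n <= 0) {
            munmap(srcp, filesize);
            return -1;
        }
        sent += (size_t)n;
    }
    return munmap(srcp, filesize); /* release the mapped area */
}
```

The descriptor is closed right after the mapping is created, and the mapping is released on every path out of the function, which are the two leaks the paragraph above warns about.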

5.2.7 serve_dynamic function

Tiny serves any type of dynamic content by forking a child process and then running a CGI program in the context of the child. The serve_dynamic function begins by sending a response line indicating success to the client, along with an informational Server header. The CGI program is responsible for sending the rest of the response. Note that this is not as robust as it could be, because it does not allow for the possibility that the CGI program might encounter an error.

/*
 * serve_dynamic - run a CGI program on behalf of the client
 */
void serve_dynamic(int fd, char *filename, char *cgiargs) {
    char buf[MAXLINE], *emptylist[] = { NULL };

    /* Return first part of HTTP response */
    sprintf(buf, "HTTP/1.0 200 OK\r\n");
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "Server: Tiny Web Server\r\n");
    Rio_writen(fd, buf, strlen(buf));

    if (Fork() == 0) { /* Child */
        /* Real server would set all CGI vars here */
        setenv("QUERY_STRING", cgiargs, 1);
        /* Redirect stdout to client */
        Dup2(fd, STDOUT_FILENO);
        /* Run CGI program */
        Execve(filename, emptylist, environ);
    }
    Wait(NULL); /* Parent waits for and reaps child */
}

After sending the first part of the response, serve_dynamic forks a new child process. The child initializes the QUERY_STRING environment variable with the CGI arguments from the request URI. Note that a real server would set the other CGI environment variables here as well. Next, the child redirects its standard output to the connected file descriptor and then loads and runs the CGI program. Because the CGI program runs in the context of the child, it has access to the same open files and environment variables that existed before the call to execve. Thus, everything the CGI program writes to standard output goes directly to the client process, without any intervention from the parent process. Meanwhile, the parent blocks in a call to wait, waiting to reap the child when it terminates.
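
The fork, setenv, dup2, execve, wait pattern can be exercised without a network connection by using any descriptor, such as the write end of a pipe, in place of the connected socket. The sketch below is an illustration, not Tiny's code: run_cgi is a hypothetical name, it takes an explicit argv (Tiny passes an empty list), and it returns the child's exit status instead of ignoring it.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fork, setenv, dup2, waitpid */
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper (not in Tiny): runs the program at `path` in a
 * child process with QUERY_STRING set to `query` and standard output
 * redirected to outfd. Returns the child's exit status, or -1 if the
 * child could not be created or did not exit normally. */
int run_cgi(int outfd, const char *path, char *const argv[],
            const char *query) {
    pid_t pid;
    int status;

    assert(path != NULL && query != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                          /* child */
        if (setenv("QUERY_STRING", query, 1) < 0)
            _exit(126);
        if (dup2(outfd, STDOUT_FILENO) < 0)  /* stdout now goes to outfd */
            _exit(126);
        execve(path, argv, environ);
        _exit(127);                          /* reached only if execve failed */
    }
    if (waitpid(pid, &status, 0) < 0)        /* parent reaps the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, running /bin/sh with the arguments sh, -c and a command that prints $QUERY_STRING writes the query string to outfd. Because the parent waits before anything reads outfd, this sketch is only safe when the child's output fits in the pipe or socket buffer, which is also true of the listing above.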

6. Summary

Clients and servers establish connections by using the sockets interface. A socket is an endpoint of a connection that is presented to applications in the form of a file descriptor. Clients and servers communicate with each other by reading and writing these descriptors.

Web servers use the HTTP protocol to communicate with their clients, such as browsers. A browser requests either static or dynamic content from the server. A request for static content is served by fetching a file from the server's disk and returning it to the client. A request for dynamic content is served by running a program in the context of a child process on the server and returning its output to the client. The CGI standard provides a set of rules that govern how the client passes program arguments to the server, how the server passes these arguments and other information to the child process, and how the child sends its output back to the client. Finally, a simple but functional Web server that serves both static and dynamic content can be implemented in a few hundred lines of C code.
