1. Network architecture
This section covers the basic client-server programming model and how to write client-server programs that use services provided by the Internet. At the end, all the concepts are combined to develop a small but functional web server. Network applications are everywhere today. Interestingly, they are all based on the same basic programming model, have a similar overall logical structure, and rely on the same programming interface. Most network applications follow the client-server model: one server process and one or more client processes (note that clients and servers are processes, not machines or hosts). The server manages a resource and provides a service to its clients; a client requests that service. A client-server transaction has four steps: request, operation, response and processing.
The client process sends a request to the server process. The server process obtains the required resources and sends a response back to the client process. After receiving the response, the client process handles it, for example by displaying it to the user. To the host, the network is just another I/O device that serves as a source and a sink for data. Network-related processing goes through the network adapter, which sits on the I/O bus alongside the other devices.
2. Network
By scope and position in the hierarchy, networks fall into three categories:
-
SAN - System Area Network
-
LAN - Local Area Network
-
WAN - Wide Area Network
2.1 bottom layer - Ethernet Segment
An Ethernet segment consists of a collection of hosts wired to a hub. It usually spans a small area, such as a room or one floor of a building, as shown in the figure:
-
An Ethernet segment consists of a group of hosts connected to a hub by wires (twisted pairs)
-
Each Ethernet adapter has a unique 48-bit MAC address, and hosts send bits to other hosts in chunks called frames
-
A hub copies every bit it receives on one port to all other ports, so every host sees every bit (note the security implications)
2.2 next layer
2.2.1 Bridged Ethernet Segment
A bridged Ethernet typically spans one floor or a building and connects multiple Ethernet segments through bridges. A bridge learns which hosts are reachable from which ports and then selectively copies frames between ports.
For simplicity, all the hubs, bridges and wires can be abstracted as a collection of hosts attached to a single wire, as shown in the following figure:
2.2.2 internets
Multiple incompatible LANs can be physically connected by routers. The resulting network is called an internet (note the lowercase i)
The logical structure of an internet is:
-
Ad hoc interconnection of networks
-
There is no specific topology
-
Different routers and links may vary greatly in capability
-
Packets are sent by hopping across different networks
-
A router forms the connection between different networks
-
Different packets may take different routes
2.3 network protocol
Data sent across different LANs and WANs must follow a protocol. A protocol is a set of rules that governs how hosts and routers cooperate when transferring data from network to network, smoothing out the differences between networks. An internet protocol must:
-
Provide a naming scheme: define a uniform format for host addresses; each host and router is assigned at least one internet address that uniquely identifies it
-
Provide a delivery mechanism: define a standard transfer unit, the packet, which consists of a header and a payload. The header includes the packet size and the source and destination addresses; the payload contains the data to be transmitted
The data transfer is shown in the figure below, where PH = internet packet header and FH = LAN frame header
Eight basic steps for host A to send data to B:
-
The client on A makes a system call that copies data from the client's virtual address space into a kernel buffer
-
The protocol software on A creates a LAN1 frame by prepending a packet header and a LAN1 frame header to the data. The packet header is addressed to B, and the LAN1 frame header is addressed to the router. The frame is then passed to the adapter. Note that the payload of the LAN1 frame is an internet packet, whose payload is the actual user data. This encapsulation is one of the fundamental ideas of internetworking
-
The LAN1 adapter copies the frame to the network
-
When the frame reaches the router, the router's LAN1 adapter reads it from the wire and passes it to the protocol software
-
The router takes the destination address from the packet header and uses it as an index into its routing table to determine where to forward the packet, which in this example is LAN2. The router strips off the old LAN1 frame header, prepends a new LAN2 frame header addressed to B, and passes the resulting frame to the adapter
-
The LAN2 adapter of the router copies the frame to the network
-
When this frame reaches B, its adapter reads this frame from the cable and transmits it to the protocol software
-
The protocol software on B peels off the packet header and frame header. When the server makes a system call to read these data, the protocol software finally copies the obtained data to the virtual address space of the server
Other questions:
What if different networks have different maximum frame sizes?
How does the router know where to forward frames?
How to notify the router when the network topology changes?
What if the packet is lost?
These (and other) questions are addressed by the area of systems known as computer networking
2.4 TCP/IP protocol
The global IP Internet (uppercase I) is the most famous example of an internet. It is based on the TCP/IP protocol family:
-
IP (Internet Protocol): provides the basic naming scheme and unreliable host-to-host delivery of packets
-
UDP (User Datagram Protocol): uses IP to provide unreliable datagram delivery between processes
-
TCP (Transmission Control Protocol): uses IP to provide reliable byte streams between processes
Programs access these protocols through a mix of Unix I/O and socket interface functions. From the programmer's point of view:
-
Each host has a 32-bit IP address, for example 23.235.46.133
-
IP addresses are mapped to a set of identifiers called Internet domain names
-
Processes on different hosts can exchange data over connections
2.4.1 IP address
An IP address is stored in an IP address struct, always in network byte order (big-endian):
// Internet address structure
struct in_addr {
    uint32_t s_addr; // network byte order (big-endian)
};
For readability, IP addresses are usually written in dotted-decimal notation, where each byte is shown as its decimal value: 0x8002C2F2 = 128.2.194.242. The getaddrinfo and getnameinfo functions can convert between IP addresses and dotted-decimal strings
2.4.2 Internet host domain name
The key concept here is the Domain Name System (DNS), which maps between IP addresses and domain names. Programmers can think of the DNS database as a collection of millions of host entries. Each host has a locally defined domain name, localhost, which always maps to the loopback address 127.0.0.1
$ nslookup www.twitter.com
Server: 8.8.8.8
Address: 8.8.8.8#53

Non-authoritative answer:
www.twitter.com canonical name = twitter.com.
Name: twitter.com
Address: 199.16.156.6
Name: twitter.com
Address: 199.16.156.198
Name: twitter.com
Address: 199.16.156.230
Name: twitter.com
Address: 199.16.156.70
Use hostname to determine the real domain name of the local host. The mapping is not one-to-one: several domain names can map to the same IP address, and one domain name can map to several IP addresses, as in the output above
2.4.3 Internet connection
Clients and servers communicate by sending streams of bytes over connections. A connection is:
-
Point to point: it connects a pair of processes
-
Full duplex: data can flow in both directions at the same time
-
Reliable: the bytes are received in the same order in which they were sent
A socket is an endpoint of a connection, and a socket address is an IPaddress:port pair.
A port is a 16-bit integer that identifies a process; different services use different ports:
-
Well-known port: associated with a service provided by a server (see /etc/services for the mappings), for example:
-
echo server: 7/echo
-
ssh server: 22/ssh
-
email server: 25/smtp
-
web servers: 80/http
2.5 Socket interface
The socket interface is a set of system-level functions used together with Unix I/O to build network applications. To the kernel, a socket is an endpoint of communication; to an application, a socket is a file descriptor that it reads from and writes to. Clients and servers communicate by reading from and writing to the corresponding socket descriptors. The main difference between regular file I/O and socket I/O is how the program "opens" the socket descriptor. In the echo example used later:
-
Server: accepts a connection request, then repeatedly reads an input line from the connection and echoes it back
-
Client: requests a connection to the server, then repeats: read a line from the terminal, send it to the server, read the response from the server, print it on the terminal
2.5.1 socket address structure
From the perspective of the Linux kernel, a socket is an endpoint of communication. From the perspective of a Linux program, a socket is an open file with a corresponding descriptor. The generic sockaddr structure is the parameter type of connect, bind and accept. It is needed because C had no generic (void *) pointer when the sockets interface was designed.
The Internet-specific (IPv4) socket address is struct sockaddr_in (note: _in is short for internet, not input). When it is passed to a function that takes a socket address parameter, the (struct sockaddr_in *) must be cast to (struct sockaddr *)
2.5.2 socket function
int socket(int domain, int type, int protocol): clients and servers use the socket function to create a socket descriptor. The best practice is to let getaddrinfo generate the arguments automatically so that the code is protocol independent.
2.5.3 connect function
int connect(int clientfd, const struct sockaddr *addr, socklen_t addrlen): the client establishes a connection with the server by calling connect. The function attempts to establish an Internet connection with the server at socket address addr, where addrlen is sizeof(struct sockaddr_in). connect blocks until the connection succeeds or an error occurs. On success, the clientfd descriptor is ready for reading and writing, and the resulting connection is characterized by the socket pair (x:y, addr.sin_addr:addr.sin_port), where x is the client's IP address and y is the ephemeral port that uniquely identifies the client process on the client host. As with socket, it is best to use getaddrinfo to supply the arguments to connect.
2.5.4 bind function
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen): the bind function asks the kernel to associate the server's socket address in addr with the socket descriptor sockfd. As before, it is best to use getaddrinfo to supply the arguments to bind. The process can then read bytes that arrive on the connection whose endpoint is addr by reading from sockfd. Similarly, writes to sockfd are transferred along the connection whose endpoint is addr
2.5.5 listen function
By default, the kernel assumes that a descriptor created by the socket function is an active socket that will live on the client end of a connection. A server calls the listen function to tell the kernel that the descriptor will be used by a server instead.
int listen(int sockfd, int backlog): converts sockfd from an active socket to a listening socket that can accept connection requests from clients. The backlog argument is a hint about the number of outstanding connection requests that the kernel should queue before it starts refusing requests.
2.5.6 accept function
The server calls the accept function to wait for a connection request from the client.
int accept(int listenfd, struct sockaddr *addr, socklen_t *addrlen): this function waits for a connection request from a client to arrive on the listening descriptor listenfd, fills in the client's socket address in addr, and returns a connected descriptor. That descriptor can be used to communicate with the client through the Unix I/O functions.
The listening descriptor is an endpoint for client connection requests; it is typically created once and exists for the lifetime of the server. The connected descriptor is an endpoint of the connection established between the client and the server; it is created each time the server accepts a connection request and exists only while the server is serving that client.
Note: distinguishing between listening and connected descriptors may seem unnecessary and complicated, but it is what makes it possible to build concurrent servers that handle many client connections at the same time.
2.5.7 conversion of hosts and services
Linux provides some powerful functions getaddrinfo and getnameinfo to realize the conversion between binary socket address structure and string representation of host name, host address, service name and port number. When used with socket interfaces, these functions enable us to write network programs independent of any particular version of IP protocol.
2.5.7.1 getaddrinfo function
getaddrinfo is the modern way to convert string representations of host names, host addresses, ports and service names into socket address structures. It replaces the obsolete gethostbyname and getservbyname functions. Its advantages are that it is reentrant, so it can be used safely by threaded programs, and that it allows protocol-independent, portable code. The drawback is that it is somewhat complex, but fortunately a small number of usage patterns suffice in most cases.
Given a host and a service, getaddrinfo returns a result that points to a linked list of addrinfo structures. Each structure points to a corresponding socket address structure and contains arguments for the socket interface functions. The helper functions are freeaddrinfo and gai_strerror
Figure: linked list returned by getaddrinfo
Addrinfo structure. Each addrinfo structure returned by getaddrinfo contains parameters that can be directly passed to the socket function, and also points to the socket address structure that can be directly passed to connect and bind
2.5.7.2 getnameinfo function
getnameinfo is the inverse of getaddrinfo. It converts a socket address into the corresponding host and service strings, replacing the obsolete gethostbyaddr and getservbyport functions. It is also reentrant and protocol independent
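A minimal sketch of the two functions working together: getaddrinfo turns strings into a socket address, and getnameinfo turns that address back into strings. The helper name first_address and the buffer sizes are assumptions for this example; it is restricted to IPv4 and numeric output only to keep the result predictable.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve host/port and write "address:port" for the first result into out.
   Returns 0 on success, or a getaddrinfo/getnameinfo error code. */
int first_address(const char *host, const char *port, char *out, size_t outlen)
{
    struct addrinfo hints, *listp;
    char hbuf[256], sbuf[32];             /* sizes chosen for this sketch */
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;            /* IPv4 only, to keep output simple */
    hints.ai_socktype = SOCK_STREAM;      /* connections only */
    if ((rc = getaddrinfo(host, port, &hints, &listp)) != 0)
        return rc;

    /* NI_NUMERICHOST | NI_NUMERICSERV: return numbers instead of names. */
    rc = getnameinfo(listp->ai_addr, listp->ai_addrlen, hbuf, sizeof hbuf,
                     sbuf, sizeof sbuf, NI_NUMERICHOST | NI_NUMERICSERV);
    if (rc == 0)
        snprintf(out, outlen, "%s:%s", hbuf, sbuf);
    freeaddrinfo(listp);                  /* always release the list */
    return rc;
}

int main(void)
{
    char buf[300];
    int rc = first_address("127.0.0.1", "8080", buf, sizeof buf);
    if (rc != 0)
        fprintf(stderr, "lookup failed: %s\n", gai_strerror(rc));
    else
        printf("%s\n", buf);              /* prints 127.0.0.1:8080 */
    return 0;
}
```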
3. Simple server implementation
3.1 Architecture Overview
The most important thing when writing a server is to have a clear picture of the workflow. The previous part introduced many concepts, especially getaddrinfo and getnameinfo, which are essential tools for building a server. Following the flow chart above, the whole workflow has five steps:
- Turn on the server (SA in the prototypes below is the csapp.h shorthand for struct sockaddr)
-
getaddrinfo: set up the server's address information. Filling in address structures by hand, as in older code, is protocol dependent; configuring the socket with the parameters generated by getaddrinfo is protocol independent
-
socket: create a socket descriptor, that is, the file descriptor used later for reading and writing
-
int socket(int domain, int type, int protocol)
-
For example, int clientfd = socket(AF_INET, SOCK_STREAM, 0);
-
AF_INET indicates that 32-bit IPv4 addresses are being used
-
SOCK_STREAM indicates that the socket will be the endpoint of a connection
-
bind: ask the kernel to associate the socket address with the socket descriptor
-
int bind(int sockfd, SA *addr, socklen_t addrlen);
-
The process can read bytes that arrive on the connection whose endpoint is addr by reading from descriptor sockfd
-
Similarly, writes to sockfd are transferred along the connection whose endpoint is addr
-
It is better to use the parameters generated by getaddrinfo as addr and addrlen
-
listen: by default, a descriptor obtained from the socket function is an active socket (the client end of a connection). Calling listen tells the kernel that this socket will be used by a server
-
int listen(int sockfd, int backlog);
-
Converts sockfd from an active socket to a listening socket that can receive client requests
-
The backlog value is a hint about how many outstanding requests the kernel queues before it starts rejecting new ones
-
accept: start waiting for client requests
-
int accept(int listenfd, SA *addr, socklen_t *addrlen);
-
Waits for a connection request to arrive on the connection bound to listenfd, then writes the client's socket address to addr and its size to addrlen
-
Returns a connected descriptor that is used to exchange data with the client (through Unix I/O)
- Open the client (set the server address and try to connect)
-
getaddrinfo: set up the client's connection information. See Figure 1 & 2 for details
-
socket: create a socket descriptor, that is, the file descriptor used later for reading and writing
-
connect: called by the client to establish a connection with the server
-
int connect(int clientfd, SA *addr, socklen_t addrlen);
-
Tries to establish a connection with the server at socket address addr
-
If successful, clientfd is ready for reading and writing
-
The connection is described by the socket pair (x:y, addr.sin_addr:addr.sin_port)
-
x is the address of the client, y is the ephemeral port of the client, and the latter two are the address and port of the server
-
It is better to use the parameters generated by getaddrinfo as addr and addrlen
- Exchange data (mainly a process cycle. When the client writes to the server, it sends a request; when the server writes to the client, it sends a response)
-
[Client]rio_writen: write data, which is equivalent to sending a request to the server
-
[Client]rio_readlineb: read data, which is equivalent to receiving a response from the server
-
[Server]rio_readlineb: read data, which is equivalent to receiving a request from the client
-
[Server]rio_writen: write data, which is equivalent to sending a response to the client
- Close client
- [Client]close: close the connection
- Disconnect the client (after detecting EOF from the client, the server closes its connection with that client)
-
[Server]rio_readlineb: keeps reading from the client until it encounters EOF, which signals that the client has closed the connection
-
[Server]close: close the connection with the client
Note: the concept of EOF is often confusing. First, there is no such thing as an EOF character. Instead, EOF is a condition detected by the kernel, and a program learns of it when read returns zero. For a disk file, EOF occurs when the current file position exceeds the file length. For a network connection, EOF occurs when the process at one end closes its end of the connection and the process at the other end tries to read past the last byte in the stream
3.2 Client
open_clientfd establishes a connection with a server and is protocol independent
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int open_clientfd(char *hostname, char *port)
{
    int clientfd;
    struct addrinfo hints, *listp, *p;

    // Get a list of potential server addresses
    memset(&hints, 0, sizeof(struct addrinfo));
    hints.ai_socktype = SOCK_STREAM;  // Open a connection
    hints.ai_flags = AI_NUMERICSERV;  // using numeric port arguments
    hints.ai_flags |= AI_ADDRCONFIG;  // Recommended for connections
    if (getaddrinfo(hostname, port, &hints, &listp) != 0)
        return -1;                    // Lookup failed, so listp is not valid

    // Walk the list for one that we can successfully connect to
    // (there may be multiple addresses; fail only if all of them fail)
    for (p = listp; p; p = p->ai_next) {
        // Create a socket descriptor
        // The parameters come from getaddrinfo, which keeps this protocol independent
        if ((clientfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) < 0)
            continue;                 // Socket failed, try the next

        // Connect to the server, again with parameters from getaddrinfo
        if (connect(clientfd, p->ai_addr, p->ai_addrlen) != -1)
            break;                    // Success
        close(clientfd);              // Connect failed, try another
    }

    // Clean up
    freeaddrinfo(listp);
    if (!p)                           // All connections failed
        return -1;
    else                              // The last connect succeeded
        return clientfd;
}
3.3 Server
open_listenfd creates a listening descriptor that is ready to receive requests from clients and is protocol independent
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define LISTENQ 1024  // backlog passed to listen (the value csapp.h uses)

int open_listenfd(char *port)
{
    struct addrinfo hints, *listp, *p;
    int listenfd, optval = 1;

    // Get a list of potential server addresses
    memset(&hints, 0, sizeof(struct addrinfo));
    hints.ai_socktype = SOCK_STREAM;              // Accept connections
    hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG;  // on any IP address
    hints.ai_flags |= AI_NUMERICSERV;             // using port number
    // The server does not connect to anyone, so the host argument is NULL
    if (getaddrinfo(NULL, port, &hints, &listp) != 0)
        return -1;                                // Lookup failed

    // Walk the list for one that we can bind to
    // (there may be multiple addresses; fail only if all of them fail)
    for (p = listp; p; p = p->ai_next) {
        // Create a socket descriptor
        // The parameters come from getaddrinfo, which keeps this protocol independent
        if ((listenfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) < 0)
            continue;                             // Socket failed, try the next

        // Eliminates "Address already in use" error from bind
        setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR,
                   (const void *)&optval, sizeof(int));

        // Bind the descriptor to the address
        if (bind(listenfd, p->ai_addr, p->ai_addrlen) == 0)
            break;                                // Success
        close(listenfd);                          // Bind failed, try another
    }

    // Clean up
    freeaddrinfo(listp);
    if (!p)                                       // No address worked
        return -1;

    // Make it a listening socket ready to accept connection requests
    if (listen(listenfd, LISTENQ) < 0) {
        close(listenfd);
        return -1;
    }
    return listenfd;
}
3.4 simple socket server instance
3.4.1 client
What this client does is very simple: it sends each line of text entered by the user to the server and then prints whatever it receives back from the server. See the comments for details
// echoclient.c
#include "csapp.h"

int main(int argc, char **argv)
{
    int clientfd;
    char *host, *port, buf[MAXLINE];
    rio_t rio;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <host> <port>\n", argv[0]);
        exit(1);
    }
    host = argv[1];
    port = argv[2];

    // Establish a connection (described in detail earlier)
    clientfd = Open_clientfd(host, port);
    Rio_readinitb(&rio, clientfd);

    while (Fgets(buf, MAXLINE, stdin) != NULL) {
        // Write, that is, send information to the server
        Rio_writen(clientfd, buf, strlen(buf));
        // Read, that is, receive information from the server
        Rio_readlineb(&rio, buf, MAXLINE);
        // Display the information received from the server
        Fputs(buf, stdout);
    }
    Close(clientfd);
    exit(0);
}
3.4.2 server
The server is just as simple: it receives the information sent by the client and sends the same thing back. See the comments for details.
// echoserveri.c
#include "csapp.h"

void echo(int connfd);

int main(int argc, char **argv)
{
    int listenfd, connfd;
    socklen_t clientlen;
    struct sockaddr_storage clientaddr;  // Enough room for any addr
    char client_hostname[MAXLINE], client_port[MAXLINE];

    if (argc != 2) {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        exit(1);
    }

    // Open the listening port. Be careful to do this only once
    listenfd = Open_listenfd(argv[1]);
    while (1) {
        // accept needs the size of the address buffer on every call
        clientlen = sizeof(struct sockaddr_storage);  // Important!
        // Wait for a connection
        connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);
        // Get client related information
        Getnameinfo((SA *)&clientaddr, clientlen, client_hostname, MAXLINE,
                    client_port, MAXLINE, 0);
        printf("Connected to (%s, %s)\n", client_hostname, client_port);
        // The actual work done by the server
        echo(connfd);
        Close(connfd);
    }
    exit(0);
}

void echo(int connfd)
{
    size_t n;
    char buf[MAXLINE];
    rio_t rio;

    // Read the data sent by the client, line by line, until EOF
    Rio_readinitb(&rio, connfd);
    while ((n = Rio_readlineb(&rio, buf, MAXLINE)) != 0) {
        printf("server received %d bytes\n", (int)n);
        // Write back the information received from the client
        Rio_writen(connfd, buf, n);
    }
}
3.5 proxy
A proxy is an intermediary between a client and a server. To the client, the proxy looks like a server; to the server, the proxy looks like a client. Proxies are useful because they can perform functions on requests and responses as they pass through, such as caching, logging, anonymization, filtering and transcoding
4. web server
4.1 basic knowledge of Web
Clients and servers communicate using the Hypertext Transfer Protocol (HTTP). The client establishes a TCP connection with the server, the client requests content, the server responds with the requested content, and the client and server (eventually) close the connection
4.2 web content
A web server returns content to its clients. Content is a sequence of bytes with an associated MIME (Multipurpose Internet Mail Extensions) type, and it is identified by a URL. Example MIME types include HTML documents (text/html), unformatted plain text (text/plain) and binary images in GIF format (image/gif)
4.2.1 static and dynamic content
-
Static content: content stored in files and retrieved in response to an HTTP request, such as HTML files, images, videos and JavaScript programs
-
Dynamic content: content produced on the fly in response to an HTTP request, generated by a program that the server executes on behalf of the client. The request identifies the file containing the executable code. Any URL can refer to either static or dynamic content
4.2.2 how to use URLs and clients and servers
URL (Uniform Resource Locator): a unique name for a piece of content, such as http://www.cmu.edu:80/index.html. Clients use the prefix (http://www.cmu.edu:80) to determine the protocol (HTTP), the server (www.cmu.edu) and the port (80). Servers use the suffix (/index.html) to determine whether the request is for static or dynamic content (there are no mandatory rules, but by convention executables are placed in the cgi-bin directory) and to find the file on the file system. The initial "/" in the suffix denotes the home directory for the requested content. The minimal suffix is "/", which the server expands to a configured default file name, such as index.html
4.2.3 HTTP REQUEST RESPONSE
4.2.4 HTTP example
4.2.5 HTTP(S) example
5. Tiny web server
Tiny is a small web server: 239 lines of C code that serve both static and dynamic content, although it is not as complete or robust as a real web server. For example, it can be broken by a malformed request, such as one whose lines end with \n instead of \r\n. Tiny accepts a connection from a client, reads the request from the client (through the connected socket), and parses the URI. If the URI contains "cgi-bin", Tiny forks a child process to execute the program; otherwise it copies the static content to the output
Serving dynamic content raises several questions:
How does the client pass program parameters to the server?
How does the server pass these parameters to child processes?
How does the server pass other request related information to the child process?
How does the server capture content generated by child processes?
5.1 CGI
The answer is CGI (Common Gateway Interface). Because the child processes are written to the CGI specification, they are usually called CGI programs. CGI defines a simple standard for transferring information between the client (browser), the server and the child process. CGI is the original standard for generating dynamic content. It has largely been replaced by faster technologies, such as FastCGI, Apache modules, Java servlets and Rails controllers, which avoid creating a new process for every request
How does the client pass parameters to the server? They are passed in the URI, either typed directly into a URL in the browser or embedded in a URL in an HTML link. The parameter list starts with "?", parameters are separated by "&", and spaces are encoded as "+" or "%20"; other special bytes have similar encodings. Note that the parameters of a POST request are passed in the request body rather than in the URI (the essence of programming is the visualization of data). Next, the server passes the parameters to the child process in the environment variable QUERY_STRING. The child process generates its output on stdout, and the server uses dup2 to redirect the child's stdout to the connected socket, which is how the generated content reaches the client. Note that only the CGI child process knows the content type and length, so it must generate those headers itself.
5.2 code
5.2.1 main program
Tiny is an iterative server that listens for connection requests on the port passed on the command line. After opening a listening socket by calling open_listenfd, Tiny executes the typical infinite server loop: repeatedly accepting a connection request (Accept), performing a transaction (doit), and closing its end of the connection (Close).
/*
 * tiny.c - A simple, iterative HTTP/1.0 Web server that uses the
 * GET method to serve static and dynamic content.
 */
#include "csapp.h"

void doit(int fd);
void read_requesthdrs(rio_t *rp);
int parse_uri(char *uri, char *filename, char *cgiargs);
void serve_static(int fd, char *filename, int filesize);
void get_filetype(char *filename, char *filetype);
void serve_dynamic(int fd, char *filename, char *cgiargs);
void clienterror(int fd, char *cause, char *errnum,
                 char *shortmsg, char *longmsg);

int main(int argc, char **argv)
{
    int listenfd, connfd;
    char hostname[MAXLINE], port[MAXLINE];
    socklen_t clientlen;
    struct sockaddr_storage clientaddr;

    /* Check command line args */
    if (argc != 2) {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        exit(1);
    }

    listenfd = Open_listenfd(argv[1]);
    while (1) {
        clientlen = sizeof(clientaddr);
        connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);
        Getnameinfo((SA *)&clientaddr, clientlen, hostname, MAXLINE,
                    port, MAXLINE, 0);
        printf("Accepted connection from (%s, %s)\n", hostname, port);
        doit(connfd);   /* perform one HTTP transaction */
        Close(connfd);  /* close our end of the connection */
    }
}
5.2.2 doit function
The doit function handles one HTTP transaction. It first reads and parses the request line; Tiny supports only the GET method, so any other method produces an error response. The URI is then parsed into a file name and a possibly empty CGI argument string, and the is_static flag is set to indicate whether the request is for static or dynamic content. If the file does not exist on disk, an error message is immediately sent to the client and the function returns. Finally, if the request is for static content, doit verifies that the file is a regular file with read permission and then serves the static content (serve_static) to the client. Similarly, if the request is for dynamic content, it verifies that the file is executable and then serves the dynamic content (serve_dynamic).
/*
 * doit - handle one HTTP request/response transaction
 */
void doit(int fd)
{
    int is_static;
    struct stat sbuf;
    char buf[MAXLINE], method[MAXLINE], uri[MAXLINE], version[MAXLINE];
    char filename[MAXLINE], cgiargs[MAXLINE];
    rio_t rio;

    /* Read request line and headers */
    Rio_readinitb(&rio, fd);
    if (!Rio_readlineb(&rio, buf, MAXLINE))
        return;
    printf("%s", buf);
    sscanf(buf, "%s %s %s", method, uri, version);
    if (strcasecmp(method, "GET")) {  /* only GET is supported */
        clienterror(fd, method, "501", "Not Implemented",
                    "Tiny does not implement this method");
        return;
    }
    read_requesthdrs(&rio);

    /* Parse URI from GET request */
    is_static = parse_uri(uri, filename, cgiargs);
    if (stat(filename, &sbuf) < 0) {
        clienterror(fd, filename, "404", "Not found",
                    "Tiny couldn't find this file");
        return;
    }

    if (is_static) { /* Serve static content */
        if (!(S_ISREG(sbuf.st_mode)) || !(S_IRUSR & sbuf.st_mode)) {
            clienterror(fd, filename, "403", "Forbidden",
                        "Tiny couldn't read the file");
            return;
        }
        serve_static(fd, filename, sbuf.st_size);
    }
    else { /* Serve dynamic content */
        if (!(S_ISREG(sbuf.st_mode)) || !(S_IXUSR & sbuf.st_mode)) {
            clienterror(fd, filename, "403", "Forbidden",
                        "Tiny couldn't run the CGI program");
            return;
        }
        serve_dynamic(fd, filename, cgiargs);
    }
}
5.2.3 clienterror function
tiny lacks many error handling features of the actual server. However, it checks for obvious errors and reports them to the client. This function sends an HTTP response to the client. The response line contains the corresponding status code and status message. The response body contains an HTML file to explain the error to the browser user.
Recall that the HTML response should indicate the size and type of content in the body. Therefore, choose to create HTML content as a string, so you can simply determine its size. Also, note that all output uses robust Rio_ The writer function.
/*
 * clienterror - returns an error message to the client
 */
void clienterror(int fd, char *cause, char *errnum,
                 char *shortmsg, char *longmsg)
{
    char buf[MAXLINE];

    /* Print the HTTP response headers */
    sprintf(buf, "HTTP/1.0 %s %s\r\n", errnum, shortmsg);
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "Content-type: text/html\r\n\r\n");
    Rio_writen(fd, buf, strlen(buf));

    /* Print the HTTP response body */
    sprintf(buf, "<html><title>Tiny Error</title>");
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "<body bgcolor=""ffffff"">\r\n");
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "%s: %s\r\n", errnum, shortmsg);
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "<p>%s: %s\r\n", longmsg, cause);
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "<hr><em>The Tiny Web server</em>\r\n");
    Rio_writen(fd, buf, strlen(buf));
}
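The idea of building the body as a string so that its size is known can be sketched as follows. This is only an illustration, not part of Tiny: the helper name build_error_response, its buffer sizes, and the decision to format the whole response into one caller-supplied buffer are assumptions made for the example. It uses snprintf so that an oversized message is reported as an error instead of overflowing a buffer.

```c
#include <stdio.h>
#include <string.h>

/*
 * build_error_response - hypothetical helper (not part of Tiny).
 * Formats a complete HTTP/1.0 error response into out.
 * Returns the response length, or -1 if a buffer is too small.
 */
int build_error_response(char *out, size_t outsz,
                         const char *cause, const char *errnum,
                         const char *shortmsg, const char *longmsg)
{
    char body[1024];   /* assumed upper bound for the HTML body */
    int blen, n;

    /* Build the HTML body first so that its length is known */
    blen = snprintf(body, sizeof(body),
                    "<html><title>Tiny Error</title>"
                    "<body bgcolor=\"ffffff\">\r\n"
                    "%s: %s\r\n"
                    "<p>%s: %s\r\n"
                    "<hr><em>The Tiny Web server</em>\r\n",
                    errnum, shortmsg, longmsg, cause);
    if (blen < 0 || (size_t)blen >= sizeof(body))
        return -1;

    /* Headers can now state the exact body size, then the body follows */
    n = snprintf(out, outsz,
                 "HTTP/1.0 %s %s\r\n"
                 "Content-type: text/html\r\n"
                 "Content-length: %d\r\n\r\n"
                 "%s",
                 errnum, shortmsg, blen, body);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    return n;
}
```

A caller would pass the result to a single write, for example Rio_writen(fd, out, n), after checking that the return value is not -1.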
5.2.4 read_requesthdrs function
Tiny does not use any of the information in the request headers. It simply calls the read_requesthdrs function to read and ignore them. Note that the blank line that terminates the request headers consists of a carriage return and line feed pair ("\r\n").
/*
 * read_requesthdrs - read HTTP request headers
 */
void read_requesthdrs(rio_t *rp)
{
    char buf[MAXLINE];

    Rio_readlineb(rp, buf, MAXLINE);
    printf("%s", buf);
    while (strcmp(buf, "\r\n")) {
        Rio_readlineb(rp, buf, MAXLINE);
        printf("%s", buf);
    }
    return;
}
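The terminating-blank-line rule can be checked on an in-memory buffer without any sockets. The sketch below is illustrative only: count_request_headers is a hypothetical helper, and it assumes the caller passes the text that follows the request line. Unlike the loop above, which does not check the return value of Rio_readlineb, it reports an error when the input ends before the blank line appears.

```c
#include <string.h>

/*
 * count_request_headers - hypothetical helper (not part of Tiny).
 * p points at the first header line of a request held in memory.
 * Returns the number of header lines before the terminating blank
 * line, or -1 if the input ends before a blank "\r\n" line is seen.
 */
int count_request_headers(const char *p)
{
    int count = 0;

    for (;;) {
        const char *eol = strstr(p, "\r\n");
        if (eol == NULL)
            return -1;      /* input ended without a blank line */
        if (eol == p)
            return count;   /* empty line: end of the headers */
        count++;
        p = eol + 2;        /* skip past "\r\n" to the next line */
    }
}
```

For the input "Host: localhost\r\nAccept: text/html\r\n\r\n" the function returns 2, and for a request whose headers are cut off it returns -1.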
5.2.5 parse_uri function
Tiny assumes that the home directory for static content is its current directory and that the home directory for executables is ./cgi-bin. Any URI that contains the string cgi-bin is treated as a request for dynamic content. The default file name is ./home.html. The parse_uri function parses the URI into a file name and an optional CGI argument string. If the request is for static content, the CGI argument string is cleared and the URI is converted to a Linux relative pathname such as ./index.html. If the URI ends with "/", the default file name is appended. If the request is for dynamic content, any CGI arguments are extracted and the rest of the URI is converted to a Linux relative file name.
/*
 * parse_uri - parse URI into filename and CGI args
 *             return 0 if dynamic content, 1 if static
 */
int parse_uri(char *uri, char *filename, char *cgiargs)
{
    char *ptr;

    if (!strstr(uri, "cgi-bin")) { /* Static content */
        strcpy(cgiargs, "");
        strcpy(filename, ".");
        strcat(filename, uri);
        if (uri[strlen(uri)-1] == '/')
            strcat(filename, "home.html");
        return 1;
    }
    else { /* Dynamic content */
        ptr = index(uri, '?');
        if (ptr) {
            strcpy(cgiargs, ptr+1);
            *ptr = '\0';
        }
        else
            strcpy(cgiargs, "");
        strcpy(filename, ".");
        strcat(filename, uri);
        return 0;
    }
}
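The same parsing rules can be tried out in isolation. The following is a sketch under stated assumptions rather than Tiny's own code: split_uri is a hypothetical variant that leaves the input URI unmodified, takes explicit buffer sizes, and returns -1 when an output buffer is too small. The static/dynamic decision and the default file name follow the rules described above.

```c
#include <stdio.h>
#include <string.h>

/*
 * split_uri - hypothetical bounded variant of parse_uri.
 * Returns 1 for static content, 0 for dynamic content,
 * and -1 if filename or cgiargs is too small.
 */
int split_uri(const char *uri, char *filename, size_t fnsz,
              char *cgiargs, size_t argsz)
{
    const char *query = strchr(uri, '?');
    size_t pathlen = query ? (size_t)(query - uri) : strlen(uri);
    size_t len = strlen(uri);
    int n;

    if (fnsz == 0 || argsz == 0)
        return -1;

    if (strstr(uri, "cgi-bin") == NULL) {      /* Static content */
        /* Append the default file name when the URI ends with '/' */
        const char *dflt = (len > 0 && uri[len - 1] == '/') ? "home.html" : "";
        cgiargs[0] = '\0';
        n = snprintf(filename, fnsz, ".%s%s", uri, dflt);
        if (n < 0 || (size_t)n >= fnsz)
            return -1;
        return 1;
    }

    /* Dynamic content: everything after '?' is the argument string */
    n = snprintf(cgiargs, argsz, "%s", query ? query + 1 : "");
    if (n < 0 || (size_t)n >= argsz)
        return -1;
    /* The file name is "." followed by the part of the URI before '?' */
    n = snprintf(filename, fnsz, ".%.*s", (int)pathlen, uri);
    if (n < 0 || (size_t)n >= fnsz)
        return -1;
    return 0;
}
```

With this sketch, "/" yields the file name ./home.html, "/index.html" yields ./index.html, and "/cgi-bin/adder?15000&213" yields the file name ./cgi-bin/adder with the argument string 15000&213.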
5.2.6 serve_static function
Tiny serves five common types of static content: HTML files, unformatted text files, and images encoded in GIF, PNG, and JPG formats. The serve_static function sends an HTTP response whose body contains the contents of a local file. It first determines the file type by inspecting the suffix of the file name and then sends the response line and response headers to the client. Note that the headers are terminated by a blank line.
/*
 * serve_static - copy a file back to the client
 */
void serve_static(int fd, char *filename, int filesize)
{
    int srcfd;
    char *srcp, filetype[MAXLINE], buf[MAXBUF];

    /* Send response headers to client */
    get_filetype(filename, filetype);
    sprintf(buf, "HTTP/1.0 200 OK\r\n");
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "Server: Tiny Web Server\r\n");
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "Content-length: %d\r\n", filesize);
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "Content-type: %s\r\n\r\n", filetype);
    Rio_writen(fd, buf, strlen(buf));

    /* Send response body to client */
    srcfd = Open(filename, O_RDONLY, 0);
    srcp = Mmap(0, filesize, PROT_READ, MAP_PRIVATE, srcfd, 0);
    Close(srcfd);
    Rio_writen(fd, srcp, filesize);
    Munmap(srcp, filesize);
}

/*
 * get_filetype - derive file type from file name
 */
void get_filetype(char *filename, char *filetype)
{
    if (strstr(filename, ".html"))
        strcpy(filetype, "text/html");
    else if (strstr(filename, ".gif"))
        strcpy(filetype, "image/gif");
    else if (strstr(filename, ".png"))
        strcpy(filetype, "image/png");
    else if (strstr(filename, ".jpg"))
        strcpy(filetype, "image/jpeg");
    else
        strcpy(filetype, "text/plain");
}
Next, the response body is sent by copying the contents of the requested file to the connected descriptor fd. The code here is subtle and deserves careful study. First, filename is opened for reading to obtain its descriptor. The Linux mmap function then maps the first filesize bytes of the file srcfd to a private read-only area of virtual memory that starts at address srcp. Once the file is mapped, its descriptor is no longer needed, so the file is closed. Failing to close it would leak the descriptor, which is potentially fatal in a long-running server. Rio_writen then performs the actual transfer to the client by copying the filesize bytes that start at srcp to fd. Finally, the mapped virtual memory area is freed, which avoids a potentially fatal memory leak.
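The open, mmap, close, write, munmap sequence described above can be sketched with plain system calls instead of the capitalized wrapper functions. This is an illustration under assumptions: send_file_mmap is a hypothetical helper, it obtains the size with fstat rather than receiving it as a parameter, and it returns -1 on failure instead of terminating the process.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * send_file_mmap - hypothetical helper (not part of Tiny).
 * Copies the file at path to descriptor outfd through a private
 * read-only mapping. Returns the number of bytes sent, or -1.
 */
ssize_t send_file_mmap(int outfd, const char *path)
{
    struct stat sb;
    const char *p;
    char *src;
    size_t left;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &sb) < 0) {
        close(fd);
        return -1;
    }
    if (sb.st_size == 0) {          /* mmap rejects a zero length */
        close(fd);
        return 0;
    }

    src = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    if (src == MAP_FAILED)
        return -1;

    /* write may transfer fewer bytes than requested, so loop */
    p = src;
    left = (size_t)sb.st_size;
    while (left > 0) {
        ssize_t n = write(outfd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted: retry the write */
            munmap(src, (size_t)sb.st_size);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    munmap(src, (size_t)sb.st_size);   /* release the mapped area */
    return (ssize_t)sb.st_size;
}
```

The write loop plays the role that Rio_writen plays in Tiny: it keeps writing until every mapped byte has been delivered or an unrecoverable error occurs.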
5.2.7 serve_dynamic function
Tiny serves any type of dynamic content by forking a child process and running a CGI program in the context of the child. The serve_dynamic function begins by sending the client a response line that indicates success, along with an informational Server header. The CGI program is responsible for sending the rest of the response. Note that this approach does not allow for the possibility that the CGI program might encounter an error.
/*
 * serve_dynamic - run a CGI program on behalf of the client
 */
void serve_dynamic(int fd, char *filename, char *cgiargs)
{
    char buf[MAXLINE], *emptylist[] = { NULL };

    /* Return first part of HTTP response */
    sprintf(buf, "HTTP/1.0 200 OK\r\n");
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "Server: Tiny Web Server\r\n");
    Rio_writen(fd, buf, strlen(buf));

    if (Fork() == 0) { /* Child */
        /* Real server would set all CGI vars here */
        setenv("QUERY_STRING", cgiargs, 1);
        Dup2(fd, STDOUT_FILENO);              /* Redirect stdout to client */
        Execve(filename, emptylist, environ); /* Run CGI program */
    }
    Wait(NULL); /* Parent waits for and reaps child */
}
After sending the first part of the response, the server forks a new child process. The child initializes the QUERY_STRING environment variable with the CGI arguments from the request URI. Note that a real server would set the other CGI environment variables here as well. The child then redirects its standard output to the connected file descriptor and loads and runs the CGI program. Because the CGI program runs in the context of the child, it has access to the same open files and environment variables that existed before the call to execve. Therefore, anything the CGI program writes to standard output goes directly to the client process, without any intervention from the parent. Meanwhile, the parent blocks in a call to wait. When the child terminates, the parent reaps it, which lets the operating system reclaim the resources allocated to the child.
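The fork, setenv, dup2, execve pattern can be observed without a network connection by letting a pipe stand in for the connected descriptor. The sketch below is an illustration under assumptions, not Tiny's code: run_cgi_capture is a hypothetical helper, it passes a caller-supplied argument vector to the program (Tiny passes an empty list), and the parent collects the child's output in a buffer instead of letting it flow to a client.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * run_cgi_capture - hypothetical helper (not part of Tiny).
 * Runs cmd[0] with QUERY_STRING set to query and captures whatever
 * the program writes to standard output. Returns the number of
 * bytes stored in out (always NUL-terminated), or -1 on error.
 */
ssize_t run_cgi_capture(char *const cmd[], const char *query,
                        char *out, size_t outsz)
{
    int pfd[2];
    pid_t pid;
    size_t total = 0;
    ssize_t n;

    if (outsz == 0 || pipe(pfd) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (pid == 0) {                      /* Child */
        close(pfd[0]);
        setenv("QUERY_STRING", query, 1);
        dup2(pfd[1], STDOUT_FILENO);     /* stdout now feeds the pipe */
        close(pfd[1]);
        execve(cmd[0], cmd, environ);
        _exit(127);                      /* reached only if execve fails */
    }

    /* Parent: read the child's output until end of file */
    close(pfd[1]);
    while (total < outsz - 1 &&
           (n = read(pfd[0], out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(pfd[0]);
    waitpid(pid, NULL, 0);               /* reap the child */
    return (ssize_t)total;
}
```

Running a shell command that prints $QUERY_STRING through this helper returns exactly the query text, which shows that the environment variable and the redirected standard output both survive the call to execve.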
6. Summary
The client and server establish a connection by using the socket interface. A socket is an endpoint of a connection and is presented to the application in the form of a file descriptor. The two sides then communicate through the socket interface.
A Web server uses the HTTP protocol to communicate with its clients, such as browsers. The browser requests either static or dynamic content from the server. A request for static content is served by fetching a file from the server's disk and returning it to the client. A request for dynamic content is served by running a program in the context of a child process on the server and returning its output to the client. The CGI standard provides a set of rules that govern how the client passes program arguments to the server, how the server passes these arguments and other information to the child process, and how the child process sends its output back to the client. Finally, a simple but functional Web server that serves both static and dynamic content is implemented.