Unbuffered file IO and directory operations

Posted by fenway on Sat, 22 Jan 2022 15:20:04 +0100


In back-end development, file I/O is usually done not with the standard I/O functions of the C library (fopen, fread, fwrite) but directly with the system calls provided by Linux. These system calls keep no user-space buffer and talk to the kernel directly, which avoids an extra copy and lets us tailor the I/O to the application scenario. The price is that every call enters the kernel, so many small reads or writes can be slower than buffered I/O. The following describes the kernel data structures Linux uses for file I/O and some of the specific system calls.

File descriptor

All open files are referenced through file descriptors. A descriptor is only valid within the process that owns it, because each process has its own PCB (process control block), and the PCB contains that process's file descriptor table.

File descriptor 0 corresponds to standard input, 1 corresponds to standard output, and 2 corresponds to standard error. These are the file descriptors bound by default when the process is created.

These correspond to STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO in unistd.h.

The maximum number of file descriptors per process can be viewed with the ulimit -a command and set with ulimit -n.

Unbuffered file IO

"Unbuffered" here means that there is no user-space buffer. Standard I/O functions such as fopen, fwrite and fread maintain a buffer in the process's own address space; below that sits the kernel buffer, and finally the disk. Unbuffered IO means that only the kernel buffer is used and there is no user buffer. It does not mean that there is no buffering at all.

Kernel data structure for file I/O

Each process has a record item in the process table, which contains an open file descriptor table.

In the open file descriptor table, each descriptor occupies one item:

  • File descriptor flags: currently only FD_CLOEXEC.
  • Pointer to file table entry

The kernel maintains a file table for all open files. Each file table item includes:

  • File status flags: the access mode (read, write) and flags such as O_APPEND and O_NONBLOCK.
  • Current file offset
  • Pointer to the inode of the file

Their relationship is as follows:

Different file table entries can point to the same file (i-node, i.e. index node). This lets different processes each have their own offset and access mode for the file.

Different file descriptors can point to the same file table entry. For example, after fork, each descriptor in the child points to the same file table entry as the descriptor with the same number in the parent.

[PS]: pay attention to the difference between file descriptor flags and file status flags.

open function

#include <fcntl.h>

int open(const char* path, int flag, .../* mode_t mode */);

Return value:
    Returns the file descriptor on success
    Returns -1 on failure

The flag parameter can be:

  • Required parameter: O_RDONLY (read only), O_WRONLY (write only), O_RDWR (read and write), O_EXEC (execute only), O_SEARCH (search only, applies to directories). These five are mutually exclusive, that is, exactly one must be specified. O_EXEC and O_SEARCH come from POSIX and are not available on every system (glibc on Linux does not define them).
  • Optional parameters, combined with bitwise OR: O_APPEND (append to the end of the file on every write), O_CLOEXEC (set the FD_CLOEXEC file descriptor flag), O_CREAT (create the file if it does not exist; if it exists, open it directly. With this flag you must pass the third argument, the permission bits of the new file, such as 0644), O_DIRECTORY (return an error if path is not a directory), O_EXCL (used together with O_CREAT: fail if the file already exists, so that testing for the file and creating it happen atomically), O_NONBLOCK (non-blocking I/O), O_SYNC (each write waits for the physical I/O to complete), O_TRUNC (truncate the file to length 0 if it exists and is opened write-only or read-write).
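
As an illustration of the O_CREAT and O_EXCL combination described above, here is a minimal sketch. The helper name create_exclusive and the mode 0644 are assumptions chosen for the example, not part of any API.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create `path` only if it does not exist yet.
 * Returns a write-only descriptor, or -1 with errno set
 * (errno == EEXIST means the file was already there). */
int create_exclusive(const char *path)
{
    /* O_CREAT requires the third argument (permission bits).
     * O_EXCL makes "check that it is missing, then create" one atomic step. */
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
}
```

Calling it twice on the same path should succeed the first time and fail with EEXIST the second time, which is the usual way to build simple lock files.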

creat function

#include <fcntl.h>

int creat(const char* path, mode_t mode);

Return value:
    Returns a file descriptor opened write-only on success
    Returns -1 on failure

One disadvantage of the creat function is that it opens the newly created file write-only. It is equivalent to open(path, O_WRONLY | O_CREAT | O_TRUNC, mode).

close function

#include <unistd.h>

int close(int fd);

Return value:
    Returns 0 on success
    Returns -1 on failure

When a process terminates, the kernel automatically closes all files it opens.

lseek function

Each open file has a current file offset (stored in its entry in the system-wide open file table). Both read and write operations start at the current offset and advance it by the number of bytes read or written. By default, unless the O_APPEND option is specified when opening the file, the offset is set to 0.

#include <unistd.h>

off_t lseek(int fd, off_t offset, int whence);

Return value:
    Returns the new file offset on success
    Returns -1 on failure

whence can take the following values:

  • SEEK_SET: set the file offset to offset bytes from the beginning of the file; offset must not be negative.
  • SEEK_CUR: set the file offset to the current position plus offset; offset can be positive or negative.
  • SEEK_END: set the file offset to the end of the file plus offset; offset can be positive or negative.

The file offset can be greater than the current length of the file. The next write to the file then lengthens the file and leaves a hole in it. Bytes that lie inside the file but have never been written are read as 0.
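
The hole behaviour can be tried with a small sketch like the following. The helper name make_file_with_hole and the contents "abc" and "xyz" are made up for the example.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write "abc", skip `gap` bytes, then write "xyz".
 * Returns the resulting file size, or -1 on error. */
off_t make_file_with_hole(const char *path, off_t gap)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    off_t size = -1;
    if (write(fd, "abc", 3) == 3 &&
        lseek(fd, gap, SEEK_CUR) != -1 &&  /* move past the current end */
        write(fd, "xyz", 3) == 3)          /* this write creates the hole */
        size = lseek(fd, 0, SEEK_END);     /* offset of the end == file size */

    close(fd);
    return size;
}
```

With a gap of 1000 the function should report a size of 1006 bytes, and reading anywhere inside the gap should yield zero bytes.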

read function

#include <unistd.h>

ssize_t read(int fd, void* buf, size_t nbytes);

Return value:
    Returns the number of bytes read on success; returns 0 if the end of the file has been reached
    Returns -1 on failure

The read operation starts from the current offset of the file. After a successful read, the offset increases by the number of bytes read.

write function

#include <unistd.h>

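ssize_t write(int fd, const void* buf, size_t nbytes);

Return value:
    Returns the number of bytes written on success
    Returns -1 on failure

The return value of write is usually equal to the nbytes argument. A smaller value means that only part of the data was written (for example because the disk is full, or because the descriptor refers to a pipe or socket), and the caller has to deal with the rest.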
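
Since write may accept fewer bytes than requested, callers often wrap it in a loop. The following is a minimal sketch of that pattern; write_all is a hypothetical helper name, and retrying on EINTR is a common convention rather than something the text above prescribes.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write until all `len` bytes are written.
 * Returns 0 on success, -1 on error (errno is set by write). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            return -1;
        }
        p += n;                 /* short write: skip what was accepted */
        len -= (size_t)n;
    }
    return 0;
}
```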

Blocking and non-blocking read and write

read and write on regular files do not block indefinitely; they return within a bounded time. Reading from a terminal or a network device may block: the terminal delivers its input buffer only when the user types a newline, and for a network device there is no telling when input will arrive. Writing to a network device may block as well. To avoid blocking, specify O_NONBLOCK when opening the file. A read then returns -1 immediately with errno set to EAGAIN when no data is available, so the caller has to read in a loop.
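
A single non-blocking read attempt might look like the sketch below. The helper name read_nonblocking and the return code -2 for "try again later" are assumptions made for this example; the descriptor is expected to be in O_NONBLOCK mode already.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: one read attempt on a descriptor that is in
 * O_NONBLOCK mode. Returns the number of bytes read (> 0), 0 at end of
 * file, -1 on a real error, or -2 when no data is available yet and the
 * caller should try again later. */
ssize_t read_nonblocking(int fd, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;                   /* interrupted by a signal: retry now */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return -2;                  /* would block: nothing to read yet */
        return -1;
    }
}
```

In a real program the "try again later" case is normally combined with select, poll or epoll instead of spinning in a tight loop.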

dup and dup2 functions

#include <unistd.h>

int dup(int fd);
int dup2(int fd, int fd2);

Return value:
    Returns the new file descriptor on success
    Returns -1 on failure

dup returns the smallest unused file descriptor and makes it point to the same file table entry as fd.

dup2 first closes fd2 if it is open, then makes fd2 point to the file table entry that fd points to, and returns fd2. If fd2 equals fd, dup2 returns fd2 without closing it.

The close-on-exec flag (FD_CLOEXEC) of the new descriptor is always cleared by dup and dup2.
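
Because a duplicated descriptor shares the file table entry, it also shares the file offset. The following sketch shows this; offset_seen_by_dup is a hypothetical name used only for this demonstration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: write through one descriptor, then ask its duplicate
 * for the current offset. Returns that offset, or -1 on error. */
off_t offset_seen_by_dup(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    off_t off = -1;
    int fd2 = dup(fd);                      /* same file table entry as fd */
    if (fd2 != -1) {
        if (write(fd, "hello", 5) == 5)
            off = lseek(fd2, 0, SEEK_CUR);  /* query the offset without moving it */
        close(fd2);
    }
    close(fd);
    return off;
}
```

The function should return 5: the write through fd moved the offset that fd2 sees as well. Two separate open calls on the same path would instead give two independent offsets.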

sync and fsync functions

#include <unistd.h>

void sync(void);
int fsync(int fd);

When writing data to a file, the kernel first copies the data to the kernel buffer, then queues it, and writes it to the disk later, which is called delayed write.

sync queues all modified block buffers and returns immediately without waiting for the end of the actual write to disk operation.

fsync works only on the file specified by the file descriptor fd and returns only after the write to disk operation is completed, not immediately.
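
To make the difference to a plain write concrete, here is a minimal sketch that writes a buffer and then waits for it to reach the device. The helper name save_durably is an assumption. A short write is treated as failure here for brevity.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to `path` and return only after
 * the kernel reports that the data has been written to the device.
 * Returns 0 on success, -1 on failure. */
int save_durably(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    int rc = -1;
    ssize_t n = write(fd, data, len);   /* data lands in the kernel buffer first */
    if (n >= 0 && (size_t)n == len && fsync(fd) == 0)
        rc = 0;                         /* fsync waited for the physical write */

    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```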

stat function

The stat function is used to view the attribute information of the file.

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int stat(const char* file_path, struct stat* st);
int lstat(const char* file_path, struct stat* st);  // Unlike stat, for a symbolic link lstat returns the attributes of the link itself

    file_path is the file path; st points to the stat structure to be filled in and is an output parameter.
Return value:
    Returns 0 on success
    Returns -1 on failure

struct stat is defined as follows:
struct stat {
    dev_t     st_dev;         /* ID of device containing file */
    ino_t     st_ino;         /* Inode number */
    mode_t    st_mode;        /* File type and mode */
    nlink_t   st_nlink;       /* Number of hard links */
    uid_t     st_uid;         /* User ID of owner */
    gid_t     st_gid;         /* Group ID of owner */
    dev_t     st_rdev;        /* Device ID (if special file) */
    off_t     st_size;        /* Total size, in bytes */
    blksize_t st_blksize;     /* Block size for filesystem I/O */
    blkcnt_t  st_blocks;      /* Number of 512B blocks allocated */

    /* Since Linux 2.6, the kernel supports nanosecond
                  precision for the following timestamp fields.
                  For the details before Linux 2.6, see NOTES. */

    struct timespec st_atim;  /* Time of last access */
    struct timespec st_mtim;  /* Time of last modification */
    struct timespec st_ctim;  /* Time of last status change */

    #define st_atime st_atim.tv_sec      /* Backward compatibility */
    #define st_mtime st_mtim.tv_sec
    #define st_ctime st_ctim.tv_sec
};

Get file type:

Method ①: bitwise AND with S_IFMT.

switch (st.st_mode & S_IFMT) {
    case S_IFBLK:  printf("block device\n");            break;
    case S_IFCHR:  printf("character device\n");        break;
    case S_IFDIR:  printf("directory\n");               break;
    case S_IFIFO:  printf("FIFO/pipe\n");               break;
    case S_IFLNK:  printf("symlink\n");                 break;
    case S_IFREG:  printf("regular file\n");            break;
    case S_IFSOCK: printf("socket\n");                  break;
    default:       printf("unknown?\n");                break;
}


Method ②: use the S_ISxxx macros.

if (S_ISBLK(st.st_mode))
    printf("block device\n");

Obtain file permissions: bitwise AND.

if (st.st_mode & S_IRUSR) { /* the file owner has read permission */ }
if (st.st_mode & S_IWGRP) { /* the file's group has write permission */ }
if (st.st_mode & S_IXOTH) { /* other users have execute permission */ }

fcntl function

#include <fcntl.h>

int fcntl(int fd, int cmd, .../* int arg */);

Return value:
    On success, the return value depends on cmd
    Returns -1 on failure

cmd can take the following values:

  • F_DUPFD, F_DUPFD_CLOEXEC: duplicate the descriptor.
  • F_GETFD, F_SETFD: get / set the file descriptor flags. At present the only such flag is FD_CLOEXEC.
  • F_GETFL, F_SETFL: get / set the file status flags. Only some of them can be changed: O_APPEND, O_ASYNC, O_NONBLOCK, O_DIRECT, O_NOATIME. PS: to change a file descriptor flag or a file status flag, first get the current value, then OR in the bit to set it or AND with its complement to clear it, and finally write the result back.
  • F_GETOWN, F_SETOWN: get / set the owner of asynchronous IO, that is, the process or process group that receives the SIGIO and SIGURG signals.
  • F_GETLK, F_SETLK, F_SETLKW: get / set record locks.
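
The get, modify, set sequence from the PS above can be sketched as follows. The helper names set_status_flag and set_cloexec are hypothetical.

```c
#include <fcntl.h>

/* Hypothetical helper: set (on != 0) or clear (on == 0) one file status
 * flag such as O_APPEND or O_NONBLOCK. Returns 0 on success, -1 on failure. */
int set_status_flag(int fd, int flag, int on)
{
    int flags = fcntl(fd, F_GETFL);             /* 1. get the current value */
    if (flags == -1)
        return -1;
    flags = on ? (flags | flag) : (flags & ~flag);  /* 2. set or clear the bit */
    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;  /* 3. write it back */
}

/* Same pattern for the file descriptor flag FD_CLOEXEC. */
int set_cloexec(int fd, int on)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    flags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
    return fcntl(fd, F_SETFD, flags) == -1 ? -1 : 0;
}
```

Writing a fresh value without reading the old one first would silently clear every other flag, which is why the sequence always starts with a get.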

getcwd function

getcwd is used to get the current working directory of the process.

#include <unistd.h>

char* getcwd(char* buf, size_t size);

    buf is a pre-allocated memory area and size is the size of buf.
Return value:
    On success, stores the current working directory of the process in buf and returns buf.
    Returns NULL on failure.

chdir function

chdir is used to change the current working directory of the process.

#include <unistd.h>

int chdir(const char* path);

    path is the target directory; it can be an absolute or a relative path.
Return value:
    Returns 0 on success.
    Returns -1 on failure.
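
A minimal sketch that combines chdir and getcwd is shown below. The helper name enter_dir is made up for the example.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: change into `path`, then store the resulting
 * absolute working directory in `buf` (of `size` bytes).
 * Returns 0 on success, -1 on failure. */
int enter_dir(const char *path, char *buf, size_t size)
{
    if (chdir(path) == -1)
        return -1;                  /* e.g. ENOENT or ENOTDIR */
    if (getcwd(buf, size) == NULL)
        return -1;                  /* e.g. ERANGE when buf is too small */
    return 0;
}
```

Note that chdir only affects the calling process, so the shell that started the program stays in its own directory.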

opendir function

opendir is used to open a directory.

#include <sys/types.h>
#include <dirent.h>

DIR* opendir(const char* path);

    path is the directory path.
Return value:
    Returns a pointer to a DIR structure on success.
    Returns NULL on failure.

readdir function

readdir is used to read the entries of a directory.

#include <dirent.h>

struct dirent* readdir(DIR* dir);

Each call reads one entry from the directory that dir refers to. To traverse all entries, call readdir in a loop. readdir also returns . and .., that is, the current directory and the parent directory.

Return value:
    Returns a pointer to the next entry on success.
    Returns NULL and sets errno on failure.
    Also returns NULL when all entries have been read, but does not set errno.

struct dirent is defined as follows:
struct dirent {
    ino_t          d_ino;       /* Inode number */
    off_t          d_off;       /* Not an offset; see below */
    unsigned short d_reclen;    /* Length of this record */
    unsigned char  d_type;      /* Type of file; not supported
                                              by all filesystem types */
    char           d_name[256]; /* Null-terminated filename */
};

d_ino is the inode number of the file.

d_type takes the following values, which can be used to determine the file type:

    DT_BLK      This is a block device.

    DT_CHR      This is a character device.

    DT_DIR      This is a directory.

    DT_FIFO     This is a named pipe (FIFO).

    DT_LNK      This is a symbolic link.

    DT_REG      This is a regular file.

    DT_SOCK     This is a UNIX domain socket.

    DT_UNKNOWN  The file type could not be determined.
d_name is the file name, terminated by '\0'. The array holds 256 bytes, so the name itself is at most 255 characters long.

closedir function

closedir is used to close an open directory.

#include <sys/types.h>
#include <dirent.h>

int closedir(DIR* dir);

    dir is the DIR pointer returned by opendir.
Return value:
    Returns 0 on success.
    Returns -1 on failure.

