[C language] A detailed explanation of file operations

Posted by George W. Bush on Tue, 01 Mar 2022 06:49:14 +0100


It has been a long time since I updated this C language learning blog. Today's topic is file operations! 😋

1. Why do we need files?

In an earlier post we implemented an address book that can add and delete contacts. However, the address book lives only in memory and is destroyed when the program exits. Its contents do not carry over to the next run, which makes it inconvenient to use.

Files let us persist data: if we save the contacts to a file on disk, they are still there the next time we open the address book.

2. What is a file?

A file is data with a specific format stored on disk.

2.1 file classification

In programming, two kinds of files are generally discussed: program files and data files

  • Program files: source files (.c), object files (.obj on Windows, .o on Linux), and executable files (.exe)
  • Data files: the data a program reads or writes while it runs, such as a file the program reads its input from or a file it writes its output to

This post is about data files

2.2 file name

A file name consists of three parts: file path + file name stem + file extension

For example: C:\code\test.txt

For convenience, the file identifier is often simply called the file name

3. Using files

3.1 file pointer

A key concept in file operations is the file type pointer, called the file pointer for short.

Every file in use has a file information area set aside in memory, which holds information about the file such as its name, status, and current position. This information is stored in a structure variable whose type the system declares as FILE.

Different C compilers define the FILE type differently, but the definitions are broadly similar.

When a file is opened, the system automatically creates a FILE structure variable and fills in its information according to the file.

To work with the file, we access this structure variable through a pointer of type FILE*.

3.2 opening and closing files

A file must be opened before it is read or written, and closed after use.

This is similar to dynamic memory management, where every allocation is paired with a free.

ANSI C specifies that the fopen function opens a file and the fclose function closes it.

fopen returns a FILE* pointer that refers to the opened file.

After the file is closed, the file pointer becomes a dangling (wild) pointer, so set it to NULL to prevent accidental use.

If fopen fails to open the file, it returns a null pointer, so always check the return value.

#include <stdio.h>
#include <errno.h>
#include <string.h>
int main()
{
    //Open file
	FILE* pf = fopen("test.txt", "r");
	if (pf == NULL)
	{
		printf("%s\n", strerror(errno));//Print the reason the open failed
		return 1;
	}
	//1. Read the file
    
	//Close file
	fclose(pf);
	pf = NULL;
	return 0;
}

#include <stdio.h>
#include <errno.h>
#include <string.h>
int main()
{
	//Open file
	FILE* pf = fopen("test.txt", "w");
	if (pf == NULL)
	{
		printf("%s\n", strerror(errno));//Print the reason the open failed
		return 1;
	}
	//2. Write the file
    
    //Close file
	fclose(pf);
	pf = NULL;

	return 0;
}

The strerror function is explained in this post 👉 Click me

3.2.1 file open modes

This table summarizes the different file open modes

  • Note: mode strings are written in double quotation marks, not single quotation marks!

Opening a file with "w" overwrites (truncates) any existing content. To add data after the existing content, open it with "a" instead

3.2.2 standard input / output stream

  • Output: memory → file
  • Input: file → memory

A C program opens three streams by default when it starts running

  • stdin: standard input stream
  • stdout: standard output stream
  • stderr: standard error stream

Until now, for output we have used printf to print data in memory directly to the screen

Now we can also write data to the standard output stream through its file pointer, stdout, and get an effect similar to printf

3.3 file input / output function

For example, the fputc function writes a character to a file

The following table lists some file functions we will use

3.3.1 character input and output

fputc function: writes a single character to a file

fgetc function: reads a single character from a file

Reading the file back with fgetc prints all the characters that were just written to it

Realize file copy

Copy the contents of one file to another

#include <stdio.h>
#include <errno.h>
#include <string.h>
int main()
{
	//Copy data.txt to a new file, data2.txt
	FILE* pr = fopen("data.txt", "r");
	if (pr == NULL)
	{
		printf("open for reading: %s\n", strerror(errno));
		return 0;
	}

	FILE* pw = fopen("data2.txt", "w");
	if (pw == NULL)
	{
		printf("open for writing: %s\n", strerror(errno));
		fclose(pr);
		pr = NULL;
		return 0;
	}
	//Copy file
	int ch = 0;
	while ((ch = fgetc(pr)) != EOF)
	{
		fputc(ch, pw);
	}

	fclose(pr);
	pr = NULL;
	fclose(pw);
	pw = NULL;

	return 0;
}

3.3.2 text line input and output

fputs function: writes a string to a file

//Write a line
#include <stdio.h>
#include <errno.h>
#include <string.h>
int main()
{
	FILE* pf = fopen("data.txt", "w");
	if (pf == NULL)
	{
		printf("%s\n", strerror(errno));
		return 0;
	}
	fputs("hello world\n", pf);
	fputs("hehe\n", pf);


	fclose(pf);
	pf = NULL;

	return 0;
}

Run the code and you can see that the two strings have been written to the data.txt file in the project directory

fgets function: reads a string of limited length from a file

The second parameter of fgets limits the length of the string that is read

//Read a line from the file
#include <stdio.h>
#include <errno.h>
#include <string.h>
int main()
{
	FILE* pf = fopen("data.txt", "r");
	if (pf == NULL)
	{
		printf("%s\n", strerror(errno));
		return 0;
	}
	char buf[1000] = {0};
	//read file
	fgets(buf, 3, pf);
	printf("%s\n", buf);

	fgets(buf, 3, pf);
	printf("%s\n", buf);

	fclose(pf);
	pf = NULL;

	return 0;
}

Running the program, you can see that although we passed 3, only 2 characters are read each time

To see why, set buf[2] to 1 before reading, then debug and watch the array

After the first fgets call, the original 1 has been overwritten with \0

This shows that fgets always terminates what it reads with \0, so a limit of n reads at most n-1 characters

If we need to read 3 characters, we need to set the limit to 4

3.3.3 formatted input and output

"Formatted" here refers to data with a specific layout, such as the members of a structure

fprintf function: writes formatted data to a file

#include <stdio.h>
#include <errno.h>
#include <string.h>
struct Stu
{
	char name[20];
	int age;
	double d;
};
int main()
{
	struct Stu s = { "Zhang San", 20, 95.5 };
	FILE* pf = fopen("data.txt", "w");
	if (pf == NULL)
	{
		printf("%s\n", strerror(errno));
		return 0;
	}
	//Write formatted data
	fprintf(pf, "%s %d %lf", s.name, s.age, s.d);

	fclose(pf);
	pf = NULL;

	return 0;
}

fscanf function: reads formatted data from a file and stores it in the corresponding variables, here the members of the structure variable s

3.3.4 binary input and output

  • fread and fwrite can operate on any type of data
  • As the names suggest, the binary output function fwrite writes data to a file in binary form, and the binary input function fread reads it back

When using these functions, open the file with "rb" or "wb"

fwrite(s, sizeof(struct Stu), 2, pf);
//s: source of the data
//sizeof(struct Stu): size of each element to be written
//2: number of elements to be written
//pf: file pointer of the target file

The following is an example of writing structure variables

#include <stdio.h>
#include <errno.h>
#include <string.h>
struct Stu
{
	char name[20];
	int age;
	double d;
};
//Binary write
int main()
{
	struct Stu s[2] = { {"Zhang San", 20, 95.5} , {"lisi", 16, 66.5}};

	FILE* pf = fopen("data.txt", "wb");
	if (pf == NULL)
	{
		printf("%s\n", strerror(errno));
		return 0;
	}
	//Write files in binary mode
	fwrite(s, sizeof(struct Stu), 2, pf);

	fclose(pf);
	pf = NULL;

	return 0;
}

Opening the file now shows that part of the data looks garbled. The contents are stored in binary, so a text editor cannot display them correctly

Binary reading is the reverse step: it reads the binary data in the file back, element by element, into the corresponding variable

fread(s, sizeof(struct Stu), 2, pf);
//s: variable that receives the file contents
//sizeof(struct Stu): size of each element to be read
//2: number of elements to be read
//pf: file pointer of the file to read from

3.3.5 sscanf/sprintf function

These two functions are special: they do not operate on files at all. sprintf writes formatted data (such as the members of a structure) into a character array as a string, and sscanf reads formatted data back out of a string

See the figure below

3.4. Other file functions

3.4.1 fseek

http://cplusplus.com/reference/cstdio/fseek/?kw=fseek

This function moves the file pointer to a given offset relative to a reference position (the start of the file, the current position, or the end of the file)

That sounds like a tongue twister, so here is an example

Take a file containing the string "abcdef"

Each call to fgetc advances the file pointer by one character. After two calls, the file pointer points to the character c

To make it point to f, we can move the pointer in any of these ways

  • 5 positions forward from the start of the file: fseek(pf, 5, SEEK_SET)
  • 3 positions forward from the current position: fseek(pf, 3, SEEK_CUR)
  • 1 position back from the end of the file: fseek(pf, -1, SEEK_END)

With fseek we can put the file pointer wherever we need it and then perform operations such as replacing a character

#include <stdio.h>
#include <errno.h>
#include <string.h>
int main()
{
    FILE* pf = fopen("test.txt", "w");
    if (pf == NULL)
    {
        printf("%s\n", strerror(errno));
        return 0;
    }
    //Write file
    int ch = 0;
    for (ch = 'a'; ch <= 'z'; ch++)
    {
        fputc(ch, pf);
    }

    //Locate file pointer: 2 characters back from the end, i.e. at 'y'
    fseek(pf, -2, SEEK_END);
    fputc('#', pf);//Overwrite 'y' with #

    fclose(pf);
    pf = NULL;
    return 0;
}

3.4.2 ftell

Returns the current offset of the file pointer (relative to the beginning of the file)

3.4.3 rewind

http://cplusplus.com/reference/cstdio/rewind/?kw=rewind

Moves the file pointer back to the start of the file

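fseek(pf, 0, SEEK_SET);
//rewind(pf) moves the file pointer to the same place as the fseek call above
//(rewind additionally clears the stream's error indicator), but is more convenient

#include <stdio.h>
#include <errno.h>
#include <string.h>
int main()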
{
    FILE* pf = fopen("test.txt", "r");
    if (pf == NULL)
    {
        printf("%s\n", strerror(errno));
        return 0;
    }
    //read file
    int ch = fgetc(pf);
    printf("%c\n", ch);//a
    ch = fgetc(pf);
    printf("%c\n", ch);//b
    
    long ret = ftell(pf);//ftell returns a long
    printf("%ld\n", ret);//2
    rewind(pf);
    //fseek(pf, 0, SEEK_SET);
    ret = ftell(pf);
    printf("%ld\n", ret);//0
    fclose(pf);
    pf = NULL;
    return 0;
}

4. Text files and binary files

We now know that the fread/fwrite functions perform binary input and output. How does that differ from text?

Depending on how the data is organized, data files are called text files or binary files. Data is stored in memory in binary form. If it is written to external storage without conversion, the result is a binary file.

If the data must be stored on external storage as ASCII codes, it has to be converted before it is written. A file stored in the form of ASCII characters is a text file.

Characters are always stored in ASCII form, while numeric data can be stored in either ASCII form or binary form.

The number 10000 can be stored in the following two ways

  • As ASCII text: the 5 characters 1, 0, 0, 0, 0 – 5 bytes
  • In the binary form of the number itself (an int) – 4 bytes

In this case, using binary mode saves space

Use the following code to write 10000 to the file in binary mode

In VS, we can open the test.txt file with the binary editor

You can see that 10000 is stored in the file in binary form

The byte order seen here involves big-endian and little-endian storage 👉 Click me

5. Determination of the end of file reading

5.1 misusing feof

While reading a file, you cannot use the return value of the feof function directly to decide whether the file has ended

Instead, feof is used after reading has stopped, to tell whether it stopped because the end of the file was reached or because a read failed

  1. For a text file, decide whether reading has finished by checking the return value of the read function
    • fgetc returns EOF
    • fgets returns NULL

  2. For a binary file, decide whether reading has finished by checking the return value of fread
    • The return value of fread is the number of elements successfully read
    • If it is less than the number of elements requested, reading has stopped

ferror function: checks whether a read error has occurred on the file. If so, it returns a nonzero (true) value

http://cplusplus.com/reference/cstdio/ferror/?kw=ferror

6. File buffer

ANSI C uses a "buffered file system" to process data files.

A buffered file system means that the system automatically sets aside a "file buffer" in memory for each file the program is using. Data written from memory to disk is first placed in this buffer, and is sent to disk in one go once the buffer is full.

This is a bit like git: files are first placed in the staging area, and are pushed to the remote repository only after you confirm they are correct

Reading works the same way in reverse: data is read from the disk file into the memory buffer (filling the buffer), and is then handed from the buffer to the program's data area (program variables, etc.) piece by piece. The size of the buffer is determined by the C implementation (compiler and library).

Because of the buffer, a C program needs to flush the buffer or close the file when it finishes working with a file. If it does not, reading and writing the file may go wrong.

Code example 1

#include <stdio.h>
#include <windows.h>

int main()
{
	FILE* pf = fopen("test.txt", "w");
	if (pf == NULL)
	{
		perror("fopen");
		return 1;
	}
	fputs("abcdef", pf);//The data goes into the output buffer first
	printf("Sleep for 10 seconds: the data has been written, but if you open test.txt now, the file is empty\n");
	Sleep(10000);
	printf("Flush the buffer\n");
	fflush(pf);//Flushing writes the data in the output buffer to the file (disk)

	printf("Sleep for another 10 seconds: open test.txt again now, and the file has content\n");
	Sleep(10000);
	fclose(pf);
	//Note: fclose also flushes the buffer when closing the file
	pf = NULL;
	return 0;
}

Run the program, which pauses itself with the Sleep function. During the first pause, you can see that the string has not yet been saved in the file

The string is written to the output buffer first, and reaches the txt file only after the buffer is flushed

Code example 2

#include <stdio.h>
#include <windows.h>
int main()
{
	while (1)
	{
		printf("hehe\n");
		//On Linux, without the '\n' nothing is printed (the buffer is not flushed)
		//In VS, it prints normally with or without the '\n'
		Sleep(1000);//On Linux, the sleep function takes seconds (Sleep in VS takes milliseconds)
        //             On Linux, the function is lowercase sleep; in VS it is Sleep
	}
	return 0;
}

Test this code in a Linux environment (Raspberry Pi)

As you can see, after the \n is removed, the code does not print hehe

When compiling, a warning is reported 👇, but the program still compiles

implicit declaration of function 'sleep'

A search on CSDN showed that the header file must be included: #include <unistd.h>

Recompile and no warning is reported (here the \n after hehe has been added back, and the program prints normally)

Epilogue

The file chapter covers a lot of ground. Have you learned it all? 😁

Most of it only sinks in with plenty of hands-on practice

If anything here is wrong, please correct me without mercy!
