Counting the characters, words, and lines of a file in C

Posted by rbastien on Mon, 03 Jan 2022 07:54:50 +0100

Count the characters, words, and lines in a file, reporting:

  • The number of characters and words on each line
  • The total number of characters, words, and lines in the file


Notes:

  • Whitespace characters (spaces and tabs) are not counted as characters;
  • Words are separated by spaces or tabs;
  • A word is assumed never to be split across two lines;
  • Each line is limited to 1000 characters.


Let's look at the code first:

#include <stdio.h>
#include <string.h>
int *getCharNum(char *filename, int *totalNum);
int main(){
    char filename[30];
    // totalNum[0]: total lines, totalNum[1]: total characters, totalNum[2]: total words
    int totalNum[3] = {0, 0, 0};
    printf("Input file name: ");
    scanf("%29s", filename);  // Read at most 29 characters so filename[30] cannot overflow
    if(getCharNum(filename, totalNum)){
        printf("Total: %d lines, %d words, %d chars\n", totalNum[0], totalNum[2], totalNum[1]);
    }else{
        printf("Error!\n");
    }
    return 0;
}
/**
 * Count the number of characters, words and lines of the file
 *
 * @param  filename  file name
 * @param  totalNum  File statistics
 *
 * @return  totalNum on success, NULL if the file cannot be opened
**/
int *getCharNum(char *filename, int *totalNum){
    FILE *fp;  // Pointer to file
    char buffer[1003];  // Buffer that holds the line just read
    int bufferLen;  // The length of what is actually stored in the buffer
    int i;  // Index of the current character in the buffer
    char c;  // The current character
    int isLastBlank = 1;  // Was the previous character blank? Starts at 1 so an empty first line has no words
    int charNum = 0;  // The number of characters in the current line
    int wordNum = 0; // Number of words in the current line
    if( (fp=fopen(filename, "rb")) == NULL ){
        perror(filename);
        return NULL;
    }
    printf("line   words  chars\n");
    // Read one line of data at a time and save it to the buffer. Each line can only have 1000 characters at most
    while(fgets(buffer, 1003, fp) != NULL){
        bufferLen = strlen(buffer);
        // Traverse the contents of the buffer
        for(i=0; i<bufferLen; i++){
            c = buffer[i];
            if( c==' ' || c=='\t'){  // Space or tab encountered
                if(!isLastBlank) wordNum++;  // If the previous character was not blank, a word just ended
                isLastBlank = 1;
            }else if(c!='\n'&&c!='\r'){  // Ignore line breaks
                charNum++;  // If it is neither a newline character nor a space, add 1 to the number of characters
                isLastBlank = 0;
            }
        }
        if(!isLastBlank) wordNum++;  // If the line did not end in a blank, count its last word
        isLastBlank = 1;  // Reset to 1 for the next line
        // At the end of each line, update the totals for lines, characters, and words
        totalNum[0]++;  // Total lines
        totalNum[1] += charNum;  // Total characters
        totalNum[2] += wordNum;  // Total words
        printf("%-7d%-7d%d\n", totalNum[0], wordNum, charNum);
        // Reset the per-line counters for the next line
        charNum = 0;
        wordNum = 0;
    }
    fclose(fp);  // Close the file before returning
    return totalNum;
}

Create a file named demo.txt on drive D: and enter the following:

I am Chinese. I love my country.
China has 960 square kilometers of territory.
China has a population of 1.35 billion.
The capital of China is Beijing.

                                By gunge

                                2021-08-12

Run the program, and the output result is:

Input file name: d://demo.txt
line   words  chars
1      7      26
2      7      39
3      7      33
4      6      27
5      0      0
6      2      7
7      0      0
8      1      10
Total: 8 lines, 30 words, 142 chars

 

The above program reads one line from the file at a time, puts it in the buffer, and then traverses the buffer to count the number of characters and words in the current line.
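
The per-line counting step described above can be isolated as a small function, sketched here under the assumption that a line is already in memory as a string. The name countLine and its parameters are hypothetical and are not part of the program above.

```c
#include <stdio.h>

/* Hypothetical helper (not in the original program): counts the words
 * and the non-blank characters in one line. Spaces and tabs separate
 * words; '\n' and '\r' are ignored. */
void countLine(const char *line, int *wordNum, int *charNum){
    int inWord = 0;  /* 1 while the scan is inside a word */
    *wordNum = 0;
    *charNum = 0;
    for(; *line != '\0'; line++){
        if(*line == ' ' || *line == '\t'){
            inWord = 0;  /* a blank ends the current word, if any */
        }else if(*line != '\n' && *line != '\r'){
            if(!inWord){
                (*wordNum)++;  /* a new word starts here */
                inWord = 1;
            }
            (*charNum)++;
        }
    }
}
```

For the first line of demo.txt, countLine() yields 7 words and 26 characters, which matches the program's output for that line.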

The fgets() function is used to read a line or a specified number of characters from the file. Its prototype is:
   char * fgets(char *buffer, int size, FILE * stream);
Parameters:

  • buffer is the buffer that receives the data read.
  • size is the size of the buffer. fgets() reads at most size-1 characters and then appends '\0'. If the line is longer than size-1 characters, reading stops after size-1 characters; otherwise the whole line is read, including its line break.
  • stream is the file pointer.
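
The size-1 limit can be observed with a deliberately small buffer. The following sketch is illustrative only: printPieces is a hypothetical helper, and the 5-byte buffer is chosen just to force a long line to be split.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical demo (not in the original program): reads stream through
 * a 5-byte buffer, so each fgets() call returns at most 4 characters.
 * Returns the number of successful fgets() calls. */
int printPieces(FILE *stream){
    char buffer[5];
    int pieces = 0;
    while(fgets(buffer, sizeof buffer, stream) != NULL){
        pieces++;
        /* strlen() is safe here because fgets() always appends '\0' */
        printf("piece %d: %d characters\n", pieces, (int)strlen(buffer));
    }
    return pieces;
}
```

For a stream holding the two lines "abcdefg" and "xy", each ending in '\n', the successive calls return "abcd", "efg\n", and "xy\n".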


Some readers ask why the program does not use getc() to read the file one character at a time, which would need no buffer.

That works, but line breaks need care across platforms, because different platforms end the lines of a text file differently. Linux and modern macOS use '\n', Windows uses '\r\n', and classic Mac OS used '\r'. Handling all of these with getc() is therefore tedious.
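
To illustrate the extra bookkeeping, here is a rough sketch of counting lines with getc() while accepting '\n', '\r\n', and '\r' as line breaks. The function name countLinesGetc is hypothetical, and the stream is assumed to be opened in binary mode so that the library does not translate line breaks.

```c
#include <stdio.h>

/* Hypothetical sketch (not in the original program): counts lines with
 * getc(), treating "\n", "\r\n", and a lone "\r" as one line break each. */
int countLinesGetc(FILE *fp){
    int c;            /* int, not char, so that EOF can be detected */
    int prev = 0;     /* the previous character read */
    int lines = 0;    /* line breaks seen so far */
    int pending = 0;  /* 1 if the current line has unterminated data */
    while((c = getc(fp)) != EOF){
        if(c == '\n'){
            if(prev != '\r') lines++;  /* "\r\n" was counted at the '\r' */
            pending = 0;
        }else if(c == '\r'){
            lines++;
            pending = 0;
        }else{
            pending = 1;
        }
        prev = c;
    }
    return lines + pending;  /* count a last line with no line break */
}
```

Unlike the fgets() program, this sketch also treats a lone '\r' as the end of a line, and it has to remember the previous character to avoid counting "\r\n" twice.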

To keep things simple, the program reads a whole line with fgets() and then processes each character, ignoring '\n' and '\r'.

Note: each line can end with a line break of up to 2 bytes ('\r\n'), and fgets() also appends a terminating '\0', so the buffer must hold at least 1003 bytes to fit a line of 1000 characters. With a smaller buffer, fgets() would return a long line in several pieces, and the program would count each piece as a separate line.
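
One way to notice a line that did not fit in the buffer is to check whether the piece returned by fgets() ends in '\n'. This is only a sketch and not part of the program above; endsWithNewline is a hypothetical helper.

```c
#include <string.h>

/* Hypothetical helper (not in the original program): returns 1 if the
 * string filled in by fgets() ends with '\n', meaning the whole line
 * fit in the buffer, and 0 otherwise. */
int endsWithNewline(const char *buffer){
    size_t len = strlen(buffer);
    return len > 0 && buffer[len - 1] == '\n';
}
```

A piece without a trailing '\n' is either the start of an over-long line or the last line of a file that lacks a final line break; checking feof() after the read helps tell the two apart.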

Look at the fopen() check at the start of getCharNum(). When the file cannot be opened, the function returns NULL instead of abruptly calling exit(). This lets main() learn of the error and handle it properly, for example by notifying the user, which makes for a better user experience.

Topics: C