How to understand FileReader and InputStreamReader

Posted by polson on Tue, 18 Jan 2022 11:30:11 +0100

Read the earlier article on this topic before reading this one.
That article roughly explained the essential difference between the two classes.
Both can read files, so how do they differ in practice, and does one perform better than the other?
Character set selection
FileReader, with the constructors used here, decodes with the system default character set, while InputStreamReader lets you specify the character set yourself. (Since Java 11, FileReader also has constructors that accept a Charset.)
For example:

FileReader fr = new FileReader(new File("cmp.txt"));
InputStreamReader isr = new InputStreamReader(new FileInputStream("cmp.txt"), "gbk");
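To make the effect of the character set concrete, here is a minimal sketch. It assumes the GBK charset is available in your JDK, and it decodes an in-memory byte array (via ByteArrayInputStream) instead of cmp.txt so that it is self-contained; the class name CharsetDemo and the helper decode are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Hypothetical helper: decode the given bytes into a String with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            char[] cbuf = new char[1024];
            int len;
            while ((len = reader.read(cbuf)) != -1) {
                sb.append(cbuf, 0, len);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");   // assumes GBK is supported by this JDK
        String original = "\u4e2d\u6587";       // two Chinese characters
        byte[] gbkBytes = original.getBytes(gbk);

        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(decode(gbkBytes, gbk).equals(original));
        // Decoding the same bytes as UTF-8 does not restore the text (garbled output).
        System.out.println(decode(gbkBytes, StandardCharsets.UTF_8).equals(original));
    }
}
```

The same mismatch is what happens when FileReader falls back to a default character set that differs from the one the file was saved in.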

Performance
The earlier article explained that FileReader is just a thin subclass of InputStreamReader and adds no functionality of its own. To check what that means for speed, I copied a one-million-word txt document with each class, using the same code structure and the same buffer, and the copy times came out essentially the same.
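The subclass relationship can be checked directly. The following is only a small sketch; the class name FileReaderIsInputStreamReader and the helper check are invented for this illustration, and it reads an empty temporary file rather than cmp.txt.

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

public class FileReaderIsInputStreamReader {
    // Hypothetical helper: open a FileReader on a temp file and report
    // whether the object is also an InputStreamReader.
    static boolean check() {
        try {
            File tmp = File.createTempFile("cmp", ".txt");
            tmp.deleteOnExit();
            try (Reader fr = new FileReader(tmp)) {
                if (fr instanceof InputStreamReader) {
                    // getEncoding() is inherited from InputStreamReader
                    System.out.println("encoding in use: " + ((InputStreamReader) fr).getEncoding());
                    return true;
                }
                return false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(check());
    }
}
```

Because FileReader inherits its reading logic from InputStreamReader, similar copy times are what one would expect.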
FileReader

package IOStream_ch13.InStrRderCmpFileRder;

import java.io.*;

/**
 * @author: Serendipity
 * Date: 2022/1/17 14:13
 * Description:
 */
public class FileReader_use {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();

        // try-with-resources closes the writer and then the reader, even if the copy fails
        try (BufferedReader br = new BufferedReader(new FileReader(new File("cmp.txt")));
             BufferedWriter bw = new BufferedWriter(new FileWriter(new File("cmp2.txt")))) {
            char[] cbuf = new char[1024];
            int len;
            // read up to 1024 characters at a time and write out exactly what was read
            while ((len = br.read(cbuf)) != -1) {
                bw.write(cbuf, 0, len);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        long end = System.currentTimeMillis();
        // elapsed time in milliseconds
        System.out.println(end - start);
    }
}

InputStreamReader

package IOStream_ch13.InStrRderCmpFileRder;

import java.io.*;

/**
 * @author: Serendipity
 * Date: 2022/1/17 13:49
 * Description:
 * This test compares InputStreamReader with FileReader to show why InputStreamReader might be used directly
 * 1: Performance
 * The two are almost the same. Copying a one-million-word text takes about the same time with either
 * 2: Character set selection
 * FileReader, as constructed here, can only use the system default character set, so the text may come out garbled.
 * InputStreamReader lets you name the character set explicitly, which avoids garbling caused by a character set mismatch
 */
public class test {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();

        // try-with-resources closes the writer and then the reader, even if the copy fails
        try (BufferedReader br = new BufferedReader(
                     new InputStreamReader(new FileInputStream("cmp.txt"), "gbk"));
             BufferedWriter bw = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream("cmp1.txt"), "gbk"))) {
            char[] cbuf = new char[1024];
            int len;
            // read up to 1024 characters at a time and write out exactly what was read
            while ((len = br.read(cbuf)) != -1) {
                bw.write(cbuf, 0, len);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        long end = System.currentTimeMillis();
        // elapsed time in milliseconds
        System.out.println(end - start);
    }
}

As for why a BufferedReader is used for reading, the general reasons are as follows:
BufferedReader declares these two fields

private char cb[];
private static int defaultCharBufferSize = 8192;

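This means that a BufferedReader created without an explicit size allocates an internal array of 8192 characters. Whenever the array is empty, BufferedReader refills it from the underlying reader in one bulk read, and then serves your read calls from the array until it is used up or the end of the stream is reached. (Flushing applies to BufferedWriter, not to BufferedReader.)
Strictly speaking, BufferedReader wraps another Reader rather than an InputStream, but that Reader ultimately relies on a stream that reads bytes, as in this line from my code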

InputStreamReader isr = new InputStreamReader(new FileInputStream("cmp.txt"), "gbk");
br = new BufferedReader(isr);

The file itself is read by the FileInputStream, which is a node stream, that is, a stream that accesses the data on the hard disk directly. InputStreamReader and BufferedReader are both processing streams. They do not access the hard disk themselves; they wrap the node stream and give it capabilities it did not have before. InputStreamReader takes the bytes that were read and, once it has enough of them to form a character in the chosen character set, decodes them and passes them on as characters. BufferedReader keeps those characters in its buffer and hands them to the caller from there, refilling the buffer when it runs empty. On the writing side, BufferedWriter does the reverse: it collects characters until its buffer is full or it is flushed or closed, and then sends the data in the buffer to the destination in one go.
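The layering can be seen by counting what each level delivers. This is a minimal sketch under stated assumptions: GBK is available in your JDK, the data is written to a temporary file rather than cmp.txt, and the class name ByteVsCharDemo and its helpers are invented for this illustration.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class ByteVsCharDemo {
    // Hypothetical helper: write the text to a temp file in the given charset.
    static File writeSample(String text, Charset charset) {
        try {
            File file = File.createTempFile("cmp", ".txt");
            file.deleteOnExit();
            try (Writer w = new OutputStreamWriter(new FileOutputStream(file), charset)) {
                w.write(text);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Node stream only: counts raw bytes.
    static int countBytes(File file) {
        try (InputStream in = new FileInputStream(file)) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Node stream wrapped by two processing streams: counts decoded characters.
    static int countChars(File file, Charset charset) {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            int n = 0;
            while (br.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumes GBK is supported by this JDK
        // Two Chinese characters followed by three ASCII letters.
        File file = writeSample("\u4e2d\u6587abc", gbk);
        System.out.println("bytes: " + countBytes(file));
        System.out.println("chars: " + countChars(file, gbk));
    }
}
```

In GBK a Chinese character takes two bytes and an ASCII letter takes one, so the byte count is larger than the character count: the InputStreamReader in the middle is what joins the bytes into characters.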
So BufferedReader is there to improve efficiency. Why does it help?

If you read from a FileInputStream one byte at a time with read(), every call goes down to the operating system. For a file of 10 bytes, that means 10 system calls, each of which hands a single byte back to the program. A buffer is simply an array in memory. With a buffer, a single call asks the operating system for as many bytes as the array can hold, so the same 10 bytes arrive in one call and the program then takes them from memory. Writing works the same way: without a buffer, writing 10 bytes one at a time costs 10 I/O calls, while a buffered writer collects them and writes them out together.

So buffering does change the number of system calls: it reduces it. Every system call has a fixed cost, because the CPU has to switch from the program into the operating system kernel and back, and that cost is roughly the same whether the call transfers one byte or several thousand. The transfer between the disk and memory is typically handled with DMA: the CPU makes some settings at the beginning, hands the work to the DMA controller, and is free to do other things until the controller signals with an interrupt that the block has arrived. That happens in whole blocks no matter how the Java program reads, and the operating system usually keeps those blocks cached in memory. What the buffer saves is therefore mainly the repeated trips into the kernel, not the disk accesses themselves.
To sum up, a buffer lets the program move data in a few large blocks instead of many tiny pieces, which is much more efficient. The difference cannot be seen in this program, because both versions use a BufferedReader and the same 1024-character array. You can see the gap by reading and writing tens of thousands of words one at a time, with and without buffering.
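One way to observe what a buffer changes is to count how many read calls actually reach the wrapped reader. The sketch below is an illustration only: CountingReader and underlyingCalls are names invented here, the source is an in-memory StringReader rather than a file, and calls on the wrapped reader merely stand in for the much more expensive calls into the operating system.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class BufferCallCount {
    // Hypothetical wrapper that counts every read request it receives.
    static class CountingReader extends FilterReader {
        int calls = 0;

        CountingReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            calls++;
            return super.read(cbuf, off, len);
        }
    }

    // Reads totalChars characters one at a time, with or without a BufferedReader
    // in front, and returns how many read calls reached the wrapped reader.
    static int underlyingCalls(int totalChars, boolean buffered) {
        char[] data = new char[totalChars];
        Arrays.fill(data, 'x');
        CountingReader counting = new CountingReader(new StringReader(new String(data)));
        try (Reader reader = buffered ? new BufferedReader(counting) : counting) {
            while (reader.read() != -1) {
                // consume one character per call, as a naive program would
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counting.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + underlyingCalls(10000, false));
        System.out.println("buffered:   " + underlyingCalls(10000, true));
    }
}
```

Without the BufferedReader, every single-character read is passed through to the wrapped reader; with it, the wrapped reader only sees a handful of bulk reads that fill the 8192-character buffer.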

Topics: Java
