Java Engineer's road to God IO

Posted by annette on Sat, 19 Feb 2022 03:49:54 +0100

Original author: Hollis

Character stream, byte stream

Bytes and characters

A bit is the smallest binary unit a computer operates on. Its value is 0 or 1.

A byte is the smallest unit of data a computer reads and writes, and it consists of 8 bits. In Java a byte is signed, so its value ranges from -128 to 127.

A char (character) is the smallest unit a user reads and writes. In Java it consists of 16 bits, with values from 0 to 65535.
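
The ranges above can be checked directly in Java. The following is a minimal sketch; the class name ByteCharRanges and the helper utf8Length are made up for illustration. It also shows that one char does not always map to one byte once text is encoded, which is why the conversion classes discussed later need a charset.

```java
import java.nio.charset.StandardCharsets;

final class ByteCharRanges {
    // Number of bytes needed to encode the given text as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // A Java byte is a signed 8-bit value.
        System.out.println("byte: " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        // A Java char is an unsigned 16-bit value.
        System.out.println("char: " + (int) Character.MIN_VALUE
                + " to " + (int) Character.MAX_VALUE);
        // The same single char can need a different number of bytes when encoded.
        System.out.println("UTF-8 bytes for 'A': " + utf8Length("A"));
        System.out.println("UTF-8 bytes for U+4E2D: " + utf8Length("\u4e2d"));
    }
}
```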

Byte stream

Byte streams operate on byte data. The main classes are subclasses of OutputStream and InputStream. By default they do not use a buffer and operate directly on the file itself.

Character stream

Character streams operate on character data. The main classes are subclasses of Reader and Writer. They buffer characters, so output may not reach its destination until the stream is flushed or closed.
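
The effect of buffering can be observed with an in-memory sink. This is only a sketch: FlushDemo and its method name are invented for the example, and it assumes a text short enough to fit in BufferedWriter's default buffer. It writes through a BufferedWriter and measures how many bytes have reached the sink before and after flush().

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

final class FlushDemo {
    // Returns {bytes in the sink before flush, bytes in the sink after flush}.
    // Assumes text is short enough to stay inside the writer's buffer.
    static int[] sizesBeforeAndAfterFlush(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            writer.write(text);
            int before = sink.size();   // the characters are still held in the buffer
            writer.flush();             // pushes the buffered characters down to the sink
            int after = sink.size();
            return new int[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesBeforeAndAfterFlush("hello");
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

Closing the writer also flushes it, which is why forgetting both close() and flush() is the usual cause of an empty output file.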

Mutual conversion

The java.io package is essentially divided into byte streams and character streams, but in addition to these two kinds of stream there is a pair of classes that convert between them.

OutputStreamWriter: a subclass of Writer that encodes the characters written to it into bytes, turning a character output stream into a byte output stream.

InputStreamReader: a subclass of Reader that decodes the bytes it reads into characters, turning a byte input stream into a character input stream.

Input stream, output stream

Input and output are defined relative to a reference point, which here is the program's memory. Reading data from an external source (a file, a socket) into memory is input. Writing data from memory out to an external destination is output.

Therefore, an input stream is what a program reads data from, and an output stream is what a program writes data to.

Conversion between byte stream and character stream

Converting between character streams and byte streams takes two classes:

OutputStreamWriter is the bridge from character streams to byte streams.

InputStreamReader is the bridge from byte streams to character streams.

Character stream into byte stream

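// Requires: java.io.File, java.io.FileOutputStream, java.io.IOException,
//           java.io.OutputStreamWriter, java.nio.charset.StandardCharsets
public static void main(String[] args) throws IOException {
    File f = new File("test.txt");

    // OutputStreamWriter is the bridge from character streams to byte streams:
    // the characters written to it are encoded as UTF-8 bytes.
    // try-with-resources closes the writer, which also flushes it.
    try (OutputStreamWriter osw =
            new OutputStreamWriter(new FileOutputStream(f), StandardCharsets.UTF_8)) {
        osw.write("I convert character stream into byte stream for output");
    }
}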

Byte stream into character stream

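// Requires: java.io.File, java.io.FileInputStream, java.io.IOException,
//           java.io.InputStreamReader, java.nio.charset.StandardCharsets
public static void main(String[] args) throws IOException {
    File f = new File("test.txt");

    // InputStreamReader is the bridge from byte streams to character streams:
    // the bytes read from the file are decoded as UTF-8 characters.
    try (InputStreamReader inr =
            new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8)) {
        char[] buf = new char[1024];

        // Reads at most 1024 characters; returns -1 if the file is empty.
        int len = inr.read(buf);
        if (len > 0) {
            System.out.println(new String(buf, 0, len));
        }
    }
}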

Synchronous, asynchronous

Synchronous and asynchronous describe the behavior of the callee.

If A calls B:

If the call is synchronous, B starts the work as soon as it receives the call from A, and A gets the result from this very call.

If the call is asynchronous, B does not guarantee that it will start the work immediately after receiving the call from A, but it does guarantee that the work will be done. A cannot get the result from this call itself. Instead, B notifies A once the work is finished.
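
As a rough illustration of the two callee styles, here is a sketch in which B is played by two methods. All names (SyncAsyncDemo, boilSync, boilAsync) are hypothetical, and CompletableFuture is only one of several ways to deliver a notification in Java.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class SyncAsyncDemo {
    // Synchronous callee: the result is available when the call returns.
    static String boilSync() {
        return "boiled";
    }

    // Asynchronous callee: the call returns at once without a result,
    // and the caller is notified later through the callback.
    static void boilAsync(Consumer<String> onDone) {
        CompletableFuture.supplyAsync(SyncAsyncDemo::boilSync).thenAccept(onDone);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("sync result: " + boilSync());

        CountDownLatch notified = new CountDownLatch(1);
        boilAsync(result -> {
            System.out.println("async notification: " + result);
            notified.countDown();
        });
        // boilAsync has already returned here. The latch only keeps the JVM
        // alive long enough for the notification to arrive.
        notified.await(5, TimeUnit.SECONDS);
    }
}
```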

The difference between synchronous, asynchronous and blocking, non blocking

Synchronous and asynchronous describe the callee.

Blocking and non blocking describe the caller.

Synchronous does not necessarily mean blocking, and asynchronous does not necessarily mean non blocking. The two properties are independent.

For a simple example, Lao Zhang boils water:

1. Lao Zhang puts an ordinary kettle on the fire and stands there waiting for the water to boil. (synchronous blocking)

2. Lao Zhang puts an ordinary kettle on the fire, goes to the living room to watch TV, and walks to the kitchen from time to time to see whether the water is boiling. (synchronous non blocking)

3. Lao Zhang puts a whistling kettle on the fire and stands there waiting for the water to boil. (asynchronous blocking)

4. Lao Zhang puts a whistling kettle on the fire and goes to the living room to watch TV. He does not check on it until it whistles, and when it whistles he goes to fetch the kettle. (asynchronous non blocking)

The difference between 1 and 2 is what the caller does before getting the result. The difference between 1 and 3 is how the callee handles the boiling water.

Blocking, non blocking

Blocking and non blocking describe the behavior of the caller.

If A calls B:

If the call is blocking, A waits after issuing the call until B returns the result.

If the call is non blocking, A does not have to wait after issuing the call and can get on with its own work.
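
Seen from the caller's side, the same slow callee can be used in either way. The sketch below is an illustration only: BlockingCallerDemo and slowCall are invented names, and a CompletableFuture that sleeps for 100 ms stands in for B.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class BlockingCallerDemo {
    // Hypothetical slow callee: finishes after roughly delayMillis.
    static CompletableFuture<String> slowCall(long delayMillis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "result";
        });
    }

    // Blocking caller: waits inside join() until the result exists.
    static String callBlocking() {
        return slowCall(100).join();
    }

    // Non blocking caller: does its own work and only checks isDone() in between.
    // Returns how many units of other work it finished while waiting.
    static long callNonBlocking() {
        CompletableFuture<String> pending = slowCall(100);
        long otherWork = 0;
        do {
            otherWork++;   // stand-in for useful work
        } while (!pending.isDone());
        return otherWork;
    }

    public static void main(String[] args) {
        System.out.println("blocking caller got: " + callBlocking());
        System.out.println("non blocking caller did " + callNonBlocking()
                + " units of other work while waiting");
    }
}
```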

Five IO models for Linux

Blocking IO model

This is the most traditional IO model: blocking occurs while data is being read or written.

After the user thread issues an IO request, the kernel checks whether the data is ready. If it is not, the kernel waits for the data, while the user thread blocks and gives up the CPU. When the data is ready, the kernel copies it to the user thread and returns the result, and only then does the user thread leave the blocked state.

Examples of typical blocking IO models are:

data = socket.read();

If the data is not ready, the thread stays blocked inside the read call.
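
The socket.read() line above is pseudocode. A comparable effect can be reproduced in plain Java without a network by using java.nio.channels.Pipe, whose channels are in blocking mode by default. This is a sketch under that substitution: the class name and the 100 ms delay are arbitrary, and a pipe stands in for the socket.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

final class BlockingReadDemo {
    // Reads one message from a pipe whose writer only sends after delayMillis.
    static String readBlocking(long delayMillis) throws IOException, InterruptedException {
        Pipe pipe = Pipe.open();   // both ends start in blocking mode
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);   // the data is "not ready" for a while
                pipe.sink().write(ByteBuffer.wrap("ready".getBytes(StandardCharsets.UTF_8)));
            } catch (IOException | InterruptedException e) {
                throw new IllegalStateException(e);
            }
        });
        writer.start();

        ByteBuffer buffer = ByteBuffer.allocate(64);
        // The calling thread is parked inside read() until the writer delivers data.
        pipe.source().read(buffer);

        writer.join();
        pipe.source().close();
        pipe.sink().close();

        buffer.flip();
        return StandardCharsets.UTF_8.decode(buffer).toString();
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        String data = readBlocking(100);
        long waitedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("read \"" + data + "\" after about " + waitedMillis + " ms");
    }
}
```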

Non blocking IO model

When the user thread initiates a read operation, it does not wait but gets a result immediately. If the result is an error, the thread knows the data is not ready yet and can issue the read again. Once the data in the kernel is ready and the user thread asks again, the kernel immediately copies the data to the user thread and returns.

So in the non blocking IO model the user thread has to keep asking whether the kernel data is ready. In other words, a thread doing non blocking IO never gives up the CPU and keeps occupying it.

A typical use of the non blocking IO model looks like this:

while(true){
    data = socket.read();
    if(data != error){
        // process the data
        break;
    }
}

However, non blocking IO has a serious problem: the while loop has to keep asking whether the kernel data is ready, which drives CPU utilization very high. For that reason, a bare while loop is rarely used to read data in practice.
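
The polling loop can be tried out in Java with a pipe switched to non blocking mode. The sketch below uses invented names and a Pipe in place of a socket. Note one difference from the pseudocode: a non blocking channel read in Java reports "no data yet" by returning 0 rather than an error value.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

final class NonBlockingReadDemo {
    // Polls a non blocking pipe until data shows up and
    // returns how many polls came back empty.
    static long pollUntilReady(long delayMillis) throws IOException, InterruptedException {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);   // read() now returns immediately
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
                pipe.sink().write(ByteBuffer.wrap("ready".getBytes(StandardCharsets.UTF_8)));
            } catch (IOException | InterruptedException e) {
                throw new IllegalStateException(e);
            }
        });
        writer.start();

        ByteBuffer buffer = ByteBuffer.allocate(64);
        long emptyPolls = 0;
        // read() returns 0 while nothing is available, so this loop spins on the CPU.
        while (pipe.source().read(buffer) == 0) {
            emptyPolls++;
        }

        writer.join();
        pipe.source().close();
        pipe.sink().close();
        return emptyPolls;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("empty polls before data arrived: " + pollUntilReady(100));
    }
}
```

The count printed by main varies from run to run, but it is usually large, which is the CPU cost described above.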

IO multiplexing model

The multiplexed IO model is the more commonly used model today. Java NIO is in fact multiplexed IO.

In the multiplexed IO model, one thread keeps polling the status of multiple sockets, and the actual IO read or write is only invoked when a socket really has a read or write event. Because a single thread can manage multiple sockets, the system does not need to create new processes or threads or maintain them, and IO resources are used only when there are socket events, so resource usage drops considerably.

In Java NIO, selector.select() is used to ask whether any channel has an event pending. If there is no event, the call blocks, so this approach does block the user thread.

Some readers may say that multithreading plus blocking IO achieves a similar effect. But with multithreading plus blocking IO, each socket needs its own thread, which consumes a lot of resources. For long-lived connections in particular, the thread's resources are never released, and a large number of connections then becomes a performance bottleneck.

In the multiplexed IO model, multiple sockets can be managed by one thread, and resources are used for actual reads and writes only when a socket has a real event. Multiplexed IO is therefore better suited to situations with a large number of connections.

In addition, multiplexed IO is more efficient than the non blocking IO model because in non blocking IO the socket status is polled by the user thread, while in multiplexed IO the polling of each socket is done by the kernel, which is much more efficient.

Note, however, that the multiplexed IO model detects events by polling and responds to the arriving events one by one. So once handling an event takes a long time, for example because the response body is large, the following events are delayed and the polling for new events is affected.
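
A small single-threaded sketch of the Selector pattern follows. It is an illustration under simplifying assumptions, not a server: pipes stand in for sockets, every message is assumed to be non-empty and at most 256 bytes, each pipe carries exactly one message, and the class and method names are invented.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class SelectorDemo {
    // One thread watches several pipes and reads only from those that are ready.
    // Assumes each message is non-empty and fits in 256 bytes.
    static List<String> readAll(String... messages) throws IOException {
        List<String> received = new ArrayList<>();
        List<Pipe> pipes = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (String message : messages) {
                Pipe pipe = Pipe.open();
                pipe.source().configureBlocking(false);   // required before register()
                pipe.source().register(selector, SelectionKey.OP_READ);
                pipe.sink().write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
                pipes.add(pipe);
            }

            ByteBuffer buffer = ByteBuffer.allocate(256);
            while (received.size() < messages.length) {
                selector.select();   // blocks until at least one channel is readable
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();   // the selected-key set is not cleared automatically
                    buffer.clear();
                    ((Pipe.SourceChannel) key.channel()).read(buffer);
                    buffer.flip();
                    received.add(StandardCharsets.UTF_8.decode(buffer).toString());
                    key.cancel();    // one message per pipe in this sketch
                }
            }
        } finally {
            for (Pipe pipe : pipes) {
                pipe.source().close();
                pipe.sink().close();
            }
        }
        return received;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll("kettle-1", "kettle-2", "kettle-3"));
    }
}
```

The order of the returned messages depends on the order in which the selector reports ready keys, so it should not be relied on.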

Signal driven IO model

In the signal driven IO model, when the user thread initiates an IO request it registers a signal handler for the corresponding socket and then carries on executing. When the kernel data is ready, a signal is sent to the user thread. After receiving the signal, the user thread calls the IO read or write function from inside the signal handler to carry out the actual IO operation.

Asynchronous IO model

The asynchronous IO model is the ideal IO model. In it, once the user thread initiates a read operation it can immediately go on to do other things. From the kernel's side, when it receives an asynchronous read it returns immediately to say that the read request was accepted, so the user thread is never blocked. The kernel then waits for the data to be ready and copies it to the user thread. When all of that is done, the kernel sends a signal to the user thread to say the read operation is complete. In other words, the user thread does not need to know how the IO operation is actually carried out. It only issues the request, and when it receives the success signal from the kernel the IO operation is complete and the data can be used directly.

That is, in the asynchronous IO model neither phase of an IO operation blocks the user thread. Both phases are completed by the kernel, which then sends a signal to tell the user thread that the operation is done. The user thread does not have to call an IO function again to do the actual reading or writing. This is the difference from the signal driven model: there, the signal means the data is ready, and the user thread must still call an IO function to read or write it. In the asynchronous IO model, the signal means the IO operation is already complete.

Note that asynchronous IO requires support from the underlying operating system. Java 7 provides asynchronous IO.

In fact, the first four IO models are all synchronous IO and only the last one is truly asynchronous, because in both the multiplexed IO model and the signal driven model the second phase of the IO operation, the kernel copying the data, blocks the user thread.

Differences among BIO, NIO and AIO, usage and principle of three IO

IO

What is IO? It refers to the interface between a computer and the outside world, or between a program and the rest of the computer. It is critical to any computer system, so the bulk of all I/O is actually built into the operating system. Individual programs usually let the system do most of the work for them.

In Java programming, until recently, I/O was done using streams. All I/O was treated as the movement of single bytes, one at a time, through an object called a Stream. Stream I/O is used for contact with the outside world. It is also used internally to convert objects to bytes and back to objects.

BIO

Java BIO is Blocking I/O, that is, synchronous blocking IO.

BIO is the traditional implementation found in the java.io package.

NIO

What is NIO? NIO has the same role and purpose as the original I/O. The most important difference between them is how data is packaged and transferred. The original I/O processes data as a stream, while NIO processes data in blocks.

A stream oriented I/O system processes data one byte at a time. An input stream produces one byte of data, and an output stream consumes one byte of data. Creating filters for streamed data is easy. It is also relatively simple to chain several filters so that each one handles only part of a single complex processing mechanism. The downside is that stream oriented I/O is usually quite slow.

A block oriented I/O system processes data in blocks. Each operation produces or consumes a block of data in one step. Processing data by the block is much faster than processing it by the (streamed) byte. However, block oriented I/O lacks the elegance and simplicity of stream oriented I/O.
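
The difference in the number of operations can be made concrete with a small sketch. The class and method names below are invented, and an in-memory array stands in for a real data source, so this counts read calls rather than measuring speed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

final class StreamVsBlockDemo {
    // Stream style: one read() call per byte.
    // Returns the number of calls that returned data.
    static int readByteByByte(byte[] data) throws IOException {
        int calls = 0;
        try (InputStream in = new ByteArrayInputStream(data)) {
            while (in.read() != -1) {
                calls++;
            }
        }
        return calls;
    }

    // Block style: each read() call fills a buffer of up to blockSize bytes.
    // Returns the number of calls that returned data.
    static int readInBlocks(byte[] data, int blockSize) throws IOException {
        int calls = 0;
        ByteBuffer block = ByteBuffer.allocate(blockSize);
        try (ReadableByteChannel channel =
                Channels.newChannel(new ByteArrayInputStream(data))) {
            while (channel.read(block) != -1) {
                calls++;
                block.clear();   // make the whole buffer available for the next block
            }
        }
        return calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1000];
        System.out.println("byte-by-byte read calls: " + readByteByByte(data));
        System.out.println("block read calls (256-byte blocks): " + readInBlocks(data, 256));
    }
}
```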

AIO

Java AIO is Asynchronous I/O, that is, asynchronous non blocking IO.

Differences and connections

BIO (Blocking I/O): the synchronous blocking I/O model. Reading and writing data blocks the thread until the operation completes. Imagine a row of kettles boiling water. BIO works by having one thread stay at a kettle until it boils and only then move on to the next kettle. In fact the thread does nothing while it waits for the kettle to boil.

NIO (New I/O): it supports both blocking and non blocking modes, but here we illustrate it with its synchronous non blocking mode. What is synchronous non blocking? In the boiling water example, NIO has one thread keep polling the state of every kettle to see whether any of them has changed, and then carries out the next step.

AIO (Asynchronous I/O): the asynchronous non blocking I/O model. What is the difference between asynchronous non blocking and synchronous non blocking? Asynchronous non blocking does not need a thread to poll for state changes of all IO operations. When a state changes, the system notifies the corresponding thread to handle it. In the boiling water example, every kettle has a switch fitted, and when the water boils the kettle itself notifies me that it has boiled.

Applicable scenarios

BIO suits architectures with a small, fixed number of connections. It places high demands on server resources, and concurrency is confined to the application. It was the only choice before JDK 1.4, but the code is intuitive, simple and easy to understand.

NIO suits architectures with many connections that are short-lived (light operations), such as a chat server. Concurrency is confined to the application and the programming is more complex. It is supported since JDK 1.4.

AIO suits architectures with many connections that are long-lived (heavy operations), such as a photo album server. It makes full use of the OS for concurrent operations. The programming is more complex, and it is supported since JDK 7.

Mode of use

Use BIO to read and write files.

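        // Assumes User1 is a Serializable class with name and age properties;
        // IOUtils and FileUtils come from Apache Commons IO.
        // Initialize the object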
        User1 user = new User1();
        user.setName("hollis");
        user.setAge(23);
        System.out.println(user);

        //Write Obj to File
        ObjectOutputStream oos = null;
        try {
            oos = new ObjectOutputStream(new FileOutputStream("tempFile"));
            oos.writeObject(user);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeQuietly(oos);
        }

        //Read Obj from File
        File file = new File("tempFile");
        ObjectInputStream ois = null;
        try {
            ois = new ObjectInputStream(new FileInputStream(file));
            User1 newUser = (User1) ois.readObject();
            System.out.println(newUser);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeQuietly(ois);
            try {
                FileUtils.forceDelete(file);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
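
The snippet above depends on a User1 class and on Apache Commons IO. For readers who want to run the same idea with only the JDK, here is a self-contained sketch. The User class is a hypothetical stand-in for User1, a temporary file replaces "tempFile", and try-with-resources takes the place of IOUtils.closeQuietly.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

final class ObjectStreamDemo {
    // Hypothetical stand-in for the User1 class used in the article.
    static final class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public String toString() {
            return "User{name=" + name + ", age=" + age + "}";
        }
    }

    // Writes the object to a temporary file, reads it back, then deletes the file.
    static User roundTrip(User user) throws IOException, ClassNotFoundException {
        Path file = Files.createTempFile("user", ".ser");
        try {
            try (ObjectOutputStream oos = new ObjectOutputStream(Files.newOutputStream(file))) {
                oos.writeObject(user);
            }
            try (ObjectInputStream ois = new ObjectInputStream(Files.newInputStream(file))) {
                return (User) ois.readObject();
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new User("hollis", 23)));
    }
}
```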

Use NIO to read and write files.

static void readNIO() {
        String pathname = "C:\\Users\\adew\\Desktop\\jd-gui.cfg";
        FileInputStream fin = null;
        try {
            fin = new FileInputStream(new File(pathname));
            FileChannel channel = fin.getChannel();

            int capacity = 100;// byte
            ByteBuffer bf = ByteBuffer.allocate(capacity);
            System.out.println("limit: " + bf.limit() + ", capacity: " + bf.capacity()
                    + ", position: " + bf.position());
            int length = -1;

            while ((length = channel.read(bf)) != -1) {

                /*
                 * After each read, clear() resets position to 0 and limit to capacity, so the
                 * next read fills the buffer from index 0 again. clear() does not erase the data,
                 * so the bytes just read are still available in the backing array.
                 */
                bf.clear();
                byte[] bytes = bf.array();
                System.out.write(bytes, 0, length);
                System.out.println();

                System.out.println("limit: " + bf.limit() + ", capacity: " + bf.capacity()
                        + ", position: " + bf.position());

            }

            channel.close();

        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (fin != null) {
                try {
                    fin.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    static void writeNIO() {
        String filename = "out.txt";
        FileOutputStream fos = null;
        try {

            fos = new FileOutputStream(new File(filename));
            FileChannel channel = fos.getChannel();
            ByteBuffer src = Charset.forName("utf8").encode("Hello, Hello, hello");
            // The capacity and limit of the encoded buffer depend on the data length; they are not fixed
            System.out.println("Initialization capacity and limit: " + src.capacity() + ","
                    + src.limit());
            int length = 0;

            while ((length = channel.write(src)) != 0) {
                /*
                 * No clear() is needed here. Each write advances the buffer's position, so the
                 * next write continues from where the previous one stopped.
                 */
                System.out.println("Write length:" + length);
            }

        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (fos != null) {
                try {
                    fos.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
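
The two methods above use a fixed path and print as they go. As a compact alternative, the following sketch does a full write-then-read round trip through FileChannel on a temporary file, using the common flip()/clear() cycle when reading. The class and method names are invented, and collecting the bytes before decoding is a choice made here so that multi-byte characters split across reads are still decoded correctly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FileChannelDemo {
    // Writes text to a temporary file through a FileChannel and reads it back.
    static String writeThenRead(String text) throws IOException {
        Path file = Files.createTempFile("nio", ".txt");
        try {
            try (FileChannel out = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ByteBuffer src = StandardCharsets.UTF_8.encode(text);
                while (src.hasRemaining()) {
                    out.write(src);   // may take several calls to drain the buffer
                }
            }

            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(16);   // deliberately small
                while (in.read(buf) != -1) {
                    buf.flip();    // switch from filling the buffer to draining it
                    collected.write(buf.array(), 0, buf.limit());
                    buf.clear();   // ready to be filled by the next read
                }
            }
            return new String(collected.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeThenRead("Hello, NIO file channels"));
    }
}
```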

Use AIO to read and write files.

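import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.Future;

public class ReadFromFile {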
  public static void main(String[] args) throws Exception {
    Path file = Paths.get("/usr/a.txt");
    AsynchronousFileChannel channel = AsynchronousFileChannel.open(file);

    ByteBuffer buffer = ByteBuffer.allocate(100_000);
    Future<Integer> result = channel.read(buffer, 0);

    while (!result.isDone()) {
      ProfitCalculator.calculateTax();
    }
    Integer bytesRead = result.get();
    System.out.println("Bytes read [" + bytesRead + "]");
  }
}
class ProfitCalculator {
  public ProfitCalculator() {
  }
  public static void calculateTax() {
  }
}

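// WriteToFile.java (a second public class, so it goes in its own source file)
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class WriteToFile {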

  public static void main(String[] args) throws Exception {
    AsynchronousFileChannel fileChannel = AsynchronousFileChannel.open(
        Paths.get("/asynchronous.txt"), StandardOpenOption.READ,
        StandardOpenOption.WRITE, StandardOpenOption.CREATE);
    CompletionHandler<Integer, Object> handler = new CompletionHandler<Integer, Object>() {

      @Override
      public void completed(Integer result, Object attachment) {
        System.out.println("Attachment: " + attachment + " " + result
            + " bytes written");
        System.out.println("CompletionHandler Thread ID: "
            + Thread.currentThread().getId());
      }

      @Override
      public void failed(Throwable e, Object attachment) {
        System.err.println("Attachment: " + attachment + " failed with:");
        e.printStackTrace();
      }
    };

    System.out.println("Main Thread ID: " + Thread.currentThread().getId());
    fileChannel.write(ByteBuffer.wrap("Sample".getBytes()), 0, "First Write",
        handler);
    fileChannel.write(ByteBuffer.wrap("Box".getBytes()), 0, "Second Write",
        handler);

  }
}
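
In WriteToFile, main returns right after issuing the writes, so the handlers may not have reported anything by then. One way to observe the completion is to wait on a latch, as in the sketch below. The names are invented, a temporary file replaces "/asynchronous.txt", and the 5 second timeout is an arbitrary choice.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class AsyncFileDemo {
    // Writes text asynchronously and waits until the completion handler has run.
    // Returns the byte count reported by the handler, or -1 on failure or timeout.
    static int writeAndWait(String text) throws IOException, InterruptedException {
        Path file = Files.createTempFile("aio", ".txt");
        AtomicInteger written = new AtomicInteger(-1);
        CountDownLatch done = new CountDownLatch(1);
        try (AsynchronousFileChannel channel =
                AsynchronousFileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer src = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            channel.write(src, 0, "demo write", new CompletionHandler<Integer, String>() {
                @Override
                public void completed(Integer result, String attachment) {
                    written.set(result);   // the write has already been carried out
                    done.countDown();
                }

                @Override
                public void failed(Throwable e, String attachment) {
                    done.countDown();
                }
            });
            // write() has already returned. Waiting here is only needed so that
            // this method can report the outcome before the channel is closed.
            done.await(5, TimeUnit.SECONDS);
        } finally {
            Files.deleteIfExists(file);
        }
        return written.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bytes written: " + writeAndWait("Sample"));
    }
}
```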

Netty

Netty is a non blocking I/O client-server framework used mainly to develop Java network applications such as protocol servers and clients. As an asynchronous, event-driven network application framework and toolkit, it simplifies network programming such as TCP and UDP socket servers. Netty includes an implementation of the reactor pattern. It was originally developed by JBoss and is now developed and maintained by the Netty project community.

Besides being an asynchronous network application framework, Netty also includes support for HTTP, HTTP/2, DNS and other protocols, the ability to run inside a Servlet container, support for WebSockets, integration with Google Protocol Buffers, SSL/TLS support, and support for the SPDY protocol and message compression. Netty has been actively developed since 2004.

Starting with version 4.0.0, Netty also supports NIO.2 as a back end, alongside NIO and blocking Java sockets.

Essence: a JAR package originally produced by JBoss

Objective: to quickly develop high-performance, reliable network server and client programs

Advantages: provides an asynchronous, event-driven network application framework and tools

