Java IO streaming learning notes

Posted by MBenefactor on Mon, 17 Jan 2022 12:50:49 +0100

Preface

I started with a basic pass over Java IO streams and organized what I learned here, mainly so it is easy to browse and review later. The material follows Liao Xuefeng's Java tutorial, and most of the content is excerpted from it.

Brief introduction

IO stands for Input/Output. Taking memory as the reference point:

Input refers to reading data from outside to memory, for example, reading files from disk to memory, reading data from network to memory, etc.
Output refers to outputting data from memory to the outside, for example, writing data from memory to files, outputting data from memory to the network, and so on.

An IO stream uses the byte as its smallest unit, so it is also called a byte stream.

In Java, InputStream represents an input byte stream and OutputStream represents an output byte stream. These are the two most basic IO streams.

If we need to read and write characters, and not all of them are single-byte ASCII characters, it is clearly more convenient to read and write in units of char. Such a stream is called a character stream.

Java provides Reader and Writer to represent character streams. The smallest unit of data a character stream transfers is the char.

Reader and Writer are essentially an InputStream and an OutputStream that encode and decode automatically.

With a Reader, the data source is still bytes, but what we read are chars, because the Reader decodes the bytes it reads into chars. With an InputStream, what we read is exactly the raw byte[] data, which we can then convert into a String ourselves using some encoding. Whether to use a Reader or an InputStream depends on the scenario: if the data source is not text, only InputStream is appropriate; if it is text, a Reader is more convenient. The same applies to Writer and OutputStream.
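To make the byte/char difference concrete, here is a minimal sketch that reads the same UTF-8 data once as a byte stream and once through a Reader. The class name ByteVsCharDemo and the sample text are my own choices for illustration, and InputStreamReader is covered later in these notes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class ByteVsCharDemo {
    // Counts how many values read() returns when the data is consumed as raw bytes.
    static int countBytes(byte[] data) throws IOException {
        int count = 0;
        try (InputStream in = new ByteArrayInputStream(data)) {
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }

    // Counts how many values read() returns when the same data is decoded as UTF-8 text.
    static int countChars(byte[] data) throws IOException {
        int count = 0;
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            while (reader.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Two Chinese characters, written as escapes so the source file encoding does not matter.
        byte[] data = "\u4f60\u597d".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(data)); // 6
        System.out.println(countChars(data)); // 2
    }
}
```

Each of the two characters takes three bytes in UTF-8, so the byte stream sees six units while the Reader sees two.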

File object

In a computer system, files are a very important form of storage. The java.io package of the standard library provides the File object for working with files and directories.

To construct a File object, you need to pass in the file path:

File f = new File("Java.iml");

A File object has three kinds of path: getPath() returns the path passed to the constructor, getAbsolutePath() returns the absolute path, and getCanonicalPath() is similar to the absolute path but returns the canonical path.

        System.out.println(f.getPath());
        System.out.println(f.getAbsolutePath());
        System.out.println(f.getCanonicalPath());

The canonical path is the absolute path with . and .. resolved into a standard form.

Files and directories

A File object can represent either a file or a directory. Note that constructing a File object does not fail even if the file or directory does not exist, because constructing a File object performs no disk operation. The disk is only accessed when we call certain methods of the File object.

Call isFile() to check whether the File object is an existing file, and isDirectory() to check whether it is an existing directory:

        System.out.println(f.isFile());
        System.out.println(f.isDirectory());

Once a File object refers to an existing file, you can further check its permissions and size:

boolean canRead(): whether it is readable;
boolean canWrite(): whether it is writable;
boolean canExecute(): whether it is executable;
long length(): the file size in bytes.

For a directory, executable indicates whether the files and subdirectories it contains can be listed.

Create and delete files

When the File object represents a file, you can create a new file with createNewFile() and delete it with delete():

        File file = new File("hello.txt");
        if(file.createNewFile()){
            System.out.println("File created successfully");
            if(file.delete()){
                System.out.println("File deleted successfully");
            }
        }

The File class provides createTempFile() to create a temporary file, and deleteOnExit() to delete the file automatically when the JVM exits.

        File file1 = File.createTempFile("tmp-",".txt");
        file1.deleteOnExit();
        System.out.println(file1.getAbsolutePath());

Traverse files and directories

When the File object represents a directory, you can use list() and listFiles() to list the names of the files and subdirectories in it. listFiles() provides several overloads for filtering out unwanted files and directories:

        File file2 = new File(".");
        String[] array1 = file2.list();
        File[] array2 = file2.listFiles();

        File[] fs2 = file2.listFiles(new FilenameFilter() { // List only .exe files
            public boolean accept(File dir, String name) {
                return name.endsWith(".exe"); // Return true to accept the file
            }
        });

As with files, if a File object represents a directory, you can create and delete the directory with the following methods:

boolean mkdir(): creates the directory represented by the current File object;
boolean mkdirs(): creates the directory represented by the current File object, and creates any missing parent directories if necessary;
boolean delete(): deletes the directory represented by the current File object. The directory must be empty for the deletion to succeed.
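The three methods above can be tried out with a small sketch like the following. The class name DirectoryDemo and the directory names a and b are made up for illustration; the demo works inside a fresh temporary directory so it does not touch real data.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class DirectoryDemo {
    // Creates base/a/b with mkdirs(), then removes both directories again,
    // innermost first, because delete() only succeeds on an empty directory.
    static boolean createAndRemove(File base) {
        File outer = new File(base, "a");
        File inner = new File(outer, "b");
        boolean plainMkdirFailed = !inner.mkdir();       // parent "a" is missing, so mkdir() returns false
        boolean created = inner.mkdirs();                // creates "a" and then "a/b"
        boolean nonEmptyDeleteFailed = !outer.delete();  // "a" still contains "b"
        boolean removed = inner.delete() && outer.delete();
        return plainMkdirFailed && created && nonEmptyDeleteFailed && removed;
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("dir-demo").toFile();
        System.out.println(createAndRemove(base)); // true
        base.delete();
    }
}
```

All of these methods report failure through their boolean return value rather than an exception, so the return values are worth checking.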

Path

The Java standard library also provides a Path object, located in the java.nio.file package. A Path is similar to a File object, but simpler to work with:

    public static void test3() {
        Path p1 = Paths.get("Java.iml");
        System.out.println(p1);
        Path p2 = p1.toAbsolutePath();
        System.out.println(p2);
        Path p3 = p2.normalize(); // Normalize the path (removes . and .. segments)
        System.out.println(p3);
        File f = p3.toFile(); // Convert to File object
        System.out.println(f);
        Path p4 = Paths.get(".");
        System.out.println("-------------------");
        System.out.println(p4.toAbsolutePath()); //D:\this_is_feng\Java\.
        for (Path p: p4.toAbsolutePath()){
            System.out.println(p); 
            //this_is_feng
            //Java
            //.
        }
    }

InputStream

InputStream is the most basic input stream provided by the Java standard library. It lives in the java.io package, which provides all of the synchronous IO functionality.

One thing to note is that InputStream is not an interface, but an abstract class. It is a superclass of all input streams. One of the most important methods defined by this abstract class is int read(). The signature is as follows:

public abstract int read() throws IOException;

This method reads the next byte of the input stream and returns the byte as an int value (0 ~ 255). If the end of the stream has been reached, it returns -1 to indicate that nothing more can be read.

FileInputStream is a subclass of InputStream. As the name suggests, FileInputStream reads data from a file.

    // Create a FileInputStream object:
    InputStream input = new FileInputStream("Java.iml");
    for (;;) {
        int n = input.read(); // Call read() repeatedly until it returns -1
        if (n == -1) {
            break;
        }
        System.out.println(n); // Print the value of the byte
    }
    input.close(); // Close the stream

Both InputStream and OutputStream are closed with the close() method. Closing a stream releases the underlying resources it holds.

All code that performs IO must handle IOException correctly.

If an IO error occurs while reading, the code above never reaches close(), so the InputStream is not closed and its resources are not released in time.

So we need try ... finally to make sure the InputStream is closed whether or not an IO error occurs.
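A minimal sketch of the try ... finally version might look like this. The class name TryFinallyDemo and the countBytes() helper are hypothetical, and the demo writes its own temporary file so that it can run anywhere.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

class TryFinallyDemo {
    // Counts the bytes in a file, closing the stream in finally.
    static int countBytes(File file) throws IOException {
        InputStream input = null;
        try {
            input = new FileInputStream(file);
            int count = 0;
            while (input.read() != -1) {
                count++;
            }
            return count;
        } finally {
            // Runs on a normal return and also when an exception propagates.
            if (input != null) {
                input.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("try-finally-", ".bin");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), new byte[] {1, 2, 3});
        System.out.println(countBytes(tmp)); // 3
    }
}
```

The null check is needed because the FileInputStream constructor itself can throw, in which case there is nothing to close.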

A better way is the try (resource) syntax introduced in Java 7. We only need to write a try statement and the compiler closes the resource for us automatically:

        try(InputStream input = new FileInputStream("Java.iml")){
            int n;
            while(( n = input.read()) != -1){
                System.out.print((char)n);
            }
        }

In fact, the compiler adds no special handling for InputStream. It only checks whether the object in try (resource = ...) implements the java.lang.AutoCloseable interface; if it does, the compiler automatically adds a finally block that calls close(). Both InputStream and OutputStream implement this interface, so both can be used in try (resource).
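Since the only requirement is AutoCloseable, any class of our own can be used the same way. The sketch below uses a made-up LoggingResource class that records what happens to it; none of these names come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;

class AutoCloseDemo {
    // A hypothetical resource that records when it is used and when it is closed.
    static class LoggingResource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        LoggingResource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        void use() {
            log.add("use " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Uses two resources in one try statement and returns the recorded events.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (LoggingResource a = new LoggingResource("a", log);
             LoggingResource b = new LoggingResource("b", log)) {
            a.use();
            b.use();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [use a, use b, close b, close a]
    }
}
```

close() is called without any explicit finally in the source, and the resources are closed in the reverse of the order in which they were declared.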

Buffer

When reading a stream, reading one byte at a time is not the most efficient approach. Many streams support reading multiple bytes into a buffer in one call. For file and network streams, reading multiple bytes at a time through a buffer is often much more efficient. InputStream provides two overloaded methods for reading multiple bytes:

int read(byte[] b): reads some bytes into the byte[] array and returns the number of bytes read
int read(byte[] b, int off, int len): specifies the offset into the byte[] array and the maximum number of bytes to fill

To read multiple bytes at a time with these methods, we first define a byte[] array as the buffer. read() reads as many bytes into the buffer as it can, but never more than the buffer size. The return value is no longer the int value of a byte; it is the number of bytes actually read. A return value of -1 means there is no more data.
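As a rough sketch of the buffered read loop described above (the class name BufferedReadDemo and the buffer size of 1000 are arbitrary choices of mine):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferedReadDemo {
    // Reads the whole stream through a fixed-size buffer and returns the total byte count.
    static long countWithBuffer(InputStream input, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = input.read(buffer)) != -1) {
            // Only the first n bytes of buffer hold fresh data on this pass.
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream input = new ByteArrayInputStream(new byte[2500])) {
            System.out.println(countWithBuffer(input, 1000)); // 2500
        }
    }
}
```

A single read(buffer) call may return fewer bytes than the buffer can hold, which is why the loop always uses the returned count n rather than buffer.length.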

ByteArrayInputStream

Using FileInputStream, you can get the input stream from the file, which is a common implementation class of InputStream. In addition, ByteArrayInputStream can simulate an InputStream in memory:

        byte[] data = { 72, 101, 108, 108, 111, 33 };
        try (InputStream input = new ByteArrayInputStream(data)) {
            int n;
            while ((n = input.read()) != -1) {
                System.out.println((char)n);
            }
        }

OutputStream

Similar to InputStream, OutputStream is also an abstract class, which is a superclass of all output streams. One of the most important methods defined by this abstract class is void write(int b). The signature is as follows:

public abstract void write(int b) throws IOException;

This method writes one byte to the output stream. Note that although the parameter is an int, only one byte is written, namely the lowest 8 bits of the int (equivalent to b & 0xff).
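The truncation to the lowest 8 bits is easy to check with an in-memory stream. This is only a small sketch; the class name LowByteDemo and the sample values are mine, and ByteArrayOutputStream is introduced further down in these notes.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class LowByteDemo {
    // Writes each int with write(int) and returns the bytes that actually reached the stream.
    static byte[] writeInts(int... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int value : values) {
            out.write(value); // only value & 0xff is stored
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 0x41 and 0x141 share the same low 8 bits; -1 is stored as 0xff, which prints as the byte -1.
        System.out.println(Arrays.toString(writeInts(0x41, 0x141, -1))); // [65, 65, -1]
    }
}
```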

Similar to InputStream, OutputStream also provides a close() method to close the output stream to free system resources. Note in particular: OutputStream also provides a flush() method, which is designed to actually output the contents of the buffer to the destination.

Why flush()? When writing data to disk or the network, for efficiency reasons the operating system does not write each byte to the file or send it over the network as soon as it is output. Instead, the output bytes are first placed in an in-memory buffer (essentially a byte[] array), and only when the buffer is full is it written to the file or network in one go. For many IO devices, writing one byte takes almost as long as writing 1000 bytes. OutputStream therefore has a flush() method that forces the contents of the buffer to be output.

Normally, we don't need to call this flush() method, because when the buffer is full, the OutputStream will call it automatically, and the flush() method will also be called automatically before calling the close() method to close the OutputStream.

However, in some cases, we have to call the flush() method manually.
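One way to observe the buffer is to wrap an in-memory stream in a BufferedOutputStream and check how much data has arrived before and after flush(). This is only a sketch: FlushDemo and the 1024-byte buffer size are assumptions made for the demo.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class FlushDemo {
    // Returns {bytes that reached the sink before flush(), bytes that reached it after flush()}.
    static int[] sizesAroundFlush() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream buffered = new BufferedOutputStream(sink, 1024);
        buffered.write("hello".getBytes(StandardCharsets.US_ASCII));
        int before = sink.size(); // the 5 bytes are still sitting in the 1024-byte buffer
        buffered.flush();
        int after = sink.size();  // flush() pushed them through to the sink
        buffered.close();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesAroundFlush();
        System.out.println(sizes[0]); // 0
        System.out.println(sizes[1]); // 5
    }
}
```

If the receiving side needs to see the data right away, for example a message sent over a network connection, waiting for the buffer to fill is not acceptable, and that is when a manual flush() is required.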

In fact, InputStream also has buffers. For example, when reading a byte from FileInputStream, the operating system often reads several bytes to the buffer at one time and maintains a pointer to the unread buffer. Then, every time we call int read() to read the next byte, we can directly return the next byte of the buffer to avoid IO operation every time we read a byte. When read() is called after all the buffers have been read, the next read by the operating system will be triggered and the buffer will be filled again.

FileOutputStream

        try(OutputStream output = new FileOutputStream("out.txt")){
            output.write((int)'h');
            output.write((int)'e');
            output.write((int)'l');
            output.write((int)'l');
            output.write((int)'o');
        }

Similarly, the write method has overloads:

        try(OutputStream output = new FileOutputStream("out.txt")){
            output.write("feng".getBytes(StandardCharsets.UTF_8));
        }

ByteArrayOutputStream

With FileOutputStream, you can get the output stream from the file, which is a common implementation class of OutputStream. In addition, ByteArrayOutputStream can simulate an OutputStream in memory:

        byte[] data;
        try (ByteArrayOutputStream output = new ByteArrayOutputStream()) {
            output.write("Hello ".getBytes("UTF-8"));
            output.write("world!".getBytes("UTF-8"));
            data = output.toByteArray();
        }
        System.out.println(new String(data, "UTF-8"));

Copy file

When working with several AutoCloseable resources at the same time, the try (resource) { ... } statement can declare multiple resources, separated by ;.

Code for copying a file:

    public static void test7() throws IOException{
        try(InputStream input = new FileInputStream("Java.iml");
            OutputStream out = new FileOutputStream("out.txt")){
            int n;
            while(( n = input.read())!= -1){
                out.write(n); // Copy one byte at a time
            }
        }
    }

Filter pattern

Adding features purely through inheritance would make the number of subclasses grow out of control. To solve this, the JDK first divides InputStream into two categories:

One category is the basic InputStream that provides data directly, such as:

FileInputStream
ByteArrayInputStream
ServletInputStream
...

The other category is the InputStream that provides additional functionality, such as:

BufferedInputStream
DigestInputStream
CipherInputStream
...

When we need to add various functions to a "basic" InputStream, we first determine the InputStream that can provide data source, because the data we need always comes from somewhere, for example, FileInputStream, and the data comes from files:

InputStream file = new FileInputStream("test.gz");

Next, we want the FileInputStream to be buffered to improve read efficiency, so we wrap it in a BufferedInputStream. The wrapper's type is BufferedInputStream, but it can still be treated as an InputStream:

InputStream buffered = new BufferedInputStream(file);

Finally, assuming the file has been compressed with gzip, we want to read the decompressed content directly, so we wrap it once more in a GZIPInputStream:

InputStream gzip = new GZIPInputStream(buffered);

No matter how many times we wrap, the object we get is always InputStream. We can reference it directly with InputStream and read it normally.

This way of layering various "additional" components on top of a "basic" component is called the Filter pattern (or Decorator pattern). It lets us combine many features using only a small number of classes.

Java's IO standard library uses the Filter pattern to add functionality to InputStream and OutputStream:

You can combine an InputStream with any FilterInputStream;
You can combine an OutputStream with any FilterOutputStream.

The Filter pattern lets us add functionality dynamically at runtime.
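We can also write our own FilterInputStream. The following is a minimal sketch of a stream that counts the bytes passing through it; CountInputStream and FilterDemo are names chosen for this example and are not part of the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// A hypothetical filter that counts how many bytes have been read through it.
class CountInputStream extends FilterInputStream {
    private int count = 0;

    CountInputStream(InputStream in) {
        super(in);
    }

    int getBytesRead() {
        return count;
    }

    @Override
    public int read() throws IOException {
        int n = in.read();
        if (n != -1) {
            count++;
        }
        return n;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        if (n != -1) {
            count += n;
        }
        return n;
    }
}

class FilterDemo {
    // Drains the data through the counting filter and returns the recorded byte count.
    static int countAll(byte[] data) throws IOException {
        try (CountInputStream input = new CountInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[4];
            while (input.read(buffer) != -1) {
                // Nothing to do with the data here; the filter does the counting.
            }
            return input.getBytesRead();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countAll(new byte[] {72, 101, 108, 108, 111, 33})); // 6
    }
}
```

Because CountInputStream is itself an InputStream, it can be stacked with BufferedInputStream, GZIPInputStream and the others in any order.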

Working with Zip

ZipInputStream is a FilterInputStream that can directly read the contents of a zip package.

Read Zip

To create a ZipInputStream, we usually pass in a FileInputStream as the data source and then call getNextEntry() repeatedly until it returns null, which marks the end of the zip stream.

A ZipEntry represents a compressed file or directory. If it is a compressed file, we keep calling read() until it returns -1.

The usage is easiest to see in an example:

    public static void test8() throws IOException{
        try(ZipInputStream zip = new ZipInputStream(new FileInputStream("1.zip"))){
            ZipEntry entry = null;
            while ((entry = zip.getNextEntry()) != null){
                String name = entry.getName();
                System.out.println(name);
                if(!entry.isDirectory()){
                    int n;
                    while ((n = zip.read())!= -1){
                        System.out.print((char)n);
                    }
                }
                System.out.println();
            }
        }
    }

Write Zip

ZipOutputStream is a FilterOutputStream that can write content directly into a zip package. We first create a ZipOutputStream, usually wrapping a FileOutputStream. Then, before writing each file, we call putNextEntry(), write the byte[] data with write(), and call closeEntry() to finish that entry.

    public static void test9() throws IOException{
        try(ZipOutputStream zip = new ZipOutputStream(new FileOutputStream("out.zip"))){
            File file = new File("out.txt");
            zip.putNextEntry(new ZipEntry(file.getName()));

            zip.write(getFileDataAsBytes(file));
            zip.closeEntry();
        }
    }

The above code does not consider the directory structure of the file. If you want to implement the directory hierarchy, the name passed in by new ZipEntry(name) should use a relative path.
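Here is a minimal sketch of entries with a directory hierarchy. To stay self-contained it builds the zip in memory rather than in a file, and the entry names docs/ and docs/readme.txt are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipHierarchyDemo {
    // Builds a zip in memory whose entries live under a "docs/" directory.
    static byte[] buildZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("docs/")); // a name ending in "/" is a directory entry
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("docs/readme.txt")); // relative path inside the zip
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        // The zip is only complete once the ZipOutputStream has been closed.
        return bytes.toByteArray();
    }

    // Lists the entry names found in the given zip data.
    static List<String> listEntries(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(listEntries(buildZip())); // [docs/, docs/readme.txt]
    }
}
```

Entry names use / as the separator on every platform, regardless of the separator used by the local file system.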

I wrote two versions of getFileDataAsBytes():

    public static byte[] getFileDataAsBytes(File file) throws IOException {
        byte[] data;
        try(InputStream input = new FileInputStream(file);
            ByteArrayOutputStream out = new ByteArrayOutputStream()){
            int n;
            while ((n = input.read())!= -1){
                out.write(n);
            }
            data = out.toByteArray();
        }
        return data;
    }

    public static byte[] getFileDataAsBytes(File file) throws IOException {
        try(InputStream input = new FileInputStream(file)){
            StringBuilder sb = new StringBuilder();
            int n;
            while((n = input.read())!=-1){
                sb.append((char)n);
            }
            return sb.toString().getBytes(StandardCharsets.UTF_8);
        }
    }

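The first version is adapted from someone else's code and the second is my own. My version is the more limited one: casting each byte to char and then re-encoding the String as UTF-8 only round-trips pure ASCII data, because every byte value of 128 or above comes back as two bytes. The first version copies the bytes unchanged, so it is the better choice.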

Read classpath resource

Resource paths in the classpath always start with /. We first get the current Class object and then call getResourceAsStream() to read any resource file directly from the classpath.

One thing to note when calling getResourceAsStream() is that it will return null if the resource file does not exist. Therefore, we need to check whether the returned InputStream is null. If it is null, it means that the resource file is not found in the classpath:

try (InputStream input = getClass().getResourceAsStream("/default.properties")) {
    if (input != null) {
        // TODO:
    }
}

If we put the default configuration into the jar package and read an optional configuration file from the external file system, we can not only have the default configuration file, but also allow users to modify the configuration themselves:

Properties props = new Properties();
props.load(inputStreamFromClassPath("/default.properties"));
props.load(inputStreamFromFile("./conf.properties"));

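Because a later load() overwrites any keys already set by an earlier one, values from the external file take precedence over the defaults. Reading configuration this way makes application startup more flexible.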
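The layering can be sketched without any real files. In the example below, two StringReader objects stand in for the classpath resource and the external file; that substitution, the class name LayeredConfigDemo and the keys host and port are all assumptions made for the demo.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class LayeredConfigDemo {
    // Loads the defaults first and the overrides second into the same Properties object.
    static Properties load(String defaults, String overrides) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(defaults));  // stands in for /default.properties
        props.load(new StringReader(overrides)); // stands in for ./conf.properties
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = load("host=localhost\nport=8080\n", "port=9090\n");
        System.out.println(props.getProperty("host")); // localhost
        System.out.println(props.getProperty("port")); // 9090
    }
}
```

Keys that the second source does not mention keep their default values, so the external file only has to list what differs.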

Serialization

Skipped for now. I will focus on Java serialization and deserialization after finishing the IO stream part.

Reader

The content is similar to InputStream.

Reader is another kind of input stream provided by Java's IO library. The difference between Reader and InputStream is that InputStream is a byte stream, read in units of byte, while Reader is a character stream, read in units of char:

InputStream	Reader
Byte stream, in units of byte	Character stream, in units of char
Read a byte (-1, 0 ~ 255): int read()	Read a character (-1, 0 ~ 65535): int read()
Read into a byte array: int read(byte[] b)	Read into a character array: int read(char[] c)

java.io.Reader is a superclass of all character input streams. Its main methods are:

public int read() throws IOException;

This method reads the next character in the character stream and returns it as an int in the range 0 to 65535. It returns -1 if the end of the stream has been reached.

FileReader

FileReader is a subclass of Reader that opens a file and gives us a Reader for it.

        Reader reader = new FileReader("Java.iml");
        int n;
        while( (n = reader.read())!= -1){
            System.out.print((char)n);
        }
        reader.close();

It also needs to be closed, so try (resource) can be used here as well:

        try (Reader reader = new FileReader("Java.iml")) {
            int n;
            while ((n = reader.read()) != -1) {
                System.out.print((char) n);
            }
        }

If we read a pure ASCII text file, the code above works without problems. However, if the file contains Chinese, the output may be garbled, because the default encoding of FileReader depends on the system. For example, the default encoding on Windows may be GBK, so opening a UTF-8 encoded text file produces garbled text.

To avoid garbled text, we need to specify the encoding when creating the FileReader:

try (Reader reader = new FileReader("Java.iml", StandardCharsets.UTF_8)) { // This constructor requires Java 11+
}
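The effect of a wrong encoding can be reproduced in memory. This sketch decodes the same UTF-8 bytes twice; it uses ISO-8859-1 as the "wrong" charset instead of GBK only because ISO-8859-1 is guaranteed to exist on every JVM, and it relies on InputStreamReader, which is described below. The class name CharsetMismatchDemo is my own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Decodes the bytes with the given charset and returns the resulting text.
    static String decode(byte[] data, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int n;
            while ((n = reader.read()) != -1) {
                sb.append((char) n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "\u4e2d\u6587"; // two Chinese characters, written as escapes
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        // Matching charset: the text comes back intact.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(original)); // true
        // Mismatched charset: every byte becomes its own meaningless character.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).length());    // 6
    }
}
```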

CharArrayReader

CharArrayReader can simulate a Reader in memory. What it really does is turn a char[] array into a Reader, much like ByteArrayInputStream:

try (Reader reader = new CharArrayReader("Hello".toCharArray())) {
}

StringReader

StringReader can directly use String as the data source, which is almost the same as CharArrayReader:

try (Reader reader = new StringReader("Hello")) {
}

InputStreamReader

What is the relationship between Reader and InputStream?

In addition to the special CharArrayReader and StringReader, the ordinary Reader is actually constructed based on InputStream, because the Reader needs to read the byte stream from InputStream, and then convert it to char according to the encoding settings to realize the character stream. If we look at the source code of FileReader, it actually holds a FileInputStream internally.

InputStreamReader is exactly such a converter: it can turn any InputStream into a Reader:

// Hold InputStream:
InputStream input = new FileInputStream("src/readme.txt");
// Convert to Reader:
Reader reader = new InputStreamReader(input, "UTF-8");

When constructing an InputStreamReader, we pass in the InputStream and specify the encoding to get a Reader object. With try (resource), the code above can be written more concisely:

try (Reader reader = new InputStreamReader(new FileInputStream("src/readme.txt"), "UTF-8")) {
    // TODO:
}

Writer

Same as above, so this is just a quick summary.

Reader is an InputStream with an encoding converter, which converts byte into char, while Writer is an OutputStream with an encoding converter, which converts char into byte and outputs it.

The differences between Writer and OutputStream are as follows:

OutputStream	Writer
byte stream, in bytes	Character stream in char
Write bytes (0 ~ 255): void write(int b)	Write character (0 ~ 65535): void write(int c)
Write byte array: void write(byte[] b)	Write character array: void write(char[] c)
No corresponding method	Write String: void write(String s)

Writer is a superclass of all character output streams. Its main methods include:

Write a character (0 ~ 65535): void write(int c);
Write all characters in the character array: void write(char[] c);
Write all characters represented by String: void write(String s).

FileWriter

        try(Writer writer = new FileWriter("out.txt", StandardCharsets.UTF_8)){
            writer.write((int)'a');
            writer.write("hello".toCharArray());
            writer.write("world");
        }

CharArrayWriter

CharArrayWriter creates a Writer in memory. What it really does is build a buffer that chars can be written to, from which the written char[] array can be retrieved at the end, much like ByteArrayOutputStream:

try (CharArrayWriter writer = new CharArrayWriter()) {
    writer.write(65);
    writer.write(66);
    writer.write(67);
    char[] data = writer.toCharArray(); // { 'A', 'B', 'C' }
}

StringWriter

StringWriter is also a memory based Writer, which is similar to CharArrayWriter. In fact, StringWriter maintains a StringBuffer internally and provides a Writer interface externally.

OutputStreamWriter

Apart from CharArrayWriter and StringWriter, an ordinary Writer is actually built on top of an OutputStream. It receives chars, converts them internally into one or more bytes, and writes them to the OutputStream. OutputStreamWriter is therefore a converter that turns any OutputStream into a Writer:

try (Writer writer = new OutputStreamWriter(new FileOutputStream("readme.txt"), "UTF-8")) {
    // TODO:
}

PrintStream and PrintWriter

PrintStream

PrintStream is a FilterOutputStream. It provides additional methods for writing various data types on the interface of OutputStream:

Write int: print(int)
Write boolean: print(boolean)
Write String: print(String)
Write Object: print(Object), which is effectively equivalent to print(object.toString())
...

There is also a corresponding set of println() methods, which append a line break automatically.

The System.out.println() we use all the time is in fact PrintStream printing various data. System.out is the PrintStream that the system provides by default, and it represents standard output.

System.err is the standard error output provided by the system by default.

Compared with OutputStream, PrintStream not only adds the convenient print()/println() methods for various data types, it also has the advantage that it does not throw IOException, so we do not have to catch IOException when writing code.

PrintWriter

PrintStream always ends up outputting byte data, while PrintWriter extends Writer, so its print()/println() methods end up outputting char data. Apart from that, the two are used in exactly the same way.
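A short sketch of PrintWriter writing into a StringWriter, so that the char output can be inspected as a String. The class name PrintWriterDemo and the printed values are just examples.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class PrintWriterDemo {
    // Prints a few values of different types and returns everything that was written.
    static String render() {
        StringWriter buffer = new StringWriter();
        try (PrintWriter pw = new PrintWriter(buffer)) {
            pw.print("Hello");
            pw.print(' ');
            pw.print(12345);
            pw.print(true);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(render()); // Hello 12345true
    }
}
```

No IOException needs to be declared or caught anywhere in this code, which matches what was said above about PrintStream.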

Files

Starting from Java 7, two utility classes, Files and Paths, are provided, which make reading and writing files much more convenient.

Although Files and Paths belong to the java.nio.file package, they wrap many simple methods for reading and writing files. For example, to read the entire content of a file as a byte[], we can write:

byte[] data = Files.readAllBytes(Paths.get("/path/to/file.txt"));

If it is a text file, you can read all the contents of a file as String:

// Read with UTF-8 encoding by default (Files.readString requires Java 11+):
String content1 = Files.readString(Paths.get("/path/to/file.txt"));
// An encoding can be specified:
String content2 = Files.readString(Paths.get("/path/to/file.txt"), StandardCharsets.ISO_8859_1);
// Read line by line and return all lines:
List<String> lines = Files.readAllLines(Paths.get("/path/to/file.txt"));

Writing files is also very convenient:

// Write binary:
byte[] data = ...
Files.write(Paths.get("/path/to/file.txt"), data);
// Write text and specify encoding:
Files.writeString(Paths.get("/path/to/file.txt"), "Text content...", StandardCharsets.ISO_8859_1);
// Write text by line:
List<String> lines = ...
Files.write(Paths.get("/path/to/file.txt"), lines);

In addition, the Files utility class has shortcut methods such as copy(), delete(), exists() and move() for working with files and directories.

Finally, note that the read and write methods provided by Files are limited by memory. They are only suitable for small files, such as configuration files, and cannot read a multi-gigabyte file in one go. To read and write large files, we still need file streams, reading or writing only part of the file content at a time.
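For a large text file, one possible approach is to stream it line by line so that only one line is in memory at a time. The sketch below counts lines this way; LargeFileDemo is a made-up name, and the temporary file is tiny only to keep the demo quick.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class LargeFileDemo {
    // Counts lines without loading the whole file into memory.
    static long countLines(Path file) throws IOException {
        long lines = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("large-file-", ".txt");
        try {
            Files.write(tmp, Arrays.asList("a", "b", "c"), StandardCharsets.UTF_8);
            System.out.println(countLines(tmp)); // 3
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

For binary data the same idea applies with an InputStream and a byte[] buffer, as in the buffer section earlier.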

Topics: Java
