"Life is short. It's better to have a dog Author: Bruce bat Sun "
1, Overview
1. What are characters
Letters, digits, and symbols (operators, punctuation, and others) are all called characters in computing. Note that a character is a unit of information, while the byte is the basic unit of data storage in a computer. When characters are stored, they are converted into one or more bytes according to the character encoding the program uses.
2. Usage scenario
From this definition it is easy to see that characters were introduced mainly to handle text data. Because the world has so many languages, encodings map each language's symbols to numbers, much like a sequence of natural numbers, so that computers can store and process them. After this mapping, different characters are represented by different binary numbers, each stored in one or more bytes. This is what causes the garbled-text problem mentioned in the previous article when text data is processed as a byte stream. Let's look at that problem using Chinese characters.
The garbled Chinese character problem
The code in the figure above reads the data one byte at a time through a byte stream, converts each byte to a character, and prints it to the console. The results are as follows:
As the results show, reading Chinese characters this way produces garbled text. The reason is that a Chinese character is stored in multiple bytes (in UTF-8 a common Chinese character takes 3 bytes; in UTF-16, the encoding Java uses for char, it takes 2 bytes). Reading a single byte at a time splits the character apart, so each byte is mapped on its own to a character that has nothing to do with the intended one, which produces the garbled output shown in the figure above.
To avoid this problem, we need to use character streams when reading text data.
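The contrast between the two approaches can be sketched as follows. This is a minimal illustration, not the code from the figure: it assumes the text is UTF-8 encoded and held in memory, and the class and method names (GarbledTextDemo, readByteByByte, readAsCharacters) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; all names here are illustrative only.
class GarbledTextDemo {

    // Broken approach: read one byte at a time and cast each byte to a char.
    static String readByteByByte(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (InputStream in = new ByteArrayInputStream(data)) {
            int b;
            while ((b = in.read()) != -1) {
                sb.append((char) b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Character-stream approach: decode the same bytes with an explicit charset.
    static String readAsCharacters(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(data.length);                          // 6: three bytes per character
        System.out.println(readByteByByte(data).length());        // 6: one wrong char per byte
        System.out.println(readAsCharacters(data).equals(text));  // true
    }
}
```

The byte-by-byte version yields six unrelated characters for two Chinese characters, while the Reader decodes each three-byte sequence back into one character.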
2, Character input/output streams and their use
In the Java IO class library, all character streams are subclasses of Reader or Writer, and the class names of input and output streams end with Reader and Writer respectively. This is a naming convention, and developers should follow it when implementing their own subclasses. Next, we cover input streams and output streams in turn.
1. Input stream
1.1 Reader class analysis
As in the previous article, before using character input streams, let's look at Reader, the "father of character input streams". Reader offers four ways to read data; here we focus on the three most common ones:
- int read(): reads a single character from the data source. The return value is the character as an int in the range 0 to 65535; the underlying bytes (one or more, depending on the encoding) have already been decoded. At the end of the stream the return value is -1; if an I/O error occurs, an IOException is thrown. To support continuous reading, most implementation classes keep an internal position pointer marking where the next read starts;
- int read(char cbuf[]): reads characters into the given array, filling as much of it as the available data allows, and returns the number of characters read, or -1 at the end of the stream;
- int read(char cbuf[], int off, int len): reads up to len characters into the array, starting at index off. The two methods above delegate to it by default (some subclasses override them and do not). Note that this method is abstract and must be implemented by subclasses.
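The bulk read methods in the list above are typically used in a loop. A minimal sketch, assuming an in-memory StringReader as the data source; the class name BulkReadDemo and the helper readAll are hypothetical, and bufferSize is assumed to be positive.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; names are illustrative only.
class BulkReadDemo {

    // Drains a Reader with read(char[], int, int); bufferSize is assumed to be > 0.
    static String readAll(Reader source, int bufferSize) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[bufferSize];
        try (Reader reader = source) {
            int n;
            // read returns how many chars were stored in the buffer, or -1 at end of stream
            while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
                // Only the first n slots hold fresh data on this pass
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A buffer smaller than the text forces several passes through the loop
        System.out.println(readAll(new StringReader("Batman"), 4)); // prints "Batman"
    }
}
```

Appending only the first n characters matters: the tail of the buffer may still hold characters from an earlier pass.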
Here we use the first method to summarize the programming paradigm of the Reader class:
// Create a character input stream. XXXReader stands for a subclass of Reader and data is the
// data to be read. try-with-resources is used so the stream does not have to be closed explicitly.
try (Reader reader = new XXXReader(data)) {
    // Read the first character from the character input stream
    int read = reader.read();
    // A return value of -1 means the end of the stream has been reached
    while (read != -1) {
        // Process the character that was read
        System.out.print((char) read);
        // Continue with the next character
        read = reader.read();
    }
} catch (Exception e) {
    e.printStackTrace();
}
Besides the read methods above, reading the Reader source code reveals something that InputStream does not have: the lock object.
Given how subclasses implement the read methods above, it should be clear why the lock exists: when multiple threads read from the same Reader at once, the internal pointer that marks the read position is subject to a race during the pos++ operation, so the same data may be read more than once. Sharp readers may ask: doesn't InputStream have the same problem? It does, but there the read() method is declared with the synchronized keyword in implementations such as ByteArrayInputStream.
That raises a new question: why doesn't Reader do the same, instead of introducing a lock member variable? The official documentation explains:
The object used to synchronize operations on this stream. For efficiency, a character-stream object may use an object other than itself to protect critical sections. A subclass should therefore use the object in this field rather than this or a synchronized method.
Put simply, concurrency control can be delegated to an external object instead of relying solely on the Reader object itself.
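How a subclass might use the lock field can be sketched as follows. This is a simplified, hypothetical reader (SimpleCharReader), similar in spirit to CharArrayReader but not its actual source; the drain helper is also an assumption added for the demonstration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical Reader over a char array; not the JDK's CharArrayReader.
class SimpleCharReader extends Reader {
    private final char[] data;
    private int pos; // index of the next character to return

    SimpleCharReader(char[] data) {
        // The no-arg Reader constructor makes the inherited `lock` field refer to this
        // reader; calling super(someObject) instead would share that object's monitor.
        super();
        this.data = data.clone();
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        Objects.checkFromIndexSize(off, len, cbuf.length);
        // Guard the position update with the inherited lock, as the documentation
        // recommends, rather than declaring the method synchronized.
        synchronized (lock) {
            if (len == 0) {
                return 0;
            }
            if (pos >= data.length) {
                return -1; // end of stream
            }
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, cbuf, off, n);
            pos += n;
            return n;
        }
    }

    @Override
    public void close() {
        // Nothing to release for an in-memory array
    }

    // Reads every remaining character through the inherited single-char read().
    static String drain(Reader source) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = source) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(drain(new SimpleCharReader("Batman".toCharArray()))); // prints "Batman"
    }
}
```

Because only read(char[], int, int) touches pos, guarding that one method is enough here; the inherited read() and read(char[]) both go through it.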
1.2 Use cases
Let's walk through using a Reader with CharArrayReader and FileReader.
a. CharArrayReader
String text = "Batman";
// Create a character input stream. try-with-resources is used so the stream does not
// have to be closed explicitly.
try (Reader reader = new CharArrayReader(text.toCharArray())) {
    // Read the first character from the character input stream
    int read = reader.read();
    // A return value of -1 means the end of the stream has been reached
    while (read != -1) {
        // Process the character that was read
        System.out.print((char) read);
        // Continue with the next character
        read = reader.read();
    }
} catch (Exception e) {
    e.printStackTrace();
}
b. FileReader
// Create a character input stream. try-with-resources is used so the stream does not
// have to be closed explicitly.
try (Reader reader = new FileReader("/Users/suntianyu/Desktop/test.json")) {
    // Read the first character from the character input stream
    int read = reader.read();
    // A return value of -1 means the end of the stream has been reached
    while (read != -1) {
        // Process the character that was read
        System.out.print((char) read);
        // Continue with the next character
        read = reader.read();
    }
} catch (Exception e) {
    e.printStackTrace();
}
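One caveat with the FileReader example: new FileReader(String) decodes with the default charset, which depends on the platform on older Java versions (UTF-8 became the default in Java 18). Where the file's encoding is known, naming it explicitly avoids garbled text. A minimal sketch using Files.newBufferedReader; the class name Utf8FileReadDemo, its helpers, and the temporary file are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; names are illustrative only.
class Utf8FileReadDemo {

    // Reads a whole text file as UTF-8, regardless of the default charset.
    static String readUtf8(Path path) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Writes text to a temporary file as UTF-8 bytes and reads it back.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("reader-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readUtf8(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "Batman \u4e2d\u6587";
        System.out.println(roundTrip(text).equals(text)); // true
    }
}
```

Since Java 11, FileReader also has constructors that accept a Charset, so new FileReader(path, StandardCharsets.UTF_8) is an alternative when a plain FileReader is preferred.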
2. Output stream
2.1 Writer class analysis
As above, let's first look at Writer, the "father of character output streams". Writer provides the following write methods:
- void write(int c): writes a single character. Although the parameter is an int, only its low 16 bits are written; the high 16 bits are ignored;
- void write(char cbuf[]): writes an entire character array at once;
- void write(char cbuf[], int off, int len): writes len characters of the array starting at index off. It is the method the previous one calls to do the actual writing, and it is abstract, so subclasses must implement it;
- void write(String str): writes a string directly;
- void write(String str, int off, int len): writes len characters of the string starting at index off. It is the method the previous one calls to do the actual writing.
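The five overloads listed above can be tried side by side. A minimal sketch that writes into an in-memory StringWriter so the result is easy to inspect; the class name WriteOverloadsDemo and the method writeAll are hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical demo class; names are illustrative only.
class WriteOverloadsDemo {

    // Exercises each write overload once and returns everything that was written.
    static String writeAll() {
        StringWriter out = new StringWriter();
        try (Writer writer = out) {
            char[] chars = {'a', 't', 'x'};
            writer.write(0x10042);             // write(int): only the low 16 bits, 0x0042 ('B')
            writer.write(chars, 0, 2);         // write(char[], off, len): "at"
            writer.write("Batman", 3, 3);      // write(String, off, len): "man"
            writer.write(new char[] {'!'});    // write(char[]): "!"
            writer.write(" ok");               // write(String): " ok"
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeAll()); // prints "Batman! ok"
    }
}
```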
Since text data is mostly handled as String in practice, we summarize the Writer programming paradigm using the fourth method:
String text = "Batman";
// Create a character output stream. XXXWriter stands for a subclass of Writer.
// try-with-resources is used so the stream does not have to be closed explicitly.
try (Writer writer = new XXXWriter()) {
    // Write the string to the character output stream
    writer.write(text);
    // Push any buffered data through to the destination
    writer.flush();
} catch (Exception e) {
    e.printStackTrace();
}
Besides the write methods above, reading the Writer source code again turns up something that OutputStream does not have: a buffer. The source comment explains it as follows:
Temporary buffer used to hold writes of strings and single characters
In short, the buffer temporarily holds the string or single character about to be written. That describes the writeBuffer variable, but it does not really explain why it is needed.
So why do character streams need buffers? When writing characters or strings, Writer subclasses must convert the characters to bytes according to the encoding before issuing the write request. If every character triggered a separate IO write, the accumulated cost (encoding time plus IO time) would be very high. Buffering avoids this: characters are collected in a buffer, and a whole buffer is sent for writing in one go, which reduces the number of IO calls. The encoding time stays the same, but with far fewer IO calls the IO time, and therefore the total write time, drops sharply. Note that in Writer itself, writeBuffer is only a scratch array: write(int) and write(String) copy their characters into it and pass it to write(char[], int, int) in a single call. Accumulating data until a buffer fills up is done by classes such as BufferedWriter and by the encoder inside OutputStreamWriter.
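The effect of buffering on the number of underlying write calls can be observed with a small experiment. This is a sketch under stated assumptions: CountingWriter is a hypothetical Writer that discards its data and stands in for an expensive IO destination, and BufferedWriter provides the buffering.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical demo class; names are illustrative only.
class BufferingDemo {

    // Discards all data and counts how often write(char[], int, int) is reached.
    static class CountingWriter extends Writer {
        int calls;

        @Override
        public void write(char[] cbuf, int off, int len) {
            calls++; // stands in for one expensive IO call
        }

        @Override
        public void flush() {
        }

        @Override
        public void close() {
        }
    }

    // Writes `chars` single characters, directly or through a BufferedWriter,
    // and returns how many write calls reached the underlying sink.
    static int countWrites(boolean buffered, int chars) {
        CountingWriter sink = new CountingWriter();
        try (Writer writer = buffered ? new BufferedWriter(sink, 8192) : sink) {
            for (int i = 0; i < chars; i++) {
                writer.write('x');
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        System.out.println(countWrites(false, 1000)); // 1000: one sink call per character
        System.out.println(countWrites(true, 1000));  // 1: a single call when flushed
    }
}
```

With the 8192-character buffer, all 1000 characters fit before the flush, so the sink sees them in one call instead of one call per character.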
2.2 Use cases
Now let's walk through using a Writer with FileWriter.
String text = "Batman";
// Create a character output stream. try-with-resources is used so the stream does not
// have to be closed explicitly.
try (Writer writer = new FileWriter("/Users/suntianyu/Desktop/test.json")) {
    // Write the string to the character output stream
    writer.write(text);
    // Push any buffered data through to the file
    writer.flush();
} catch (Exception e) {
    e.printStackTrace();
}
Like FileOutputStream, FileWriter wraps a relatively simple file-writing capability and hides the details of the file-operation APIs.
3, Summary
In short, character streams exist mainly to handle the differences in character processing that different languages bring. With that in mind, you should be able to make a better call on whether to use a byte stream or a character stream when reading and writing data. I can't help admiring how wise Qin Shi Huang was to unify the written script. Now I just hope Chinese characters become the world's official script as soon as possible. Ha ha ha~~