Go from Introduction to Specification - 6.3, Go I/O operations: the I/O-related packages and bufio
1. Preface
There is not much to say about I/O itself: put simply, it is the input and output of data. We generally need to understand I/O before learning file operations and network programming, because both files and networks involve data I/O. We met the basic file-operation interfaces in the os package earlier. Here we briefly review and summarize I/O so that it is easier to see how common file operations combine with the I/O interfaces to read and write files flexibly, and to pave the way for later study of net/http/rpc and related packages.
2. io,ioutil,bufio
Standard package documentation on the official site:
The io package provides basic interfaces to I/O primitives. Its main job is to wrap existing implementations of these primitives, such as those in the os package, into shared public interfaces that abstract the functionality, plus some other related primitives.
Because these interfaces and primitives wrap lower-level operations with various implementations, clients should not assume they are safe for parallel execution unless told otherwise.
As a result, we rarely use the io package on its own. It mainly supplies the basic interfaces that other code builds on, which means we could also use io to implement functionality similar to ioutil and bufio ourselves.
- ioutil: https://go-zh.org/pkg/io/ioutil/
ioutil implements some I/O utility functions. When there are no special requirements, ioutil is very handy: simple and direct.
- bufio: https://go-zh.org/pkg/bufio/
The bufio package implements buffered I/O. It wraps an io.Reader or io.Writer object, creating another object (Reader or Writer) that also implements the interface but provides buffering and some help for textual I/O.
3. bufio
3.1. constant
const (
	// MaxScanTokenSize is the maximum size used to buffer a token.
	// The actual maximum token size may be smaller as the buffer
	// may need to include, for instance, a newline.
	MaxScanTokenSize = 64 * 1024
)
3.2. variable
var (
	ErrInvalidUnreadByte = errors.New("bufio: invalid use of UnreadByte")
	ErrInvalidUnreadRune = errors.New("bufio: invalid use of UnreadRune")
	ErrBufferFull        = errors.New("bufio: buffer full")
	ErrNegativeCount     = errors.New("bufio: negative count")
)

Errors returned by Scanner:

var (
	ErrTooLong         = errors.New("bufio.Scanner: token too long")
	ErrNegativeAdvance = errors.New("bufio.Scanner: SplitFunc returns negative advance count")
	ErrAdvanceTooFar   = errors.New("bufio.Scanner: SplitFunc returns advance count beyond input")
)
3.3. func ScanBytes
func ScanBytes(data []byte, atEOF bool) (advance int, token []byte, err error)
ScanBytes is a split function for a Scanner (it satisfies SplitFunc). It returns each byte as a token.
3.4. func ScanLines
func ScanLines(data []byte, atEOF bool) (advance int, token []byte, err error)
ScanLines is a split function for a Scanner (it satisfies SplitFunc). It returns each line of text as a token, with the trailing end-of-line marker removed. The returned line may be empty. The end-of-line marker is one optional carriage return followed by one mandatory newline. The last non-empty line of input is returned even if it has no newline.
3.5. func ScanRunes
func ScanRunes(data []byte, atEOF bool) (advance int, token []byte, err error)
ScanRunes is a split function for a Scanner (it satisfies SplitFunc). It returns each UTF-8-encoded rune as a token. The sequence of runes returned is the same as that produced by a range loop over the input as a string. An invalid UTF-8 encoding is translated to U+FFFD = "\xef\xbf\xbd", but only one byte of input is consumed. The caller cannot distinguish a correctly encoded replacement rune from an encoding error.
3.6. func ScanWords
func ScanWords(data []byte, atEOF bool) (advance int, token []byte, err error)
ScanWords is a split function for a Scanner (it satisfies SplitFunc). It returns each space-separated word of text as a token, with the surrounding spaces removed. It never returns an empty string. What counts as a space is defined by unicode.IsSpace.
3.7. type ReadWriter
type ReadWriter struct { *Reader *Writer }
ReadWriter stores pointers to a Reader and a Writer. It implements io.ReadWriter.
(1). func NewReadWriter
func NewReadWriter(r *Reader, w *Writer) *ReadWriter
NewReadWriter allocates a new ReadWriter that dispatches to r and w.
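As a rough illustration of how a ReadWriter dispatches to its two halves, the sketch below reads a line through the embedded Reader and writes a reply through the embedded Writer. The helper echoUpper and the in-memory source and sink are assumptions for the example. In practice both sides would often be the same network connection.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// echoUpper is a hypothetical helper: it reads one line from in and writes
// the upper-cased line to an in-memory sink through a single ReadWriter.
func echoUpper(in string) (string, error) {
	var out bytes.Buffer
	rw := bufio.NewReadWriter(
		bufio.NewReader(strings.NewReader(in)),
		bufio.NewWriter(&out),
	)

	line, err := rw.ReadString('\n') // method promoted from *Reader
	if err != nil {
		return "", err
	}
	if _, err := rw.WriteString(strings.ToUpper(line)); err != nil { // from *Writer
		return "", err
	}
	// Nothing reaches out until the Writer half is flushed.
	if err := rw.Flush(); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	reply, err := echoUpper("ping\n")
	fmt.Printf("%q %v\n", reply, err)
}
```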
3.8. type Reader
type Reader struct { // contains filtered or unexported fields }
Reader implements buffering for an io.Reader object.
(1). func NewReader
func NewReader(rd io.Reader) *Reader
NewReader returns a new Reader whose buffer has the default size.
(2). func NewReaderSize
func NewReaderSize(rd io.Reader, size int) *Reader
NewReaderSize returns a new Reader whose buffer has at least the specified size. If the io.Reader argument is already a Reader with a large enough buffer, it returns that underlying Reader.
(3). func (*Reader) Buffered
func (b *Reader) Buffered() int
Buffered returns the number of bytes that can currently be read from the buffer.
(4). func (*Reader) Discard
func (b *Reader) Discard(n int) (discarded int, err error)
Discard skips the next n bytes and returns the number of bytes discarded.
If Discard skips fewer than n bytes, it also returns an error. If 0 <= n <= b.Buffered(), Discard is guaranteed to succeed without reading from the underlying io.Reader.
(5). func (*Reader) Peek
func (b *Reader) Peek(n int) ([]byte, error)
Peek returns the next n bytes without advancing the reader. The returned bytes stop being valid at the next read call. If Peek returns fewer than n bytes, it also returns an error explaining why the read is short. If n is larger than b's buffer size, the error is ErrBufferFull.
(6). func (*Reader) Read
func (b *Reader) Read(p []byte) (n int, err error)
Read reads data into p and returns the number of bytes read into p. The bytes are taken from at most one Read call on the underlying Reader, so n may be less than len(p). At EOF, the count is 0 and the error is io.EOF.
(7). func (*Reader) ReadByte
func (b *Reader) ReadByte() (c byte, err error)
ReadByte reads and returns a single byte. If no byte is available, it returns an error.
(8). func (*Reader) ReadBytes
func (b *Reader) ReadBytes(delim byte) (line []byte, err error)
ReadBytes reads until the first occurrence of delim in the input and returns a slice containing the data up to and including the delimiter. If ReadBytes encounters an error before finding the delimiter, it returns the data read before the error together with the error itself (often io.EOF). ReadBytes returns err != nil if and only if the returned data does not end in delim. For simple uses, a Scanner may be more convenient.
(9). func (*Reader) ReadLine
func (b *Reader) ReadLine() (line []byte, isPrefix bool, err error)
ReadLine is a low-level line-reading primitive. Most callers should use ReadBytes('\n') or ReadString('\n') instead, or use a Scanner.
ReadLine tries to return a single line, not including the end-of-line bytes. If the line is too long for the buffer, isPrefix is set and the beginning of the line is returned. The rest of the line is returned by subsequent calls. isPrefix is false when the last fragment of the line is returned. The returned buffer is only valid until the next call to ReadLine. ReadLine returns either a non-nil line or an error, never both.
The text returned by ReadLine does not include the line end ("\r\n" or "\n"). No indication or error is given if the input ends without a final line end. Calling UnreadByte after ReadLine always unreads the last byte read (possibly a character belonging to the line end), even if that byte is not part of the line returned by ReadLine.
(10). func (*Reader) ReadRune
func (b *Reader) ReadRune() (r rune, size int, err error)
ReadRune reads a single UTF-8-encoded Unicode character and returns the rune and its size in bytes. If the encoded rune is invalid, it consumes one byte and returns unicode.ReplacementChar (U+FFFD) with a size of 1.
(11). func (*Reader) ReadSlice
func (b *Reader) ReadSlice(delim byte) (line []byte, err error)
ReadSlice reads from the input until the first occurrence of delim and returns a slice pointing at the bytes in the buffer. The bytes stop being valid at the next read. If ReadSlice encounters an error before finding the delimiter, it returns all the data in the buffer and the error itself (often io.EOF). If the buffer fills before a delimiter is found, ReadSlice fails with ErrBufferFull. Because the data returned by ReadSlice will be overwritten by the next I/O operation, most clients should use ReadBytes or ReadString instead. ReadSlice returns err != nil if and only if the data does not end in delim.
(12). func (*Reader) ReadString
func (b *Reader) ReadString(delim byte) (line string, err error)
ReadString reads until the first occurrence of delim in the input and returns a string containing the data up to and including the delimiter. If ReadString encounters an error before finding the delimiter, it returns the data read before the error together with the error itself (often io.EOF). ReadString returns err != nil if and only if the returned data does not end in delim. For simple uses, a Scanner may be more convenient.
(13). func (*Reader) Reset
func (b *Reader) Reset(r io.Reader)
Reset discards any buffered data, resets all state, and switches the buffered reader b to read from r.
(14). func (*Reader) UnreadByte
func (b *Reader) UnreadByte() error
UnreadByte unreads the last byte. Only the most recently read byte can be unread.
(15). func (*Reader) UnreadRune
func (b *Reader) UnreadRune() error
UnreadRune unreads the last rune. If the most recent operation on the buffer was not a ReadRune, UnreadRune returns an error. (In this respect it is stricter than UnreadByte, which unreads the last byte from any read operation.)
(16). func (*Reader) WriteTo
func (b *Reader) WriteTo(w io.Writer) (n int64, err error)
WriteTo implements io.WriterTo.
3.9. type Scanner
type Scanner struct { // contains filtered or unexported fields }
The Scanner type provides a convenient interface for reading data, such as reading each line from the text separated by a newline character.
Successive calls to the Scan method step through the tokens of a file, skipping the bytes between the tokens. What counts as a token is defined by a split function of type SplitFunc. The default split function breaks the input into lines with the line terminator stripped. This package predefines split functions for scanning a file into lines, bytes, UTF-8-encoded runes, and space-delimited words. Callers can also provide their own split function.
Scanning stops unrecoverably at the end of the input, at the first I/O error, or when a token is too large to fit in the buffer. When a scan stops, the reader may have advanced arbitrarily far past the last token. Programs that need more control over error handling, that have large tokens, or that must run sequential scans on a reader should use bufio.Reader instead.
(1). Example (Custom)
Use a Scanner with a custom split function (built by wrapping ScanWords) to validate 32-bit decimal input.
Code:
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

func main() {
	// An artificial input source.
	const input = "1234 5678 1234567901234567890"
	scanner := bufio.NewScanner(strings.NewReader(input))
	// Create a custom split function by wrapping the existing ScanWords function.
	split := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		advance, token, err = bufio.ScanWords(data, atEOF)
		if err == nil && token != nil {
			_, err = strconv.ParseInt(string(token), 10, 32)
		}
		return
	}
	// Set the split function for the scanning operation.
	scanner.Split(split)
	// Validate the input
	for scanner.Scan() {
		fmt.Printf("%s\n", scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		fmt.Printf("Invalid input: %s", err)
	}
}
Output:
1234
5678
Invalid input: strconv.ParseInt: parsing "1234567901234567890": value out of range
(2). Example (Lines)
The simplest use of a Scanner is to read standard input as a set of lines.
package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		fmt.Println(scanner.Text()) // Println will add back the final '\n'
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading standard input:", err)
	}
}
(3). Examples (Words)
Use a Scanner to implement a simple word-count utility by scanning the input as a sequence of space-delimited tokens.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// An artificial input source.
	const input = "Now is the winter of our discontent,\nMade glorious summer by this sun of York.\n"
	scanner := bufio.NewScanner(strings.NewReader(input))
	// Set the split function for the scanning operation.
	scanner.Split(bufio.ScanWords)
	// Count the words.
	count := 0
	for scanner.Scan() {
		count++
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
	}
	fmt.Printf("%d\n", count)
}
(4). func NewScanner
func NewScanner(r io.Reader) *Scanner
NewScanner creates and returns a Scanner that reads from r. The default split function is ScanLines.
(5). func (*Scanner) Bytes
func (s *Scanner) Bytes() []byte
The Bytes method returns the token generated by the last Scan call. The data pointed to by the underlying array may be overwritten by the next Scan call.
(6). func (*Scanner) Err
func (s *Scanner) Err() error
Err returns the first non-EOF error encountered by the Scanner.
(7). func (*Scanner) Scan
func (s *Scanner) Scan() bool
The Scan method advances the Scanner to the next token, which is then available through the Bytes or Text method. It returns false when the scan stops, either because it reached the end of the input or because of an error. After Scan returns false, the Err method returns any error that occurred during scanning, except that if the error was io.EOF, Err returns nil. Scan panics if the split function returns 100 empty tokens without advancing the input. This is a common error mode for scanners.
(8). func (*Scanner) Split
func (s *Scanner) Split(split SplitFunc)
Split sets the split function for the Scanner. It must be called before Scan.
(9). func (*Scanner) Text
func (s *Scanner) Text() string
The Text method returns the token generated by the most recent call to Scan as a newly allocated string holding its bytes.
3.10. type SplitFunc
type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
SplitFunc is the signature of the split function used to tokenize the input.
The parameter data is the initial portion of the remaining unprocessed data. The parameter atEOF reports whether the Reader has no more data to give. The return values are the number of bytes to advance the input, the next token to return to the caller (if any), and an error (if any). If the data does not yet hold a complete token, for example when a whole line is needed but the data contains no newline, a SplitFunc can return (0, nil, nil) to tell the Scanner to read more data into the slice and call the function again with a longer slice starting at the same point in the input.
If the returned err is not nil, scanning stops and the error is returned to the caller of the Scanner.
A SplitFunc is never called with an empty data slice unless atEOF is true. If atEOF is true, however, data may be non-empty and hold unprocessed text.
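The contract above can be illustrated with a small custom split function. This is a sketch only: the names scanCommas and fields are assumptions for the example, and it splits on a single comma byte with no quoting or escaping.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanCommas is a hypothetical SplitFunc that returns comma-separated fields.
func scanCommas(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil // nothing left to tokenize
	}
	if i := bytes.IndexByte(data, ','); i >= 0 {
		// Advance past the comma, but leave it out of the token.
		return i + 1, data[:i], nil
	}
	if atEOF {
		// Final field without a trailing comma.
		return len(data), data, nil
	}
	// No complete field yet: ask the Scanner for more data.
	return 0, nil, nil
}

// fields is a hypothetical helper that collects all tokens for input.
func fields(input string) []string {
	s := bufio.NewScanner(strings.NewReader(input))
	s.Split(scanCommas)
	var out []string
	for s.Scan() {
		out = append(out, s.Text())
	}
	return out
}

func main() {
	fmt.Printf("%q\n", fields("a,b,,c"))
}
```

Returning an empty but non-nil token for ",," is safe here because advance is still positive, so the input always makes progress.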
3.11. type Writer
type Writer struct { // contains filtered or unexported fields }
Writer implements buffering for an io.Writer object. If an error occurs while writing to a Writer, no more data will be accepted and all subsequent writes return the error. After all data has been written, the client should call the Flush method to guarantee that all data has been forwarded to the underlying io.Writer.
Example
package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	w := bufio.NewWriter(os.Stdout)
	fmt.Fprint(w, "Hello, ")
	fmt.Fprint(w, "world!")
	w.Flush() // Don't forget to flush!
}
(1). func NewWriter
func NewWriter(w io.Writer) *Writer
NewWriter returns a new Writer whose buffer has the default size.
(2). func NewWriterSize
func NewWriterSize(w io.Writer, size int) *Writer
NewWriterSize returns a new Writer whose buffer has at least the specified size. If the io.Writer argument is already a Writer with a large enough buffer, it returns that underlying Writer.
(3). func (*Writer) Available
func (b *Writer) Available() int
Available returns the number of unused bytes in the buffer.
(4). func (*Writer) Buffered
func (b *Writer) Buffered() int
Buffered returns the number of bytes that have been written into the current buffer.
(5). func (*Writer) Flush
func (b *Writer) Flush() error
Flush writes any buffered data to the underlying io.Writer.
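The relationship between Available, Buffered and Flush can be observed directly. The sketch below is illustrative only: the snapshot type, the helper writeAndFlush and the 16-byte buffer size are all assumptions chosen for the example. It records the state of the Writer before and after flushing into an in-memory sink.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// snapshot is a hypothetical record of the Writer's state at one moment.
type snapshot struct {
	buffered  int // bytes waiting in the buffer
	available int // unused bytes left in the buffer
	delivered int // bytes that reached the underlying writer
}

// writeAndFlush is a hypothetical helper: it writes text through a 16-byte
// buffer and captures the state before and after Flush.
func writeAndFlush(text string) (before, after snapshot, err error) {
	var sink bytes.Buffer
	w := bufio.NewWriterSize(&sink, 16)

	if _, err = w.WriteString(text); err != nil {
		return before, after, err
	}
	before = snapshot{w.Buffered(), w.Available(), sink.Len()}

	if err = w.Flush(); err != nil {
		return before, after, err
	}
	after = snapshot{w.Buffered(), w.Available(), sink.Len()}
	return before, after, nil
}

func main() {
	before, after, err := writeAndFlush("hello")
	fmt.Printf("%+v\n%+v\n%v\n", before, after, err)
}
```

Until Flush is called, the five bytes sit in the buffer and the sink stays empty. Afterwards the buffer is empty again and the sink holds the data.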
(6). func (*Writer) ReadFrom
func (b *Writer) ReadFrom(r io.Reader) (n int64, err error)
ReadFrom implements io.ReaderFrom.
(7). func (*Writer) Reset
func (b *Writer) Reset(w io.Writer)
Reset discards any buffered data that is not flushed, clears any errors, and resets b to write its output to w.
(8). func (*Writer) Write
func (b *Writer) Write(p []byte) (nn int, err error)
Write writes the contents of p into the buffer. It returns the number of bytes written. If nn < len(p), it also returns an error explaining why the write is short.
(9). func (*Writer) WriteByte
func (b *Writer) WriteByte(c byte) error
WriteByte writes a single byte.
(10). func (*Writer) WriteRune
func (b *Writer) WriteRune(r rune) (size int, err error)
WriteRune writes a single Unicode code point, returning the number of bytes written and any error encountered.
(11). func (*Writer) WriteString
func (b *Writer) WriteString(s string) (int, error)
WriteString writes a string. It returns the number of bytes written. If the count is less than len(s), it also returns an error explaining why the write is short.
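To show how the byte counts returned by the write methods add up, here is a short sketch. The helper compose and the text it writes are made up for illustration. Note that WriteRune reports the encoded size of the rune, which is 2 for é.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// compose is a hypothetical helper: it writes "café!" piece by piece and
// returns the result together with the total number of bytes written.
func compose() (string, int, error) {
	var sink bytes.Buffer
	w := bufio.NewWriter(&sink)
	total := 0

	n, err := w.WriteString("caf") // 3 bytes
	if err != nil {
		return "", total, err
	}
	total += n

	n, err = w.WriteRune('\u00e9') // é encodes to 2 bytes in UTF-8
	if err != nil {
		return "", total, err
	}
	total += n

	if err = w.WriteByte('!'); err != nil { // always exactly 1 byte
		return "", total, err
	}
	total++

	// The sink stays empty until the buffer is flushed.
	if err = w.Flush(); err != nil {
		return "", total, err
	}
	return sink.String(), total, nil
}

func main() {
	s, n, err := compose()
	fmt.Println(s, n, err)
}
```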