Go language from introduction to specification - 6.3, go language io operation - io related package and bufio

Posted by atlanta on Fri, 04 Mar 2022 01:05:21 +0100

1. Preface

I/O itself needs little introduction: it is simply the input and output of data. It is worth understanding before learning file operations and network programming, because both files and networks come down to data I/O. We already met the basic file operation interfaces in the os package. Here we briefly review and summarize I/O, to make it easier to see how common file operations combine with the I/O interfaces to read and write files flexibly, and to pave the way for the later material on net/http/rpc and other related packages.

2. io,ioutil,bufio

Official website standard package address:

The io package provides basic interfaces to I/O primitives. Its primary job is to wrap existing implementations of such primitives, such as those in the os package, into shared public interfaces that abstract the functionality, plus some other related primitives.

Because these interfaces and primitives wrap lower-level operations with various implementations, clients should not assume they are safe for parallel execution unless otherwise informed.

In practice we therefore rarely use the io package on its own. It mainly supplies the basic interfaces that other code builds on; in other words, we could use io ourselves to implement functionality similar to ioutil and bufio.

ioutil implements some I/O utility functions. When there are no special requirements, ioutil is quite handy: simple and blunt.

The bufio package implements buffered I/O. It wraps an io.Reader or io.Writer object, creating another object (Reader or Writer) that also implements the interface but provides buffering and some help for textual I/O.

3. bufio

3.1. constant

const (
    // MaxScanTokenSize is the maximum size used to buffer a token.
    // The actual maximum token size may be smaller as the buffer
    // may need to include, for instance, a newline.
    MaxScanTokenSize = 64 * 1024
)

3.2. variable

var (
    ErrInvalidUnreadByte = errors.New("bufio: invalid use of UnreadByte")
    ErrInvalidUnreadRune = errors.New("bufio: invalid use of UnreadRune")
    ErrBufferFull        = errors.New("bufio: buffer full")
    ErrNegativeCount     = errors.New("bufio: negative count")
)
var (
    ErrTooLong         = errors.New("bufio.Scanner: token too long")
    ErrNegativeAdvance = errors.New("bufio.Scanner: SplitFunc returns negative advance count")
    ErrAdvanceTooFar   = errors.New("bufio.Scanner: SplitFunc returns advance count beyond input")
)

Errors returned by Scanner.

3.3. func ScanBytes

func ScanBytes(data []byte, atEOF bool) (advance int, token []byte, err error)

ScanBytes is a segmentation function for Scanner type (conforming to SplitFunc). This function will return each byte as a token.

3.4. func ScanLines

func ScanLines(data []byte, atEOF bool) (advance int, token []byte, err error)

ScanLines is a split function for a Scanner (it conforms to SplitFunc). It returns each line of text as a token, stripped of any trailing end-of-line marker. The returned line may be empty. The end-of-line marker is one optional carriage return followed by one mandatory newline. The last non-empty line of input is returned even if it has no newline.

3.5. func ScanRunes

func ScanRunes(data []byte, atEOF bool) (advance int, token []byte, err error)

ScanRunes is a split function for a Scanner (it conforms to SplitFunc). It returns each UTF-8-encoded rune as a token. The sequence of runes returned is the same as that produced by a range loop over the input as a string. An invalid UTF-8 encoding is translated to U+FFFD = "\xef\xbf\xbd", and only one byte is consumed. As a result, the caller cannot distinguish a correctly encoded replacement rune from an encoding error.

3.6. func ScanWords

func ScanWords(data []byte, atEOF bool) (advance int, token []byte, err error)

ScanWords is a split function for a Scanner (it conforms to SplitFunc). It returns each space-separated word of text as a token, with the surrounding spaces removed. It never returns an empty string. What counts as a space is defined by unicode.IsSpace.
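To compare the four predefined split functions side by side, here is a minimal sketch that runs each of them over the same in-memory input. The helper names (tokens, lines, words, runes, byteTokens) and the sample string are made up for illustration; only the bufio calls come from the package.

```go
package main

import (
    "bufio"
    "fmt"
    "strings"
)

// tokens scans input with the given split function and collects every token.
func tokens(input string, split bufio.SplitFunc) []string {
    scanner := bufio.NewScanner(strings.NewReader(input))
    scanner.Split(split)
    var out []string
    for scanner.Scan() {
        out = append(out, scanner.Text())
    }
    return out
}

// Thin wrappers (hypothetical names) so each split function can be tried by name.
func lines(s string) []string      { return tokens(s, bufio.ScanLines) }
func words(s string) []string      { return tokens(s, bufio.ScanWords) }
func runes(s string) []string      { return tokens(s, bufio.ScanRunes) }
func byteTokens(s string) []string { return tokens(s, bufio.ScanBytes) }

func main() {
    // The input has a CRLF line end, an empty line, and no final newline.
    const input = "go  é\r\n\nend"
    fmt.Printf("lines: %q\n", lines(input))
    fmt.Printf("words: %q\n", words(input))
    fmt.Printf("runes: %d tokens\n", len(runes(input)))
    fmt.Printf("bytes: %d tokens\n", len(byteTokens(input)))
}
```

The rune and byte counts differ because "é" is one rune but two bytes in UTF-8.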

3.7. type ReadWriter

type ReadWriter struct {
    *Reader
    *Writer
}

ReadWriter stores pointers to a Reader and a Writer. It implements io.ReadWriter.

(1). func NewReadWriter

func NewReadWriter(r *Reader, w *Writer) *ReadWriter

NewReadWriter allocates a new ReadWriter that dispatches to r and w.
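Because ReadWriter embeds both *Reader and *Writer, their methods can be called directly on it. The sketch below assumes an in-memory source and destination; the function name shout is hypothetical.

```go
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "strings"
)

// shout reads one line through a bufio.ReadWriter, upper-cases it, and
// writes it back through the same ReadWriter into an in-memory buffer.
func shout(in string) (string, error) {
    var out bytes.Buffer
    rw := bufio.NewReadWriter(
        bufio.NewReader(strings.NewReader(in)),
        bufio.NewWriter(&out),
    )
    // ReadString is promoted from the embedded *Reader.
    line, err := rw.ReadString('\n')
    if err != nil {
        return "", err
    }
    // WriteString and Flush are promoted from the embedded *Writer.
    if _, err := rw.WriteString(strings.ToUpper(line)); err != nil {
        return "", err
    }
    if err := rw.Flush(); err != nil {
        return "", err
    }
    return out.String(), nil
}

func main() {
    s, err := shout("hello\n")
    fmt.Printf("%q %v\n", s, err)
}
```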

3.8. type Reader

type Reader struct {
    // contains filtered or unexported fields
}

Reader implements buffering for an io.Reader object.

(1). func NewReader

func NewReader(rd io.Reader) *Reader

NewReader returns a new Reader whose buffer has the default size.

(2). func NewReaderSize

func NewReaderSize(rd io.Reader, size int) *Reader

NewReaderSize returns a new Reader whose buffer has at least the specified size. If the io.Reader argument is already a Reader with a large enough buffer, it returns that underlying Reader.

(3). func (*Reader) Buffered

func (b *Reader) Buffered() int

Buffered returns the number of readable bytes currently cached.

(4). func (*Reader) Discard

func (b *Reader) Discard(n int) (discarded int, err error)

Discard skips the next n bytes and returns the number of bytes discarded.

If Discard skips fewer than n bytes, it also returns an error. If 0 <= n <= b.Buffered(), Discard is guaranteed to succeed without reading from the underlying io.Reader.

(5). func (*Reader) Peek

func (b *Reader) Peek(n int) ([]byte, error)

Peek returns the next n bytes without advancing the reader. The returned bytes stop being valid at the next read call. If Peek returns fewer than n bytes, it also returns an error explaining why the read is short. The error is ErrBufferFull if n is larger than b's buffer size.
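Peek and Discard are often used together to inspect a prefix and then skip it. The following is a rough sketch under the assumption that the input starts with a fixed-length header; splitHeader and the sample data are invented for illustration.

```go
package main

import (
    "bufio"
    "fmt"
    "io"
    "strings"
)

// splitHeader peeks at the first n bytes without consuming them, then
// discards them and returns whatever is left in the stream.
func splitHeader(input string, n int) (header, rest string, err error) {
    r := bufio.NewReader(strings.NewReader(input))

    peeked, err := r.Peek(n)
    if err != nil {
        return "", "", err // fewer than n bytes were available
    }
    header = string(peeked) // copy: the slice is only valid until the next read

    // The peeked bytes are still buffered (n <= r.Buffered()), so this
    // Discard does not need to read from the underlying reader.
    if _, err := r.Discard(n); err != nil {
        return "", "", err
    }

    body, err := io.ReadAll(r)
    if err != nil {
        return "", "", err
    }
    return header, string(body), nil
}

func main() {
    h, rest, err := splitHeader("HDR:payload", 4)
    fmt.Printf("%q %q %v\n", h, rest, err)
}
```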

(6). func (*Reader) Read

func (b *Reader) Read(p []byte) (n int, err error)

Read reads data into p and returns the number of bytes read into p. The bytes are taken from at most one Read on the underlying Reader, so n may be less than len(p). At EOF, Read returns 0 and io.EOF.
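The short-read behaviour is easy to miss with in-memory sources, which usually fill the whole buffer at once. As a sketch, the example below assumes a deliberately slow source built with iotest.OneByteReader (from testing/iotest) and contrasts a single Read with io.ReadFull; readCounts is a hypothetical helper name.

```go
package main

import (
    "bufio"
    "fmt"
    "io"
    "strings"
    "testing/iotest"
)

// readCounts reads from a source that yields one byte per underlying Read.
// It returns how many bytes a single Read produced and how many bytes
// io.ReadFull produced for a buffer of the same size.
func readCounts(input string, size int) (single, full int, err error) {
    src := iotest.OneByteReader(strings.NewReader(input))
    r := bufio.NewReader(src)
    p := make([]byte, size)

    // One call to Read triggers at most one Read on src.
    single, err = r.Read(p)
    if err != nil {
        return single, 0, err
    }
    // io.ReadFull keeps calling Read until p is full or an error occurs.
    full, err = io.ReadFull(r, p)
    return single, full, err
}

func main() {
    single, full, err := readCounts("abcdefghij", 4)
    fmt.Println(single, full, err)
}
```

When exactly len(p) bytes are required, io.ReadFull(b, p) is the usual choice rather than a bare Read.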

(7). func (*Reader) ReadByte

func (b *Reader) ReadByte() (c byte, err error)

ReadByte reads and returns a single byte. If no byte is available, it returns an error.

(8). func (*Reader) ReadBytes

func (b *Reader) ReadBytes(delim byte) (line []byte, err error)

ReadBytes reads until the first occurrence of delim in the input and returns a slice containing the data up to and including the delimiter. If ReadBytes encounters an error before finding a delimiter, it returns the data read before the error and the error itself (often io.EOF). ReadBytes returns err != nil if and only if the returned data does not end in delim. For simple uses, a Scanner may be more convenient.

(9). func (*Reader) ReadLine

func (b *Reader) ReadLine() (line []byte, isPrefix bool, err error)

ReadLine is a low-level line-reading primitive. Most callers should use ReadBytes('\n') or ReadString('\n') instead, or use a Scanner.

ReadLine tries to return a single line, not including the end-of-line bytes. If the line is too long for the buffer, isPrefix is set and the beginning of the line is returned. The rest of the line is returned by subsequent calls, and isPrefix is false when the last fragment of the line is returned. The returned buffer is only valid until the next call to ReadLine. ReadLine returns either a non-nil line or an error, never both.

The text returned by ReadLine does not include the line end ("\r\n" or "\n"). No indication or error is given if the input ends without a final line end. Calling UnreadByte after ReadLine always unreads the last byte read (possibly a character belonging to the line end), even if that byte is not part of the line returned by ReadLine.
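To make the isPrefix mechanics concrete, here is a hedged sketch that reassembles full lines from ReadLine fragments. It assumes a deliberately tiny 16-byte buffer so that a 20-character line is split; readLongLines is an illustrative name, not part of bufio.

```go
package main

import (
    "bufio"
    "fmt"
    "io"
    "strings"
)

// readLongLines reassembles complete lines from ReadLine fragments and
// reports how many fragments ReadLine returned in total.
func readLongLines(input string, bufSize int) (lines []string, fragments int, err error) {
    r := bufio.NewReaderSize(strings.NewReader(input), bufSize)
    var cur []byte
    for {
        frag, isPrefix, rerr := r.ReadLine()
        if rerr == io.EOF {
            if len(cur) > 0 { // input ended in the middle of a long line
                lines = append(lines, string(cur))
            }
            return lines, fragments, nil
        }
        if rerr != nil {
            return lines, fragments, rerr
        }
        fragments++
        // Copy the fragment: it is only valid until the next ReadLine call.
        cur = append(cur, frag...)
        if !isPrefix {
            lines = append(lines, string(cur))
            cur = cur[:0]
        }
    }
}

func main() {
    lines, n, err := readLongLines("0123456789abcdefghij\nshort\n", 16)
    fmt.Printf("%q fragments=%d err=%v\n", lines, n, err)
}
```

The bookkeeping needed here is the main reason ReadString, ReadBytes, or a Scanner is usually preferred.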

(10). func (*Reader) ReadRune

func (b *Reader) ReadRune() (r rune, size int, err error)

ReadRune reads a single UTF-8 encoded Unicode character and returns the rune and its size in bytes. If the encoded rune is invalid, it consumes one byte and returns unicode.ReplacementChar (U+FFFD) with a size of 1.

(11). func (*Reader) ReadSlice

func (b *Reader) ReadSlice(delim byte) (line []byte, err error)

ReadSlice reads until the first occurrence of delim in the input and returns a slice pointing at the bytes in the buffer. The bytes stop being valid at the next read. If ReadSlice encounters an error before finding a delimiter, it returns all the data in the buffer and the error itself (often io.EOF). If the buffer fills before a delimiter is found, ReadSlice fails with ErrBufferFull. Because the data returned by ReadSlice will be overwritten by the next I/O operation, most clients should use ReadBytes or ReadString instead. ReadSlice returns err != nil if and only if the returned data does not end in delim.

(12). func (*Reader) ReadString

func (b *Reader) ReadString(delim byte) (line string, err error)

ReadString reads until the first occurrence of delim in the input and returns a string containing the data up to and including the delimiter. If ReadString encounters an error before finding a delimiter, it returns the data read before the error and the error itself (often io.EOF). ReadString returns err != nil if and only if the returned data does not end in delim. For simple uses, a Scanner may be more convenient.
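Since ReadString can return data together with io.EOF, a read loop has to handle the data before checking the error, or the final unterminated line is lost. A minimal sketch, with readAllLines as a hypothetical helper:

```go
package main

import (
    "bufio"
    "fmt"
    "io"
    "strings"
)

// readAllLines collects every line, including a final one that lacks '\n'.
func readAllLines(input string) ([]string, error) {
    r := bufio.NewReader(strings.NewReader(input))
    var lines []string
    for {
        line, err := r.ReadString('\n')
        if len(line) > 0 {
            // Handle the data first: it may arrive together with io.EOF.
            lines = append(lines, line) // the delimiter, if present, is kept
        }
        if err == io.EOF {
            return lines, nil
        }
        if err != nil {
            return lines, err
        }
    }
}

func main() {
    lines, err := readAllLines("a\nb\nc")
    fmt.Printf("%q %v\n", lines, err)
}
```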

(13). func (*Reader) Reset

func (b *Reader) Reset(r io.Reader)

Reset discards any buffered data, resets all state, and switches the buffered reader to read from r.

(14). func (*Reader) UnreadByte

func (b *Reader) UnreadByte() error

UnreadByte marks the last byte as unread. Only the last byte can be marked as unread.

(15). func (*Reader) UnreadRune

func (b *Reader) UnreadRune() error

UnreadRune unreads the last rune. If the most recent operation on the buffer was not a ReadRune, UnreadRune returns an error. (In this respect it is stricter than UnreadByte, which unreads the last byte from any read operation.)
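A typical use of ReadRune with UnreadRune is one-rune lookahead in a small tokenizer. The sketch below assumes we want the leading run of digits from the input; leadingDigits is an invented name.

```go
package main

import (
    "bufio"
    "fmt"
    "io"
    "strings"
    "unicode"
)

// leadingDigits consumes runes while they are digits. The first non-digit
// is pushed back with UnreadRune so that the next ReadRune sees it again.
func leadingDigits(input string) (digits string, next rune, err error) {
    r := bufio.NewReader(strings.NewReader(input))
    var sb strings.Builder
    for {
        ch, _, rerr := r.ReadRune()
        if rerr == io.EOF {
            return sb.String(), 0, nil // input was digits only
        }
        if rerr != nil {
            return sb.String(), 0, rerr
        }
        if !unicode.IsDigit(ch) {
            // Allowed here because the previous operation was ReadRune.
            if uerr := r.UnreadRune(); uerr != nil {
                return sb.String(), 0, uerr
            }
            break
        }
        sb.WriteRune(ch)
    }
    // The rune that was pushed back is returned again.
    next, _, err = r.ReadRune()
    return sb.String(), next, err
}

func main() {
    digits, next, err := leadingDigits("42€")
    fmt.Printf("%q %q %v\n", digits, next, err)
}
```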

(16). func (*Reader) WriteTo

func (b *Reader) WriteTo(w io.Writer) (n int64, err error)

WriteTo implements io.WriterTo.

3.9. type Scanner

type Scanner struct {
    // contains filtered or unexported fields
}

The Scanner type provides a convenient interface for reading data, such as reading each line from the text separated by a newline character.

Successive calls to the Scan method step through the tokens of a file, skipping the bytes between tokens. What counts as a token is defined by a split function of type SplitFunc; the default split function breaks the input into lines with the line terminator stripped. The split functions predefined in this package scan a file into lines, bytes, UTF-8-encoded runes, and space-delimited words. Callers may also supply their own split function.

Scanning stops unrecoverably at the end of the input, at the first I/O error, or at a token too large to fit in the buffer. When a scan stops, the reader may have advanced arbitrarily far past the last token. Programs that need more control over error handling or large tokens, or that must run sequential scans on a reader, should use bufio.Reader instead.
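The "token too large" case can be reproduced with a single line longer than MaxScanTokenSize. The sketch below also uses the Scanner's Buffer method, which is not covered in this article, to raise the limit; treat the helper scanOneLine and the chosen sizes as assumptions made for illustration.

```go
package main

import (
    "bufio"
    "fmt"
    "strings"
)

// scanOneLine scans a single line of lineLen bytes and returns the length
// of the token. When maxToken > 0 it raises the Scanner's limit with
// Buffer before scanning.
func scanOneLine(lineLen, maxToken int) (int, error) {
    input := strings.Repeat("x", lineLen) + "\n"
    scanner := bufio.NewScanner(strings.NewReader(input))
    if maxToken > 0 {
        // Buffer must be called before the first Scan.
        scanner.Buffer(make([]byte, 0, 4096), maxToken)
    }
    n := 0
    for scanner.Scan() {
        n = len(scanner.Bytes())
    }
    return n, scanner.Err()
}

func main() {
    _, err := scanOneLine(70000, 0)
    fmt.Println("default limit:", err)

    n, err := scanOneLine(70000, 128*1024)
    fmt.Println("raised limit:", n, err)
}
```

With the default limit the scan stops with ErrTooLong; a bufio.Reader with ReadString avoids the limit entirely at the cost of more manual handling.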

(1). Example (Custom)

Use scanner with a custom split function (built by wrapping ScanWords) to validate 32-bit decimal input.

code:

package main

import (
    "bufio"
    "fmt"
    "strconv"
    "strings"
)

func main() {
    // An artificial input source.
    const input = "1234 5678 1234567901234567890"
    scanner := bufio.NewScanner(strings.NewReader(input))
    // Create a custom split function by wrapping the existing ScanWords function.
    split := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
        advance, token, err = bufio.ScanWords(data, atEOF)
        if err == nil && token != nil {
            // Reject any word that does not fit in a 32-bit signed integer.
            _, err = strconv.ParseInt(string(token), 10, 32)
        }
        return
    }
    // Set the split function for the scanning operation.
    scanner.Split(split)
    // Validate the input
    for scanner.Scan() {
        fmt.Printf("%s\n", scanner.Text())
    }

    if err := scanner.Err(); err != nil {
        fmt.Printf("Invalid input: %s", err)
    }
}

Output:

1234
5678
Invalid input: strconv.ParseInt: parsing "1234567901234567890": value out of range

(2). Example (Lines)

The simplest use of a Scanner is to read standard input as a sequence of lines.

package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    scanner := bufio.NewScanner(os.Stdin)
    for scanner.Scan() {
        fmt.Println(scanner.Text()) // Println will add back the final '\n'
    }
    if err := scanner.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "reading standard input:", err)
    }
}

(3). Example (Words)

Use a Scanner to implement a simple word-count utility by scanning the input as a sequence of space-delimited tokens.

package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

func main() {
    // An artificial input source.
    const input = "Now is the winter of our discontent,\nMade glorious summer by this sun of York.\n"
    scanner := bufio.NewScanner(strings.NewReader(input))
    // Set the split function for the scanning operation.
    scanner.Split(bufio.ScanWords)
    // Count the words.
    count := 0
    for scanner.Scan() {
        count++
    }
    if err := scanner.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "reading input:", err)
    }
    fmt.Printf("%d\n", count)
}

(4). func NewScanner

func NewScanner(r io.Reader) *Scanner

NewScanner creates and returns a new Scanner that reads from r. The split function defaults to ScanLines.

(5). func (*Scanner) Bytes

func (s *Scanner) Bytes() []byte

The Bytes method returns the token generated by the last Scan call. The data pointed to by the underlying array may be overwritten by the next Scan call.

(6). func (*Scanner) Err

func (s *Scanner) Err() error

Err returns the first non-EOF error encountered by the Scanner.

(7). func (*Scanner) Scan

func (s *Scanner) Scan() bool

Scan advances the Scanner to the next token, which is then available through the Bytes or Text method. It returns false when the scan stops, either because the end of the input was reached or because an error occurred. After Scan returns false, the Err method returns any error that occurred during scanning, except that if the error was io.EOF, Err returns nil. Scan panics if the split function returns 100 empty tokens without advancing the input. This is a common error mode for scanners.

(8). func (*Scanner) Split

func (s *Scanner) Split(split SplitFunc)

Split sets the segmentation function of the Scanner. This method must be called before Scan.

(9). func (*Scanner) Text

func (s *Scanner) Text() string

Text returns the token generated by the most recent call to Scan, as a newly allocated string holding its bytes.

3.10. type SplitFunc

type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)

The SplitFunc type is the signature of the split function used to tokenize the input.

The parameter data is the initial portion of the remaining unprocessed data, and atEOF reports whether the Reader has no more data to give. The return values are the number of bytes to advance the input, the next token to return to the caller (if any), and an error (if any). If the data does not yet hold a complete token, for instance if it has no newline while scanning lines, a SplitFunc can return (0, nil, nil) to tell the Scanner to read more data into the slice and try again with a longer slice starting at the same point in the input.

If the return value err is not nil, the scan will terminate and the error will be returned to the caller of the Scanner.

A SplitFunc is never called with an empty data slice unless atEOF is true. If atEOF is true, however, data may be non-empty and may still hold unprocessed text.
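The contract above can be illustrated with a small hand-written split function. This is only a sketch: scanCommas and fields are hypothetical names, and the choice to split on commas (keeping empty fields) is an assumption made for the example.

```go
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "strings"
)

// scanCommas is a SplitFunc that returns each comma-separated field.
func scanCommas(data []byte, atEOF bool) (advance int, token []byte, err error) {
    if atEOF && len(data) == 0 {
        return 0, nil, nil // nothing left: the scan ends
    }
    if i := bytes.IndexByte(data, ','); i >= 0 {
        // Consume the field and the comma; the token excludes the comma.
        return i + 1, data[:i], nil
    }
    if atEOF {
        // Last field with no trailing comma.
        return len(data), data, nil
    }
    // No comma yet and more input may arrive: ask the Scanner for more data.
    return 0, nil, nil
}

// fields scans input with scanCommas and collects the tokens.
func fields(input string) []string {
    scanner := bufio.NewScanner(strings.NewReader(input))
    scanner.Split(scanCommas)
    var out []string
    for scanner.Scan() {
        out = append(out, scanner.Text())
    }
    return out
}

func main() {
    fmt.Printf("%q\n", fields("a,b,,c"))
}
```

Returning an empty but non-nil token together with a positive advance is what lets the empty field between the two commas come through.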

3.11. type Writer

type Writer struct {
    // contains filtered or unexported fields
}

Writer implements buffering for an io.Writer object. If an error occurs while writing to a Writer, no more data is accepted and all subsequent write operations return the error. After all data has been written, the client should call the Flush method to guarantee that all data has been forwarded to the underlying io.Writer.

Example

package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    w := bufio.NewWriter(os.Stdout)
    fmt.Fprint(w, "Hello, ")
    fmt.Fprint(w, "world!")
    w.Flush() // Don't forget to flush!
}
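The sticky-error behaviour of Writer can be observed with a destination that always fails. In this sketch, failingWriter is a made-up type standing in for a broken file or connection, and writeThenFlush is a hypothetical helper.

```go
package main

import (
    "bufio"
    "errors"
    "fmt"
)

// failingWriter is a hypothetical destination whose every Write fails.
type failingWriter struct{}

func (failingWriter) Write(p []byte) (int, error) {
    return 0, errors.New("destination unavailable")
}

// writeThenFlush returns the errors from a small write, from the flush
// that follows it, and from a second write attempted after the failure.
func writeThenFlush() (writeErr, flushErr, laterErr error) {
    w := bufio.NewWriter(failingWriter{})
    _, writeErr = w.WriteString("hello") // fits in the buffer, so nothing fails yet
    flushErr = w.Flush()                 // the underlying Write happens here
    _, laterErr = w.WriteString("again") // the Writer now refuses further data
    return writeErr, flushErr, laterErr
}

func main() {
    writeErr, flushErr, laterErr := writeThenFlush()
    fmt.Println("write:", writeErr)
    fmt.Println("flush:", flushErr)
    fmt.Println("later:", laterErr)
}
```

This is why the error returned by Flush should be checked even when every earlier write appeared to succeed.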

(1). func NewWriter

func NewWriter(w io.Writer) *Writer

NewWriter returns a new Writer whose buffer has the default size.

(2). func NewWriterSize

func NewWriterSize(w io.Writer, size int) *Writer

NewWriterSize returns a new Writer whose buffer has at least the specified size. If the io.Writer argument is already a Writer with a large enough buffer, it returns that underlying Writer.

(3). func (*Writer) Available

func (b *Writer) Available() int

Available returns the number of unused bytes in the buffer.

(4). func (*Writer) Buffered

func (b *Writer) Buffered() int

Buffered returns the number of bytes that have been written to the current cache.

(5). func (*Writer) Flush

func (b *Writer) Flush() error

Flush writes any buffered data to the underlying io.Writer.
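Available, Buffered, and Flush are easiest to understand by watching them change. The sketch below assumes a 16-byte buffer over an in-memory destination; the snapshot type and bufferStates function are illustrative names only.

```go
package main

import (
    "bufio"
    "bytes"
    "fmt"
)

// snapshot records the Writer's buffer state and what has reached the destination.
type snapshot struct {
    buffered, available int
    delivered           string
}

// bufferStates writes five bytes and captures the state before and after Flush.
func bufferStates() (before, after snapshot, err error) {
    var dst bytes.Buffer
    w := bufio.NewWriterSize(&dst, 16)

    if _, err = w.WriteString("hello"); err != nil {
        return before, after, err
    }
    // The data sits in the buffer; the destination has seen nothing yet.
    before = snapshot{w.Buffered(), w.Available(), dst.String()}

    if err = w.Flush(); err != nil {
        return before, after, err
    }
    after = snapshot{w.Buffered(), w.Available(), dst.String()}
    return before, after, nil
}

func main() {
    before, after, err := bufferStates()
    fmt.Printf("before flush: %+v\n", before)
    fmt.Printf("after flush:  %+v\n", after)
    fmt.Println("err:", err)
}
```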

(6). func (*Writer) ReadFrom

func (b *Writer) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom implements io.ReaderFrom.

(7). func (*Writer) Reset

func (b *Writer) Reset(w io.Writer)

Reset discards any buffered data that is not flushed, clears any errors, and resets b to write its output to w.

(8). func (*Writer) Write

func (b *Writer) Write(p []byte) (nn int, err error)

Write writes the contents of p into the buffer. It returns the number of bytes written. If nn < len(p), it also returns an error explaining why the write is short.

(9). func (*Writer) WriteByte

func (b *Writer) WriteByte(c byte) error

WriteByte writes a single byte.

(10). func (*Writer) WriteRune

func (b *Writer) WriteRune(r rune) (size int, err error)

WriteRune writes a single Unicode code point, returning the number of bytes written and any error encountered.

(11). func (*Writer) WriteString

func (b *Writer) WriteString(s string) (int, error)

WriteString writes a string. It returns the number of bytes written. If the count is less than len(s), it also returns an error explaining why the write is short.
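The three typed write methods can be mixed freely on one Writer. A minimal sketch, assuming an in-memory destination; compose is a hypothetical helper that wraps a rune in brackets.

```go
package main

import (
    "bufio"
    "bytes"
    "fmt"
)

// compose builds a short string with WriteByte, WriteRune, and WriteString,
// and reports how many bytes WriteRune used to encode the rune.
func compose(r rune) (string, int, error) {
    var dst bytes.Buffer
    w := bufio.NewWriter(&dst)

    if err := w.WriteByte('['); err != nil {
        return "", 0, err
    }
    size, err := w.WriteRune(r) // size is the UTF-8 length of r
    if err != nil {
        return "", 0, err
    }
    if _, err := w.WriteString("]\n"); err != nil {
        return "", 0, err
    }
    // Nothing reaches dst until the buffer is flushed.
    if err := w.Flush(); err != nil {
        return "", 0, err
    }
    return dst.String(), size, nil
}

func main() {
    s, size, err := compose('€')
    fmt.Printf("%q size=%d err=%v\n", s, size, err)
}
```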

3.12. Package file

bufio.go

scan.go
