Go SliceHeader: how slices handle data efficiently

Posted by phpprog on Wed, 22 Dec 2021 18:33:56 +0100

In the earlier article on Go collection types, we learned what a slice is and how to use it. Here we look at how slices work and study their underlying design.

Arrays

Before talking about slices, let's introduce arrays. Arrays exist in almost all programming languages, and Go is no exception. So why does Go provide slices in addition to arrays? Let's first look at the limitations of arrays.

An array type consists of two parts: the size of the array and the type of its elements. Once an array is declared, neither its size nor its element type can change, so you cannot add elements to it at will. This is the first limitation of arrays.

// Array structure pseudo code
array {
    len
    item type
}

// [1]string and [2]string are two different types, because the size of an array is part of its type. Two arrays have the same type only when both their element type and their size are the same.
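A minimal sketch of the point above, printing the static type of two arrays that differ only in size. The helper name typeName is made up for this illustration.

```go
package main

import "fmt"

// typeName is a hypothetical helper that returns the type of v
// as formatted by the %T verb.
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	a := [1]string{"zhangsan"}
	b := [2]string{"zhangsan", "lisi"}

	fmt.Println(typeName(a)) // [1]string
	fmt.Println(typeName(b)) // [2]string

	// The assignment a = b would be rejected by the compiler,
	// because [2]string is not the same type as [1]string.
}
```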

Since the size of an array is fixed, storing a large amount of data in one means choosing a suitable size in advance, such as 100,000. That works, but it creates another problem: memory use. In Go, function arguments are passed by value, so when an array is passed to a function, its entire contents are copied on every call, which wastes a lot of memory. This is the second limitation of arrays.

Despite these limitations, the array is a very important underlying data structure in Go. For example, the data of a slice is stored in an array.
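The pass-by-value behavior can be sketched as follows. The function setFirst is a hypothetical example, not part of any library: it writes to its parameter, and the caller's array is unaffected because the function received a copy.

```go
package main

import "fmt"

// setFirst receives a copy of the caller's array, so the write
// below never reaches the original.
func setFirst(a [3]int) [3]int {
	a[0] = 100
	return a
}

func main() {
	orig := [3]int{1, 2, 3}
	changed := setFirst(orig)

	fmt.Println(orig)    // [1 2 3]
	fmt.Println(changed) // [100 2 3]
}
```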

Slices

Arrays are useful, but they are restrictive to work with. To remove these restrictions, Go provides the slice. A slice is an abstraction over an array: underneath it is an array that stores all the elements, but a slice can grow dynamically and expands automatically when its capacity is insufficient. A slice can be thought of as a dynamic array. In Go, slices are used in most cases; arrays are needed only when a type must have a fixed length.

Dynamic capacity expansion

You can append any number of elements to a slice with the built-in append function, which removes the first limitation of arrays. If the capacity of the slice is insufficient, append expands it automatically. The built-in len function returns the length of a slice, and the built-in cap function returns its capacity.

Tip: when the capacity is insufficient, append expands it by creating a new underlying array, copying the elements of the original slice into it, and returning a slice that points to the new array.
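The following sketch shows the two cases of append: one that fits in the existing capacity and one that forces a new underlying array. The helper reallocated is an assumption made for this example; it compares the address of the first element before and after the call.

```go
package main

import "fmt"

// reallocated is a hypothetical helper: it reports whether two
// non-empty slices start at different addresses, which means
// they no longer share the same underlying array.
func reallocated(before, after []int) bool {
	return len(before) > 0 && len(after) > 0 && &before[0] != &after[0]
}

func main() {
	s := make([]int, 2, 3) // len 2, cap 3

	s1 := append(s, 1)  // fits in the spare capacity
	s2 := append(s1, 2) // exceeds cap 3, so a new array is allocated

	fmt.Println(len(s1), cap(s1), reallocated(s, s1)) // 3 3 false
	fmt.Println(len(s2), reallocated(s1, s2))         // 4 true
}
```

The new capacity chosen by append is an implementation detail, so the sketch does not print cap(s2).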

Data structure

In Go, a slice is actually a struct at run time, defined as follows:

type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}

SliceHeader is the run-time representation of a slice. It has three fields: Data, Len and Cap. Through these three fields, an array is abstracted into a slice that is easier to work with. As a result, the Data fields of different slices may point to the same array.

	1. Data points to the array that stores the slice elements
	2. Len is the length of the slice
	3. Cap is the capacity of the slice

package main

import (
    "fmt"
    "reflect"
    "unsafe"
)

func main() {
    a1 := [2]string{"zhangsan", "lisi"}
    s1 := a1[0:1]
    s2 := a1[:]
    // Convert each slice to *reflect.SliceHeader through unsafe.Pointer and print its Data field; both values are the same.
    fmt.Println((*reflect.SliceHeader)(unsafe.Pointer(&s1)).Data)
    fmt.Println((*reflect.SliceHeader)(unsafe.Pointer(&s2)).Data)
}

// Output results
824634159527
824634159527

// The output shows that the two slices share the same array. Assigning a slice or slicing it again reuses the same underlying array; the elements are not copied. This reduces memory use and improves efficiency.

Note: when multiple slices share one underlying array, memory use goes down, but if one slice modifies an element, the other slices see the change too. So when a slice is passed to a function as a parameter, avoid modifying the elements of the original slice where possible.
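A small sketch of this side effect, and of copying as one way to avoid it. The function overwriteFirst is a hypothetical example.

```go
package main

import "fmt"

// overwriteFirst writes through the slice it receives, so the
// change lands in whatever array that slice points to.
func overwriteFirst(s []string, v string) {
	s[0] = v
}

func main() {
	a := [2]string{"zhangsan", "lisi"}
	s1 := a[0:1]
	s2 := a[:]

	overwriteFirst(s1, "wangwu")
	fmt.Println(a[0], s2[0]) // wangwu wangwu

	// An explicit copy gets its own underlying array, so later
	// writes to it do not affect a, s1 or s2.
	c := make([]string, len(s2))
	copy(c, s2)
	overwriteFirst(c, "zhaoliu")
	fmt.Println(a[0], c[0]) // wangwu zhaoliu
}
```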

A slice is essentially a SliceHeader, and because function parameters are passed by value, what gets copied is the SliceHeader, not the underlying array. This is where slices show their advantage: a SliceHeader is tiny. Even for a very large slice (one whose underlying array holds many elements), the copy is only 24 bytes on a 64-bit machine, which solves the memory waste of passing large arrays as parameters.

Note: the three fields of SliceHeader have the types uintptr, int and int. On a 64-bit machine each of these is 8 bytes wide, so the three fields together occupy 24 bytes of memory.

To read the three fields of a slice, you do not have to use SliceHeader; you can define your own struct, as long as its fields match those of SliceHeader. It is still better to use SliceHeader where possible, because it is the standard type provided by Go, which keeps code uniform and easy to understand.

type slice struct {
    Data uintptr
    Len  int
    Cap  int
}
// s1 is the slice from the previous example.
sh1 := (*slice)(unsafe.Pointer(&s1))
fmt.Println(sh1.Data, sh1.Len, sh1.Cap)

Reasons for high efficiency

Array, slice and map are all collection types because they can store elements. Reading and assigning elements is more efficient for arrays and slices because they operate on contiguous memory, so the address of an element can be computed directly from its index. Comparing arrays with slices, slices are more efficient because assigning them or passing them to a function copies only the three fields of SliceHeader, not all the elements, and the underlying array is shared.

Tip: maps are still very valuable, because their keys can be of many types, such as int, int64 or string, whereas the indexes of arrays and slices can only be integers.

package main

import (
    "fmt"
    "reflect"
    "unsafe"
)

func main() {
    a1 := [2]string{"zhangsan", "lisi"}
    // Printf, not Println, is needed for the %p and %d verbs.
    fmt.Printf("main Function array pointer: %p\n", &a1)

    arrayF(a1)
    s1 := a1[0:1]
    fmt.Println((*reflect.SliceHeader)(unsafe.Pointer(&s1)).Data)
    sliceF(s1)
}

func arrayF(a [2]string) {
    fmt.Printf("arrayF Function array pointer: %p\n", &a)
}

func sliceF(s []string) {
    fmt.Printf("sliceF function Data: %d\n", (*reflect.SliceHeader)(unsafe.Pointer(&s)).Data)
}


// Output results
main Function array pointer: 0xc0000a9527
arrayF Function array pointer: 0xc0000a9527
824634400800
sliceF function Data: 824634400800

// Exact addresses vary from run to run. When you run the program, the array pointer printed in main differs from the one printed in arrayF, which shows that the array is copied when it is passed as a parameter and a new array is created. The Data value of the slice is the same in both places, which means the slice in main and the slice in sliceF share the same underlying array, and the underlying array is not copied.

Tip: the efficiency of slices also shows in for range loops. The range expression is itself copied by value, so ranging over a large array with both an index and a value variable copies the whole array, while ranging over a slice copies only its header.
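One way to observe this copy is to modify the collection during the loop. The two functions below are hypothetical examples: the array loop keeps reading the values from its copy, while the slice loop sees the modification because only the header was copied.

```go
package main

import "fmt"

// sumWhileZeroingArray ranges over an array and zeroes the rest
// of it during the first iteration. The loop reads from a copy
// of arr, so it still sees the original values.
func sumWhileZeroingArray() int {
	arr := [3]int{1, 2, 3}
	sum := 0
	for i, v := range arr {
		if i == 0 {
			arr[1], arr[2] = 0, 0
		}
		sum += v
	}
	return sum
}

// sumWhileZeroingSlice does the same with a slice. Only the
// slice header is copied, so the loop sees the zeroed elements.
func sumWhileZeroingSlice() int {
	s := []int{1, 2, 3}
	sum := 0
	for i, v := range s {
		if i == 0 {
			s[1], s[2] = 0, 0
		}
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(sumWhileZeroingArray()) // 6
	fmt.Println(sumWhileZeroingSlice()) // 1
}
```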

The fundamental reason slices are efficient is that they wrap a pointer: this reduces memory use and avoids the time spent copying memory.

Converting between string and []byte

The conversion between string and []byte is a good example for understanding further why slices are efficient.

s := "Running snail"
b := []byte(s)
s2 := string(b)
fmt.Println(s, string(b), s2)

// Variable s is a string. It can be converted to variable b of type []byte with []byte(s), and b can be converted back to variable s2 of type string with string(b). All three print "Running snail".

Go implements the conversion between string and []byte by first allocating new memory and then copying the content. We can verify that the conversion reallocates memory by looking at the memory address of the actual content that the string and the []byte point to.

s := "One armed Astro Boy"
// Printf, not Println, is needed for the %d verb.
fmt.Printf("s Memory address: %d\n", (*reflect.StringHeader)(unsafe.Pointer(&s)).Data)
b := []byte(s)
fmt.Printf("b Memory address: %d\n", (*reflect.SliceHeader)(unsafe.Pointer(&b)).Data)
s2 := string(b)
fmt.Printf("s2 Memory address: %d\n", (*reflect.StringHeader)(unsafe.Pointer(&s2)).Data)

// The printed memory addresses are all different. Although the contents are the same, the three values do not share storage: each conversion allocated new memory.

Tip: to learn how the conversion between string and []byte is implemented, read the source code of the two functions runtime.stringtoslicebyte and runtime.slicebytetostring.

Like SliceHeader, StringHeader is the run-time representation of a string. It is defined as follows:

type StringHeader struct{
    Data uintptr
    Len int
}

// At run time, strings and slices are essentially a StringHeader and a SliceHeader. Both structs have a Data field that holds a pointer to the actual content. Printing the value of the Data field therefore tells us whether memory was reallocated after a conversion between string and []byte.

We now know that converting between []byte and string copies the content. If the string is very large, the memory overhead is high, which is unacceptable for performance-critical programs, so the conversion needs to be optimized. How? Since the overhead comes from memory allocation, the idea is to perform the type conversion without reallocating memory.

Notice that the first two fields of StringHeader and SliceHeader are exactly the same. Converting []byte to string is therefore equivalent to using unsafe.Pointer to convert *SliceHeader to *StringHeader, that is, *[]byte to *string. The principle is similar to converting a slice into a user-defined struct, as shown earlier.

s := "Running snail"
b := []byte(s)
// s2 := string(b)
s3 := *(*string)(unsafe.Pointer(&b))

// In this example, s3 has the same content as s2 would have. The difference is that s3 allocates no memory (zero copy): it uses the same memory as variable b, because their underlying Data fields are the same. This saves memory while still converting []byte to string.
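The sharing can be checked by comparing Data fields, using the same header technique as the earlier examples. This is a sketch: the helper names bytesToString, dataOfString and dataOfBytes are made up for the illustration. It also shows the cost of zero copy: a later write to the byte slice is visible through the string.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// bytesToString reinterprets the slice header as a string header
// without copying. The result shares memory with b.
func bytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// dataOfString returns the Data field of the string's header.
func dataOfString(s string) uintptr {
	return (*reflect.StringHeader)(unsafe.Pointer(&s)).Data
}

// dataOfBytes returns the Data field of the slice's header.
func dataOfBytes(b []byte) uintptr {
	return (*reflect.SliceHeader)(unsafe.Pointer(&b)).Data
}

func main() {
	b := []byte("Running snail")
	s2 := string(b)        // ordinary conversion: allocates and copies
	s3 := bytesToString(b) // zero copy: shares memory with b

	fmt.Println(dataOfBytes(b) == dataOfString(s2)) // false
	fmt.Println(dataOfBytes(b) == dataOfString(s3)) // true

	// Because s3 shares memory with b, writing to b changes s3.
	b[0] = 'r'
	fmt.Println(s2) // Running snail
	fmt.Println(s3) // running snail
}
```

Since strings are expected to be immutable, a zero-copy string like s3 is only safe while nothing writes to b.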

SliceHeader has three fields (Data, Len and Cap) and StringHeader has two (Data and Len), so converting *SliceHeader to *StringHeader through unsafe.Pointer is unproblematic: *SliceHeader provides the Data and Len values that *StringHeader needs. The reverse does not work directly, because *StringHeader lacks the Cap field that *SliceHeader requires, so we have to supply a value for it ourselves.

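s := "zhangsan"
// b := []byte(s)
sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
// Build a separate SliceHeader and supply Cap ourselves. Writing Cap through a
// *SliceHeader that points at s would write past the two-field string header.
bh := reflect.SliceHeader{Data: sh.Data, Len: sh.Len, Cap: sh.Len}
b1 := *(*[]byte)(unsafe.Pointer(&bh))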

Tip: after a string has been converted to []byte through unsafe.Pointer, the []byte must not be modified. For example, b1[0] = 12 raises a fault and crashes the program, because string memory is read-only in Go.
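As an alternative sketch, and assuming Go 1.20 or later, the unsafe package provides the functions String, StringData, Slice and SliceData, which express both zero-copy conversions without touching the header structs. The wrapper names bytesToString and stringToBytes are made up for this example; the read-only restriction from the tip still applies to the result of stringToBytes.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString returns a string that shares memory with b.
// Assumes Go 1.20+; b must not be modified afterwards.
func bytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// stringToBytes returns a []byte that shares memory with s.
// Assumes Go 1.20+; the result must never be written to.
func stringToBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := stringToBytes("zhangsan")
	// unsafe.Slice sets both length and capacity to len(s).
	fmt.Println(len(b), cap(b), string(b)) // 8 8 zhangsan

	fmt.Println(bytesToString([]byte("lisi"))) // lisi
}
```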

Type conversion through unsafe.Pointer to avoid memory copies and improve performance is also used in the Go standard library. For example, the strings.Builder struct has an internal buf field that stores its content. When its String method converts buf, which is of type []byte, to a string, it uses unsafe.Pointer to improve efficiency.

func (b *Builder) String() string{
    return *(*string)(unsafe.Pointer(&b.buf))
}
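From the caller's side, the benefit is that the final String call does not copy the accumulated bytes. A minimal usage sketch, with a hypothetical helper named concat:

```go
package main

import (
	"fmt"
	"strings"
)

// concat joins parts using a strings.Builder. The bytes are
// accumulated in the builder's internal buffer, and String
// returns them as a string.
func concat(parts ...string) string {
	var sb strings.Builder
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

func main() {
	fmt.Println(concat("zhang", "san")) // zhangsan
}
```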

The conversion between string and []byte is a good example of using the SliceHeader struct: it achieves zero-copy type conversion, which improves efficiency and avoids wasting memory.

Summary

Through this analysis of slices, we can appreciate the appeal of Go. It wraps the underlying pointer and array and offers developers the slice concept, which is convenient to use, improves development efficiency and also improves program performance.

The design idea behind slices is very useful in our own code too. We can use uintptr fields or slice fields to improve performance, just as SliceHeader does with its Data uintptr field.
