Thoroughly understand Golang Slice

Posted by if on Sun, 26 Dec 2021 08:03:12 +0100

After reading this article, you should be able to answer the following frequently asked interview questions:

  1. Underlying implementation principle of Go slice
  2. The difference between Go array and slice
  3. Go slice deep and shallow copies
  4. What is the Go slice capacity expansion mechanism?
  5. Why is Go slice non thread safe?

Implementation principle

A slice is a variable-length view onto an array. Under the hood it is a struct with the following three fields

On a 64-bit platform, a slice header occupies 24 bytes

type slice struct {
    array unsafe.Pointer 
    len   int 
    cap   int 
}

array: a pointer to the underlying array, where the data is actually stored, occupying 8 bytes

len: the number of elements currently in the slice, occupying 8 bytes

cap: the capacity of the slice, i.e. the number of elements from the slice's first element to the end of the underlying array, occupying 8 bytes

A slice is not a true dynamic array but a reference type: it always points to an underlying array. A slice is declared much like an array, except that no length is given, so its length can vary. Go's syntactic sugar lets us create the slice structure as easily as declaring an array

When indexing a slice, the valid index range is 0 to len(slice)-1. Printing a slice outputs slice[0:len(slice)], that is, the elements of the underlying array that fall within the slice's length
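The three-field layout can be checked with a small sketch. The helper name headerWords is made up for illustration; it counts machine words instead of bytes so that the result does not depend on the platform (on a 64-bit platform, three words are 24 bytes).

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords (hypothetical helper) reports how many machine words a
// slice header occupies: pointer, len and cap.
func headerWords() int {
	var s []int
	return int(unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0)))
}

func main() {
	small := make([]int, 2, 4)
	large := make([]int, 1000, 2000)
	// The header size does not depend on how many elements the slice holds.
	fmt.Println(unsafe.Sizeof(small), unsafe.Sizeof(large))
	fmt.Println(headerWords())
}
```

Both slices report the same size because unsafe.Sizeof measures only the header, not the underlying array.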

Main characteristics

reference type

Go has three commonly used built-in reference types: slice, map and channel. When a reference type is passed as a function argument, the function may modify the caller's underlying data.

func sliceModify(s []int) {
    s[0] = 100
}

func sliceAppend(s []int) []int {
    s = append(s, 100)
    return s
}

func sliceAppendPtr(s *[]int) {
    *s = append(*s, 100)
    return
}

// Note: all arguments in Go are passed by value, i.e. the function receives a copy.
// If the copied value is not a reference type (int, string, struct, etc.), the function cannot modify the caller's data;
// if the copied value is a reference type (interface, pointer, map, slice, chan, etc.), the function can modify the data it refers to.
func TestSliceFn(t *testing.T) {
    // The parameter is a slice (reference type): the len/cap of the outer slice do not change, but the underlying array it points to may be modified
    s := []int{1, 1, 1}
    newS := sliceAppend(s)
    // Capacity expansion occurred in the function
    t.Log(s, len(s), cap(s))
    // [1 1 1] 3 3
    t.Log(newS, len(newS), cap(newS)) 
    // [1 1 1 100] 4 6

    s2 := make([]int, 0, 5)
    newS = sliceAppend(s2)
    // There is no capacity expansion in the function
    t.Log(s2, s2[0:5], len(s2), cap(s2)) 
    // [] [100 0 0 0 0] 0 5
    t.Log(newS, newS[0:5], len(newS), cap(newS))
    // [100] [100 0 0 0 0] 1 5

    // The parameter is a pointer to the slice: the len/cap of the outer slice change, and so does the underlying array it points to
    sliceAppendPtr(&s)
    t.Log(s, len(s), cap(s))
    // [1 1 1 100] 4 6
    sliceModify(s)
    t.Log(s, len(s), cap(s))
    // [100 1 1 100] 4 6
}

Slice status

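A slice can be in one of three special states: zero slice (all elements hold the zero value), empty slice (length 0 but not nil) and nil slice (declared but never initialized)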

func TestSliceEmptyOrNil(t *testing.T) {
    var slice1 []int
    // slice1 is a nil slice
    slice2 := make([]int, 0)
    // slice2 is an empty slice
    var slice3 = make([]int, 2)
    // slice3 is a zero slice
    if slice1 == nil {
        t.Log("slice1 is nil.") 
        // This line will be output
    }
    if slice2 == nil {
        t.Log("slice2 is nil.") 
        // This line will not be output
    }
    t.Log(slice3) // [0 0]
}
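A nil slice and an empty slice behave the same for len, append and range, so the distinction mostly matters when comparing with nil or when serializing. The sketch below uses the standard encoding/json package to show one place where the difference is visible; the helper name encode is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode (hypothetical helper) returns the JSON form of s. Marshalling
// a []int cannot fail, so the error is ignored in this sketch.
func encode(s []int) string {
	b, _ := json.Marshal(s)
	return string(b)
}

func main() {
	var nilSlice []int
	emptySlice := make([]int, 0)

	// Both have length 0, and append works on either of them.
	fmt.Println(len(nilSlice), len(emptySlice))
	fmt.Println(append(nilSlice, 1), append(emptySlice, 1))

	// A nil slice is encoded as null, an empty slice as [].
	fmt.Println(encode(nilSlice), encode(emptySlice))
}
```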

Non thread safe

Slices do not support concurrent reads and writes, so they are not thread safe. When multiple goroutines append to the same slice variable, the result will most likely differ from run to run and from the expected value. The program usually does not report an error (unless it is run with the race detector, e.g. go test -race), but appended data is silently lost

/**
* Slices are not safe for concurrent use
* Run this several times and the result differs each time
* Consider using the blocking behavior of a channel to achieve safe concurrent reads and writes
 */
func TestSliceConcurrencySafe(t *testing.T) {
    a := make([]int, 0)
    var wg sync.WaitGroup
    for i := 0; i < 10000; i++ {
        wg.Add(1)
        go func(i int) {
            a = append(a, i)
            wg.Done()
        }(i)
    }
    wg.Wait()
    t.Log(len(a))
    // usually less than 10000
}

There are two ways to make a slice thread safe:

Method 1: protect the slice with a lock, which is suitable for scenarios with low performance requirements.

func TestSliceConcurrencySafeByMutex(t *testing.T) {
    var lock sync.Mutex //mutex 
    a := make([]int, 0)
    var wg sync.WaitGroup
    for i := 0; i < 10000; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            lock.Lock()
            defer lock.Unlock()
            a = append(a, i)
        }(i)
    }
    wg.Wait()
    t.Log(len(a)) 
    // equal 10000
}

Method 2: make the slice thread safe with a channel, which is suitable for scenarios with high performance requirements.

func TestSliceConcurrencySafeByChannel(t *testing.T) {
    buffer := make(chan int)
    a := make([]int, 0)
    done := make(chan struct{})
    // consumer: the only goroutine that touches a
    go func() {
        defer close(done)
        for v := range buffer {
            a = append(a, v)
        }
    }()
    // producer
    var wg sync.WaitGroup
    for i := 0; i < 10000; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            buffer <- i
        }(i)
    }
    wg.Wait()
    close(buffer) // all values sent; lets the consumer's range loop end
    <-done        // wait until the consumer has appended the last value
    t.Log(len(a))
    // equal 10000
}

Shared storage

If multiple slices share the same underlying array, changes to one slice or the underlying array will affect other slices

/**
* Slice shared storage
 */
func TestSliceShareMemory(t *testing.T) {
    slice1 := []string{"1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"}
    Q2 := slice1[3:6]
    t.Log(Q2, len(Q2), cap(Q2)) 
    // [4 5 6] 3 9
    Q3 := slice1[5:8]
    t.Log(Q3, len(Q3), cap(Q3)) 
    // [6 7 8] 3 7
    Q3[0] = "Unknown"
    t.Log(Q2, Q3)
    // [4 5 Unknown] [Unknown 7 8]

    a := []int{1, 2, 3, 4, 5}
    shadow := a[1:3]
    t.Log(shadow, a)             
    // [2 3] [1 2 3 4 5]
    shadow = append(shadow, 100)
    // cap(shadow) is not exceeded, so append writes into the shared array and overwrites a[3]
    t.Log(shadow, a)
    // [2 3 100] [1 2 3 100 5]
}
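When the overwrite shown above is not wanted, one option is a full slice expression that limits the capacity of the sub-slice, so that append has to allocate a new array. A minimal sketch follows; the helper name appendIsolated is made up for illustration.

```go
package main

import "fmt"

// appendIsolated (hypothetical helper) appends v to a[lo:hi]. The full
// slice expression a[lo:hi:hi] caps the sub-slice's capacity at its
// length, so append must allocate a new underlying array.
func appendIsolated(a []int, lo, hi, v int) []int {
	sub := a[lo:hi:hi] // len = hi-lo, cap = hi-lo
	return append(sub, v)
}

func main() {
	a := []int{1, 2, 3, 4, 5}
	shadow := appendIsolated(a, 1, 3, 100)
	fmt.Println(shadow, a) // [2 3 100] [1 2 3 4 5]
}
```

The cost is one extra allocation and copy; in return, a[3] keeps its original value.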

Common operations

create

A slice can be created in four ways, as follows:

func TestSliceInit(t *testing.T) {
    // Initialization method 1: direct declaration
    var slice1 []int
    t.Log(len(slice1), cap(slice1))
    // 0 0
    slice1 = append(slice1, 1)
    t.Log(len(slice1), cap(slice1))
    // 1 1

    // Initialization method 2: use a literal
    slice2 := []int{1, 2, 3, 4}
    t.Log(len(slice2), cap(slice2))
    // 4 4

    // Initialization method 3: create the slice with make
    slice3 := make([]int, 3, 5)
    // make([]T, len, cap): if cap is omitted, it equals len
    t.Log(len(slice3), cap(slice3))
    // 3 5
    t.Log(slice3[0], slice3[1], slice3[2])
    // 0 0 0
    // t.Log(slice3[3], slice3[4])
    // panic: runtime error: index out of range [3] with length 3
    slice3 = append(slice3, 1)
    t.Log(len(slice3), cap(slice3))
    // 4 5

    // Initialization method 4: slice an existing slice or array
    arr := [100]int{}
    for i := range arr {
        arr[i] = i
    }
    slice4 := arr[1:3]
    slice5 := make([]int, len(slice4))
    copy(slice5, slice4)
    t.Log(len(slice4), cap(slice4), unsafe.Sizeof(slice4))
    // 2 99 24
    t.Log(len(slice5), cap(slice5), unsafe.Sizeof(slice5))
    // 2 2 24
}

append

func TestSliceGrowing(t *testing.T) {
    slice1 := []int{}
    for i := 0; i < 10; i++ {
        slice1 = append(slice1, i)
        t.Log(len(slice1), cap(slice1))
    }
    // 1 1
    // 2 2
    // 3 4
    // 4 4
    // 5 8
    // 6 8
    // 7 8
    // 8 8
    // 9 16
    // 10 16
}
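Each capacity change in the output above means that append moved the data to a larger underlying array. The following sketch counts those moves with and without a preallocated capacity; the helper name countGrowths is made up for illustration, and the exact count without preallocation depends on the Go version.

```go
package main

import "fmt"

// countGrowths (hypothetical helper) appends n elements and counts how
// often the capacity changes, i.e. how often append had to move the
// data to a larger underlying array.
func countGrowths(n int, preallocate bool) int {
	var s []int
	if preallocate {
		s = make([]int, 0, n)
	}
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			growths++
		}
	}
	return growths
}

func main() {
	fmt.Println(countGrowths(1000, false), countGrowths(1000, true))
}
```

With make([]int, 0, n) the capacity never changes, so no reallocation or copying takes place during the loop.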

delete

func TestSliceDelete(t *testing.T) {
    slice1 := []int{1, 2, 3, 4, 5}
    var x int
    // Delete last element
    x, slice1 = slice1[len(slice1)-1], slice1[:len(slice1)-1] 
    t.Log(x, slice1, len(slice1), cap(slice1)) 
    // 5 [1 2 3 4] 4 5

    // Delete the element at index 2 (the third element)
    slice1 = append(slice1[:2], slice1[3:]...)
    t.Log(slice1, len(slice1), cap(slice1))
    // [1 2 4] 3 5
}
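Deleting with append reuses the same underlying array, so any other slice that still refers to that array sees the shifted data. A minimal sketch of an order-preserving delete that makes this visible is shown below; the helper name removeAt is made up for illustration.

```go
package main

import "fmt"

// removeAt (hypothetical helper) deletes the element at index i while
// preserving order. It reuses s's underlying array, so the caller's
// original slice contents are shifted as a side effect.
func removeAt(s []int, i int) []int {
	copy(s[i:], s[i+1:]) // shift the tail left by one position
	return s[:len(s)-1]
}

func main() {
	orig := []int{1, 2, 3, 4, 5}
	trimmed := removeAt(orig, 2)
	fmt.Println(trimmed, len(trimmed), cap(trimmed)) // [1 2 4 5] 4 5
	fmt.Println(orig)                                // [1 2 4 5 5]
}
```

The length shrinks but the capacity stays at 5, and the last slot of the old slice still holds a stale value.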

lookup

v := s[i] // Subscript access

modify

s[i] = 5 // Subscript modification

slicing

/**
* Sub-slicing
 */
func TestSliceSubstr(t *testing.T) {
    slice1 := []int{1, 2, 3, 4, 5}
    slice2 := slice1[:]
    // Slice expression: slice[left:right:max]
    // left: defaults to 0 when omitted
    // right: defaults to len(slice1) when omitted
    // max: defaults to cap(slice1) when omitted
    // len = right-left
    // cap = max-left
    t.Log(slice2, len(slice2), cap(slice2))
    // [1 2 3 4 5] 5 5
    slice3 := slice1[1:]
    t.Log(slice3, len(slice3), cap(slice3)) 
    // [2 3 4 5] 4 4
    slice4 := slice1[:2]
    t.Log(slice4, len(slice4), cap(slice4)) 
    // [1 2] 2 5
    slice5 := slice1[1:2]
    t.Log(slice5, len(slice5), cap(slice5)) 
    // [2] 1 4
    slice6 := slice1[:2:5]
    t.Log(slice6, len(slice6), cap(slice6)) 
    // [1 2] 2 5
    slice7 := slice1[1:2:2]
    t.Log(slice7, len(slice7), cap(slice7)) 
    // [2] 1 1
}

traversal

There are three ways to traverse a slice:

/**
* Slice traversal
 */
func TestSliceTravel(t *testing.T) {
    slice1 := []int{1, 2, 3, 4}
    for i := 0; i < len(slice1); i++ {
        t.Log(slice1[i])
    }
    for idx, e := range slice1 {
        t.Log(idx, e)
    }
    for _, e := range slice1 {
        t.Log(e)
    }
}

reversal

func TestSliceReverse(t *testing.T) {
    a := []int{1, 2, 3, 4, 5}
    for left, right := 0, len(a)-1; left < right; left, right = left+1, right-1 {
        a[left], a[right] = a[right], a[left]
    }
    t.Log(a, len(a), cap(a)) 
    // [5 4 3 2 1] 5 5
}

Copy

During development we often copy one variable to another. Such a copy may be deep or shallow; this section explains the difference between the two

Deep copy

A deep copy copies the data itself and creates a new object. The new object does not share memory with the original: it lives at a new memory address, so modifying it does not affect the original. Because the addresses differ, the two can also be released independently

For value types, assignment is a deep copy by default, for example array, int, string, struct, float and bool. To deep copy reference type data, you need a helper function

For example, Go's built-in copy function copies the elements of the source slice into the destination slice and returns the number of elements copied. The element types of the two slices must be identical. The number of elements copied is the smaller of the two lengths: copying stops as soon as the shorter slice is exhausted

/**
* Deep copy
 */
func TestSliceDeepCopy(t *testing.T) {
    slice1 := []int{1, 2, 3, 4, 5}
    slice2 := make([]int, 5, 5)
    // Deep copy
    copy(slice2, slice1)                   
    t.Log(slice1, len(slice1), cap(slice1)) 
    // [1 2 3 4 5] 5 5
    t.Log(slice2, len(slice2), cap(slice2)) 
    // [1 2 3 4 5] 5 5
    slice1[1] = 100                        
    t.Log(slice1, len(slice1), cap(slice1)) 
    // [1 100 3 4 5] 5 5
    t.Log(slice2, len(slice2), cap(slice2)) 
    // [1 2 3 4 5] 5 5
}
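The "shorter slice decides" rule is easy to overlook, because copy looks at the length of the destination and not at its capacity. A small sketch (the helper name clone is made up for illustration):

```go
package main

import "fmt"

// clone (hypothetical helper) returns an independent copy of src with
// its own underlying array.
func clone(src []int) []int {
	dst := make([]int, len(src))
	copy(dst, src)
	return dst
}

func main() {
	src := []int{1, 2, 3, 4, 5}

	short := make([]int, 3)
	n := copy(short, src) // destination is shorter: only 3 elements fit
	fmt.Println(n, short) // 3 [1 2 3]

	empty := make([]int, 0, 5)
	n = copy(empty, src)  // len is 0, so nothing is copied despite cap 5
	fmt.Println(n, empty) // 0 []

	c := clone(src)
	c[0] = 100
	fmt.Println(src[0], c[0]) // 1 100
}
```

This is why the destination is created with make([]int, len(src)) rather than make([]int, 0, len(src)).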

Shallow copy

A shallow copy copies only the address of the data, i.e. the pointer to the object. The new object and the old object then point to the same memory, so modifying the value through the new object also changes the old one, and there is only one block of memory to release.

Reference types such as slice and map are shallow copied by default

The target slice and the source slice point to the same underlying array, so any change to an array element is visible through both slices.

/**
* Shallow copy
 */
func TestSliceShadowCopy(t *testing.T) {
    slice1 := []int{1, 2, 3, 4, 5}
    // Shallow copy (note: = is a shallow copy for reference types and a deep copy for value types)
    slice2 := slice1
    t.Logf("%p", slice1) // 0xc00001c120
    t.Logf("%p", slice2) // 0xc00001c120 (same address; the actual value varies per run)
    // Both slices change together, so this is a shallow copy. As long as no expansion occurs, modifying an element of slice1 also modifies slice2
    slice1[0] = 10
    t.Log(slice1, len(slice1), cap(slice1))
    // [10 2 3 4 5] 5 5
    t.Log(slice2, len(slice2), cap(slice2))
    // [10 2 3 4 5] 5 5
    // Note: after capacity expansion, slice1 and slice2 no longer point to the same array. Modifying slice1 elements no longer modifies slice2
    slice1 = append(slice1, 5, 6, 7, 8)
    slice1[0] = 11
    // slice1[0] is now 11, while slice2[0] is still 10
    t.Log(slice1, len(slice1), cap(slice1))
    // [11 2 3 4 5 5 6 7 8] 9 10
    t.Log(slice2, len(slice2), cap(slice2))
    // [10 2 3 4 5] 5 5
}

When a slice is copied by assignment, the pointer to its array is copied as well. Before the expansion logic is triggered, the two slices point to the same array; after it is triggered, they point to different arrays

Capacity expansion

Capacity expansion happens during append: when the slice's cap is insufficient to hold the new elements, a larger underlying array is allocated and the existing elements are copied into it

Source code: https://github.com/golang/go/...

func growslice(et *_type, old slice, cap int) slice {
    // Some checks omitted
    newcap := old.cap
    doublecap := newcap + newcap
    if cap > doublecap {
        newcap = cap
    } else {
        if old.len < 1024 {
            newcap = doublecap
        } else {
            // Check 0 < newcap to detect overflow
            // and prevent an infinite loop.
            for 0 < newcap && newcap < cap {
                newcap += newcap / 4
            }
            // Set newcap to the requested cap when
            // the newcap calculation overflowed.
            if newcap <= 0 {
                newcap = cap
            }
        }
    }
    // Follow-up steps omitted
}
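  • If the requested capacity is more than twice the original capacity, the new capacity equals the requested capacity
  • Otherwise, if the original slice length is less than 1024, the capacity is doubled
  • Otherwise (length greater than or equal to 1024), the capacity grows by a factor of 1.25 repeatedly until it is large enough

The omitted follow-up code then rounds the computed capacity up to a memory allocation size class, so the actual capacity can be slightly larger. This excerpt reflects Go versions before 1.18; newer versions use a threshold of 256 and a smoother growth formula.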
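The rules of the growslice excerpt can be restated as a standalone function to make them easier to experiment with. This is only a sketch under stated assumptions: nextCap is a made-up name, the overflow checks are left out, and the size-class rounding that the runtime applies afterwards is not modeled, so real capacities may be larger.

```go
package main

import "fmt"

// nextCap (hypothetical helper) mirrors the growth rules of the
// growslice excerpt. It omits overflow checks and the final
// size-class rounding done by the runtime.
func nextCap(oldLen, oldCap, needed int) int {
	doubled := oldCap + oldCap
	if needed > doubled {
		return needed // requested capacity is more than double
	}
	if oldLen < 1024 {
		return doubled // small slices simply double
	}
	newCap := oldCap
	for newCap < needed {
		newCap += newCap / 4 // large slices grow by 1.25x per step
	}
	return newCap
}

func main() {
	fmt.Println(nextCap(3, 3, 4))          // 6
	fmt.Println(nextCap(2, 2, 10))         // 10
	fmt.Println(nextCap(1024, 1024, 1025)) // 1280
}
```

The first call matches the [1 1 1 100] 4 6 result from the earlier append example, where a slice of capacity 3 grew to capacity 6.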

Memory leak

Because a slice is backed by an array, the array may be large while the slice uses only a few of its elements, so most of the space occupied by the array is wasted

Case 1:

For example, in the following code, if the incoming slice b is large and a small part of it is assigned to the global variable a, the rest of b's underlying array (the data from index 1 onward) cannot be released, resulting in a so-called memory leak.

var a []int

func test(b []int) {
    a = b[:1] // a and b share the same underlying array
    return
}

As long as the global variable a is alive, the underlying array of b will not be garbage collected.

How to avoid?

Note that even if we use only a small part of a slice, the entire underlying array stays in memory. When the underlying array is large or this pattern occurs often, memory usage may rise sharply and the program may crash.

Therefore, in such a scenario, we can copy the elements we need into a new slice to reduce memory usage

var a []int

func test(b []int) {
    a = make([]int, 1)
    copy(a, b[:1]) // copy the first element into a's own array
    return
}

Case 2:

For example, the following code returns a slice covering only a small part of a large array, so the large underlying array cannot be garbage collected after the function exits

func test2() []int {
    s := make([]int, 0, 10000)
    for i := 0; i < 10000; i++ {
        s = append(s, i)
    }
    s2 := s[100:102] // still references the 10000-element array
    return s2
}

How to avoid?

Copy the elements we need into a new slice to reduce memory usage

func test2() []int {
    s := make([]int, 0, 10000)
    for i := 0; i < 10000; i++ {
        // Some calculations
        s = append(s, i)
    }
    s2 := make([]int, 2)
    copy(s2, s[100:102]) // s2 has its own 2-element array
    return s2
}
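The difference between the two versions of test2 shows up in the capacity of the returned slice, which reflects how much of the underlying array it still references. A small sketch (the function names subSliceCap and copiedCap are made up for illustration):

```go
package main

import "fmt"

// subSliceCap (hypothetical helper) returns the capacity of a small
// sub-slice of a large slice; the whole large array stays reachable.
func subSliceCap() int {
	s := make([]int, 10000)
	return cap(s[100:102])
}

// copiedCap (hypothetical helper) returns the capacity after copying
// the same range into a new, right-sized slice.
func copiedCap() int {
	s := make([]int, 10000)
	s2 := make([]int, 2)
	copy(s2, s[100:102])
	return cap(s2)
}

func main() {
	fmt.Println(subSliceCap(), copiedCap()) // 9900 2
}
```

A two-element slice with capacity 9900 is a hint that it is keeping a much larger array alive.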

Slice vs. array

An array has a fixed length, which must be specified at initialization; if no length is specified, the result is a slice

An array is a value type. Assigning one array to another makes a deep copy: assignment and function argument passing copy the entire array and occupy additional memory. A slice is a reference type. Assigning one slice to another makes a shallow copy: assignment and function argument passing copy only the slice header (pointer, len and cap), while the underlying array is shared and no additional memory is needed for the elements.

func arrayExample() {
    // a is an array: the length is fixed and must be specified at initialization
    a := [3]int{1, 2, 3}
    // b is an array, a deep copy of a
    b := a
    // c is a slice, a reference type, whose underlying array is a
    c := a[:]
    for i := 0; i < len(a); i++ {
        a[i] = a[i] + 1
    }
    // After changing a: b is a copy and stays unchanged; c refers to a, so its values change
    fmt.Println(a)
    // [2 3 4]
    fmt.Println(b)
    // [1 2 3]
    fmt.Println(c)
    // [2 3 4]
}

func sliceExample() {
    // a is a slice: no length is specified
    a := []int{1, 2, 3}
    // b is a slice, a shallow copy of a
    b := a
    // c is a slice and a reference type
    c := a[:]
    for i := 0; i < len(a); i++ {
        a[i] = a[i] + 1
    }
    // After changing a: b is a shallow copy, so its values change; c refers to the same array, so its values change too
    fmt.Println(a)
    // [2 3 4]
    fmt.Println(b)
    // [2 3 4]
    fmt.Println(c)
    // [2 3 4]
}
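The same value versus reference behavior applies when arrays and slices are passed to functions. The following sketch contrasts the three options; all function names are made up for illustration.

```go
package main

import "fmt"

// setFirstArray receives a copy of the whole array, so the caller's
// array is not affected.
func setFirstArray(arr [3]int) [3]int {
	arr[0] = 100
	return arr
}

// setFirstSlice receives a copy of the slice header only; the
// underlying array is shared with the caller.
func setFirstSlice(s []int) { s[0] = 100 }

// setFirstArrayPtr receives a pointer, so no element data is copied.
func setFirstArrayPtr(arr *[3]int) { arr[0] = 100 }

func main() {
	a := [3]int{1, 2, 3}
	changed := setFirstArray(a)
	fmt.Println(a, changed) // [1 2 3] [100 2 3]

	setFirstArrayPtr(&a)
	fmt.Println(a) // [100 2 3]

	s := []int{1, 2, 3}
	setFirstSlice(s)
	fmt.Println(s) // [100 2 3]
}
```

For large arrays, passing a slice or a pointer avoids copying every element on each call.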

Summary

  • When creating a slice, preallocate the capacity according to actual needs to avoid expansion while appending as far as possible, which improves performance
  • Using append() to add elements to a slice may trigger capacity expansion, and a new underlying array is allocated when it does
  • len() and cap() run in O(1) time; they read the slice header and do not traverse the slice
  • Slices are not thread safe. To achieve thread safety, use a lock or a channel
  • When a large array is passed as a function parameter, the entire array is copied, which consumes a lot of memory. It is recommended to pass a slice or a pointer instead
  • When a slice is passed as a function parameter, the function can change the elements of the array the slice points to, but it cannot change the caller's len and cap; to change the slice itself, return the changed slice or take a pointer to the slice as the parameter
  • If only a small part of a large slice is needed, copy that part into a new slice to reduce memory usage

Topics: Go