How to Use set in Go

Posted by jdubwelch on Thu, 01 Aug 2019 04:23:53 +0200

Today let's talk about how to use sets in Go. This article covers two data structures: set and bitset.

Data Structures in Go

Go has few built-in data structures. In everyday work, the two we use most are slice and map. Go also has arrays, and every slice is backed by one, but because slices exist we seldom use arrays directly.

Beyond the built-in types, the official container package provides a few more data structures: heap, list (a doubly linked list) and ring (a circular list). We won't cover them today; anyone with some experience can pick them up from the documentation.

Today we're going to talk about set and bitset. As far as I know, other languages such as Java provide both of these data structures. Go does not yet provide either of them.

Implementation Ideas

Let's start with an article, "2 basic set implementations". It introduces two ways to implement a set in Go: with a map and with a bitset.

If you are interested, read that article. Below we go through both approaches in detail.

map

We know that map keys are unique. This matches the defining property of a set, so a map naturally guarantees that the members of the set are unique. With a map-backed set, checking whether an element exists uses the _, ok := m[key] syntax directly, which is efficient.

Let's start with a simple implementation, as follows:

set := make(map[string]bool) // New empty set
set["Foo"] = true            // Add
for k := range set {         // Loop
    fmt.Println(k)
}
delete(set, "Foo")    // Delete
size := len(set)      // Size
exists := set["Foo"]  // Membership

Storing a collection of strings in a map[string]bool is easy to understand. But there is a problem: the map values are booleans, so every entry carries a byte of value storage that a set does not need.

How do we solve this problem?

Use the empty struct as the value type. In Go, an empty struct occupies no memory. If you are not sure, you can verify it:

unsafe.Sizeof(struct{}{}) // The result was 0.
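
To check this claim on your own machine, the one-liner above can be wrapped in a small program. This is only a sketch: the helper name sizes is made up for illustration, and a bool is included for comparison.

```go
package main

import (
    "fmt"
    "unsafe"
)

// sizes is a hypothetical helper for this sketch. It returns the size in
// bytes of an empty struct value and of a bool value.
func sizes() (emptyStruct, boolean uintptr) {
    var b bool
    return unsafe.Sizeof(struct{}{}), unsafe.Sizeof(b)
}

func main() {
    e, b := sizes()
    fmt.Println("struct{} size in bytes:", e)
    fmt.Println("bool size in bytes:", b)
}
```

The empty struct reports 0 bytes and the bool reports 1 byte, which is the per-entry value storage the optimized version avoids.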

The optimized code is as follows:

type void struct{}
var member void

set := make(map[string]void) // New empty set
set["Foo"] = member          // Add
for k := range set {         // Loop
    fmt.Println(k)
}
delete(set, "Foo")      // Delete
size := len(set)        // Size
_, exists := set["Foo"] // Membership

I have previously seen people wrap this idea in a reusable type and write about it online; there is an article on it that you can read.

In fact, GitHub already has a mature package called golang-set, which is implemented with the same idea. Its description mentions that Docker uses it. The package provides two set implementations: a thread-safe set and a non-thread-safe set.

Here is a simple example.
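
As a rough idea of what such an encapsulation might look like, here is a minimal sketch. The type name StringSet and its methods are assumptions made for illustration; they are not taken from the article mentioned above or from any package.

```go
package main

import "fmt"

// StringSet is a hypothetical wrapper around map[string]struct{}.
type StringSet map[string]struct{}

// NewStringSet returns a set containing the given items.
func NewStringSet(items ...string) StringSet {
    s := make(StringSet, len(items))
    for _, item := range items {
        s.Add(item)
    }
    return s
}

// Add inserts item into the set; adding an existing item changes nothing.
func (s StringSet) Add(item string) { s[item] = struct{}{} }

// Remove deletes item from the set; removing a missing item changes nothing.
func (s StringSet) Remove(item string) { delete(s, item) }

// Contains reports whether item is in the set.
func (s StringSet) Contains(item string) bool {
    _, ok := s[item]
    return ok
}

// Len returns the number of items in the set.
func (s StringSet) Len() int { return len(s) }

func main() {
    s := NewStringSet("Foo", "Bar", "Foo") // the duplicate "Foo" is stored once
    fmt.Println(s.Len(), s.Contains("Foo"))
    s.Remove("Foo")
    fmt.Println(s.Len(), s.Contains("Foo"))
}
```

Hiding the map behind methods keeps the struct{}{} detail out of calling code, at the cost of fixing the element type to string.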

package main

import (
    "fmt"

    mapset "github.com/deckarep/golang-set"
)

func main() {
    // NewSet creates a thread-safe set by default. If thread safety is not
    // required, use NewThreadUnsafeSet, which is called the same way.
    s1 := mapset.NewSet(1, 2, 3, 4)
    fmt.Println("s1 contains 3: ", s1.Contains(3))
    fmt.Println("s1 contains 5: ", s1.Contains(5))

    // Add takes an interface{} parameter, so a value of any type can be passed
    s1.Add("poloxue")
    fmt.Println("s1 contains poloxue: ", s1.Contains("poloxue"))
    s1.Remove(3)
    fmt.Println("s1 contains 3: ", s1.Contains(3))

    s2 := mapset.NewSet(1, 3, 4, 5)

    // Union
    fmt.Println(s1.Union(s2))
}

The output is as follows:

s1 contains 3:  true
s1 contains 5:  false
s1 contains poloxue:  true
s1 contains 3:  false
Set{4, poloxue, 1, 2, 3, 5}

The example shows basic usage. If anything is unclear, look at the source code. The method names are the usual ones for set operations, such as Intersect for intersection and Difference for difference, and can be understood at a glance.
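
To make clear what operations with these names compute, here is a sketch on a plain map-backed set. It is not golang-set's implementation; the IntSet type and the function signatures are assumptions for illustration only.

```go
package main

import (
    "fmt"
    "sort"
)

// IntSet is a hypothetical map-backed set of ints used only in this sketch.
type IntSet map[int]struct{}

// NewIntSet builds a set from the given items.
func NewIntSet(items ...int) IntSet {
    s := make(IntSet, len(items))
    for _, v := range items {
        s[v] = struct{}{}
    }
    return s
}

// Union returns the items that are in a, in b, or in both.
func Union(a, b IntSet) IntSet {
    out := make(IntSet, len(a)+len(b))
    for v := range a {
        out[v] = struct{}{}
    }
    for v := range b {
        out[v] = struct{}{}
    }
    return out
}

// Intersect returns the items that are in both a and b.
func Intersect(a, b IntSet) IntSet {
    out := make(IntSet)
    for v := range a {
        if _, ok := b[v]; ok {
            out[v] = struct{}{}
        }
    }
    return out
}

// Difference returns the items that are in a but not in b.
func Difference(a, b IntSet) IntSet {
    out := make(IntSet)
    for v := range a {
        if _, ok := b[v]; !ok {
            out[v] = struct{}{}
        }
    }
    return out
}

// Sorted returns the items in ascending order, because map iteration
// order is not deterministic.
func (s IntSet) Sorted() []int {
    out := make([]int, 0, len(s))
    for v := range s {
        out = append(out, v)
    }
    sort.Ints(out)
    return out
}

func main() {
    a := NewIntSet(1, 2, 4)
    b := NewIntSet(1, 3, 4, 5)
    fmt.Println(Union(a, b).Sorted())      // [1 2 3 4 5]
    fmt.Println(Intersect(a, b).Sorted())  // [1 4]
    fmt.Println(Difference(a, b).Sorted()) // [2]
}
```

Each operation walks the map entries, so its cost grows with the number of members.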

bitset

Now for bitset. In a bitset, each number is represented by a single bit. An int8 value has eight bits, so one value can represent eight numbers, which greatly reduces the storage space needed.

The most common applications of bitset are bitmaps and flags. Let's first use it to represent the flags of some operations. For example, suppose we need three flags for permission 1, permission 2 and permission 3, and several permissions can be held at the same time. We can define a bit mask for each of them with three constants, F0, F1 and F2.

The sample code is as follows (quoted from the article "Bitmasks, bitsets and flags"):

type Bits uint8

const (
    F0 Bits = 1 << iota
    F1
    F2
)

func Set(b, flag Bits) Bits    { return b | flag }
func Clear(b, flag Bits) Bits  { return b &^ flag }
func Toggle(b, flag Bits) Bits { return b ^ flag }
func Has(b, flag Bits) bool    { return b&flag != 0 }

func main() {
    var b Bits
    b = Set(b, F0)
    b = Toggle(b, F2)
    for i, flag := range []Bits{F0, F1, F2} {
        fmt.Println(i, Has(b, flag))
    }
}

Without a bitset we would have needed three separate values to represent the three flags; now a single uint8 is enough. Bitset operations such as Set, Clear, Toggle and Has are implemented with bitwise operators, so they are very efficient.

Bitsets have a natural advantage for set operations, which can be implemented directly with bitwise operators. For example, intersection, union and difference:

  • Intersection: a & b
  • Union: a | b
  • Difference: a & (~b), which Go writes as a &^ b
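
The three operations in the list above can be tried on two uint8 values. This is a small sketch; the function names are made up for illustration.

```go
package main

import "fmt"

// bitIntersect keeps the bits set in both a and b.
func bitIntersect(a, b uint8) uint8 { return a & b }

// bitUnion keeps the bits set in a, in b, or in both.
func bitUnion(a, b uint8) uint8 { return a | b }

// bitDifference keeps the bits set in a but not in b.
// &^ is Go's AND NOT operator; a &^ b is the same as a & ^b.
func bitDifference(a, b uint8) uint8 { return a &^ b }

func main() {
    a := uint8(0b00000111) // members 0, 1, 2
    b := uint8(0b00001100) // members 2, 3
    fmt.Printf("%08b\n", bitIntersect(a, b))  // 00000100
    fmt.Printf("%08b\n", bitUnion(a, b))      // 00001111
    fmt.Printf("%08b\n", bitDifference(a, b)) // 00000011
}
```

Each operation handles all eight possible members with a single machine instruction, where a map-backed set has to visit its entries one at a time.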

Low-level languages, libraries and frameworks often use this approach to set flags.

The example above handles only a small amount of data. A uint8 takes up 8 bits and can represent only 8 numbers. Can this idea be used in big data scenarios?

We can combine the bitset with a Go slice and redefine the Bits type as follows:

type Bitset struct {
    data []int64
}

But this raises a question. When we set a bit, how do we know where it is? On reflection, the location has two parts: which slice element holds the bit, and which bit inside that element it is. We call them index and position. How do we get them?

The index can be obtained by division. For example, to find which slice element holds bit 65, compute 65 / 64. For efficiency we can use a bit operation instead, replacing the division with a shift: 65 >> 6, where 6 is the shift offset, that is, the n for which 2^n = 64.

The position is the remainder of that division, which the modulo operator gives us: 65 % 64 = 1. Again, for efficiency there is an equivalent bit operation: 65 & 0b00111111, that is, 65 & 63.

A simple example is as follows:
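
The equivalence between the arithmetic form and the bitwise form can be checked with a short sketch. The names wordShift, wordMask and locate are made up for illustration, and n is assumed to be non-negative.

```go
package main

import "fmt"

const (
    wordShift = 6    // 2^6 = 64 bits per slice element
    wordMask  = 0x3f // 63, the low six bits
)

// locate returns the slice index and the bit position for bit n.
// n is assumed to be non-negative.
func locate(n int) (index, position int) {
    return n >> wordShift, n & wordMask
}

func main() {
    for _, n := range []int{0, 63, 64, 65, 128} {
        index, position := locate(n)
        // For non-negative n, the shift and mask agree with division and modulo.
        fmt.Println(n, index, position, index == n/64, position == n%64)
    }
}
```

For 65 this gives index 1 and position 1, matching 65 / 64 and 65 % 64.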

package main

import (
    "fmt"
)

const (
    shift = 6
    mask  = 0x3f // That is 0b00111111
)

type Bitset struct {
    data []int64
}

func NewBitSet(n int) *Bitset {
    // Compute which slice element holds bit n
    index := n >> shift

    set := &Bitset{
        data: make([]int64, index+1),
    }

    // Set the bit for n
    set.data[index] |= 1 << uint(n&mask)

    return set
}

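func (set *Bitset) Contains(n int) bool {
    // Compute which slice element holds bit n
    index := n >> shift
    // A number beyond the allocated slice cannot be a member
    if index >= len(set.data) {
        return false
    }
    return set.data[index]&(1<<uint(n&mask)) != 0
}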

func main() {
    set := NewBitSet(65)
    fmt.Println("set contains 65", set.Contains(65))
    fmt.Println("set contains 64", set.Contains(64))
}

The output is as follows:

set contains 65 true
set contains 64 false

The example above is deliberately simple and is only a demonstration. It has just two functions: creating a bitset and Contains. Other operations, such as adding and removing members or computing the intersection, union and difference of two bitsets, are not implemented. Interested readers can try them.

In fact, a bitset package already exists on GitHub: bit. You can read its source code; the idea behind its implementation is similar to the one described above.

Here is a usage example.
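
As one possible starting point for that exercise, here is a sketch that adds some of the missing operations. It is an illustration under assumptions, not a reference implementation: the type name IntBitset is made up, it stores uint64 words instead of int64, and members are assumed to be non-negative.

```go
package main

import "fmt"

const (
    bsShift = 6    // 2^6 = 64 bits per word
    bsMask  = 0x3f // low six bits: the position inside a word
)

// IntBitset is a hypothetical growable bitset of non-negative ints.
type IntBitset struct {
    data []uint64
}

// Add sets the bit for n, growing the slice when n is out of range.
func (s *IntBitset) Add(n int) {
    index := n >> bsShift
    for len(s.data) <= index {
        s.data = append(s.data, 0)
    }
    s.data[index] |= 1 << uint(n&bsMask)
}

// Remove clears the bit for n; removing a missing member changes nothing.
func (s *IntBitset) Remove(n int) {
    index := n >> bsShift
    if index < len(s.data) {
        s.data[index] &^= 1 << uint(n&bsMask)
    }
}

// Contains reports whether n is a member. Numbers beyond the allocated
// words are reported as absent.
func (s *IntBitset) Contains(n int) bool {
    index := n >> bsShift
    return index < len(s.data) && s.data[index]&(1<<uint(n&bsMask)) != 0
}

// Union returns a new bitset holding the members of s and other.
func (s *IntBitset) Union(other *IntBitset) *IntBitset {
    long, short := s.data, other.data
    if len(long) < len(short) {
        long, short = short, long
    }
    // Copy the longer slice, then OR the shorter one into it.
    out := &IntBitset{data: append([]uint64(nil), long...)}
    for i, w := range short {
        out.data[i] |= w
    }
    return out
}

// Intersect returns a new bitset holding the members common to s and other.
func (s *IntBitset) Intersect(other *IntBitset) *IntBitset {
    n := len(s.data)
    if len(other.data) < n {
        n = len(other.data)
    }
    out := &IntBitset{data: make([]uint64, n)}
    for i := range out.data {
        out.data[i] = s.data[i] & other.data[i]
    }
    return out
}

func main() {
    a, b := &IntBitset{}, &IntBitset{}
    for _, n := range []int{2, 65, 128} {
        a.Add(n)
    }
    for _, n := range []int{65, 200} {
        b.Add(n)
    }
    u, i := a.Union(b), a.Intersect(b)
    fmt.Println(u.Contains(200), i.Contains(65), i.Contains(2)) // true true false
    a.Remove(65)
    fmt.Println(a.Contains(65)) // false
}
```

Union and Intersect work one 64-bit word at a time, so they touch far fewer values than a member-by-member loop would.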

package main

import (
    "fmt"

    "github.com/yourbasic/bit"
)

func main() {
    s := bit.New(2, 3, 4, 65, 128)
    fmt.Println("s contains 65", s.Contains(65))
    fmt.Println("s contains 15", s.Contains(15))

    s.Add(15)
    fmt.Println("s contains 15", s.Contains(15))

    fmt.Println("next 20 is ", s.Next(20))
    fmt.Println("prev 20 is ", s.Prev(20))

    s2 := bit.New(10, 22, 30)

    s3 := s.Or(s2)
    fmt.Println("next 20 is ", s3.Next(20))

    s3.Visit(func(n int) bool {
        fmt.Println(n)
        return false // Returning true stops the traversal
    })
}

The output is as follows:

s contains 65 true
s contains 15 false
s contains 15 true
next 20 is  65
prev 20 is  15
next 20 is  22
2
3
4
10
15
22
30
65
128

The code is easy to follow: it covers adding members, querying them and combining sets. Note that, unlike the earlier set, the members of a bitset can only be ints, so it is less flexible. It also has fewer use cases, mainly scenarios with strict efficiency and storage requirements.

Summary

This article introduced the principles behind two set implementations in Go and, on that basis, the basic use of a package for each of them. With this, you should be able to handle most uses of sets in Go.

Besides these two packages, here are two more: zoumo/goset and github.com/willf/bitset.
