Today let's talk about how to use sets in Go. This article covers two data structures: set and bitset.
Data Structures in Go
Go has few built-in data structures. In day-to-day work, the two we use most are slices and maps. Go also has arrays, and every slice is backed by an array, but because slices exist we seldom use arrays directly.
Beyond the built-in data structures, Go's standard container packages provide a few more: container/heap (heap operations), container/list (a doubly linked list) and container/ring (a circular list). We won't cover them today; anyone familiar with these structures can learn to use them from the documentation.
Today's topics are set and bitset. As far as I know, other languages such as Java provide both (Java has Set and BitSet). Go, however, does not yet provide either of them in any form.
Implementation Ideas
Let's start with an article, "2 basic set implementations", which introduces two ways to implement a set in Go: with a map and with a bitset.
It is worth reading if you are interested. Below we go through both approaches in detail.
map
Map keys are always unique, which is exactly the defining property of a set: a map naturally guarantees that the members of a set are unique, so a set can be implemented with a map. To check whether an element exists, the comma-ok form _, ok := m[key] can be used directly, and the lookup is efficient.
Let's start with a simple implementation, as follows:
```go
package main

import "fmt"

func main() {
	set := make(map[string]bool) // New empty set
	set["Foo"] = true            // Add
	for k := range set {         // Loop
		fmt.Println(k)
	}
	delete(set, "Foo")   // Delete
	size := len(set)     // Size
	exists := set["Foo"] // Membership
	fmt.Println(size, exists)
}
```
Storing a collection of strings in a map[string]bool is easy to understand, but it has a drawback: every value is a bool, and each bool takes up space, so the set uses more memory than it needs. A set has no use for the values at all.
How can we avoid this?
Use the empty struct as the value type. In Go, an empty struct occupies no memory, and if you are not sure, you can verify it yourself:
```go
unsafe.Sizeof(struct{}{}) // The result is 0.
```
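To compare the two value types side by side, the small program below prints the size of a bool value and of an empty struct value. It is a sketch that only measures the value itself; the helper name valueSizes is made up for illustration, and how much memory a whole map actually saves also depends on the runtime's internal map layout.

```go
package main

import (
	"fmt"
	"unsafe"
)

// valueSizes is a hypothetical helper returning the size in bytes of a bool
// value and of an empty struct value.
func valueSizes() (boolSize, emptySize uintptr) {
	return unsafe.Sizeof(true), unsafe.Sizeof(struct{}{})
}

func main() {
	b, e := valueSizes()
	fmt.Println("bool:", b, "struct{}:", e) // bool: 1 struct{}: 0
}
```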
The optimized code is as follows:
```go
package main

import "fmt"

type void struct{}

var member void

func main() {
	set := make(map[string]void) // New empty set
	set["Foo"] = member          // Add
	for k := range set {         // Loop
		fmt.Println(k)
	}
	delete(set, "Foo")      // Delete
	size := len(set)        // Size
	_, exists := set["Foo"] // Membership
	fmt.Println(size, exists)
}
```
I have seen people wrap this idea into a small reusable set type; there is an article about it online that you can read.
In fact, GitHub already hosts a mature package, golang-set (github.com/deckarep/golang-set), built on the same idea; according to its description, Docker uses it. The package provides two set implementations: a thread-safe set and a non-thread-safe set.
Here is a simple example:
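As a sketch of what such a wrapper can look like, the code below hides map[T]struct{} behind a few methods. It assumes Go 1.18 or later for generics; the type Set and its methods NewSet, Add, Remove, Contains, Len and Items are hypothetical names, not taken from any particular article or package.

```go
package main

import (
	"fmt"
	"sort"
)

// Set is a hypothetical generic wrapper around map[T]struct{} (Go 1.18+).
type Set[T comparable] struct {
	m map[T]struct{}
}

// NewSet returns a set containing the given items; duplicates collapse.
func NewSet[T comparable](items ...T) *Set[T] {
	s := &Set[T]{m: make(map[T]struct{}, len(items))}
	for _, it := range items {
		s.Add(it)
	}
	return s
}

// Add inserts v; adding an existing member is a no-op.
func (s *Set[T]) Add(v T) { s.m[v] = struct{}{} }

// Remove deletes v; removing a missing member is a no-op.
func (s *Set[T]) Remove(v T) { delete(s.m, v) }

// Contains uses the comma-ok form, since the stored value carries no information.
func (s *Set[T]) Contains(v T) bool {
	_, ok := s.m[v]
	return ok
}

// Len returns the number of members.
func (s *Set[T]) Len() int { return len(s.m) }

// Items returns the members in unspecified order (map iteration order is random).
func (s *Set[T]) Items() []T {
	out := make([]T, 0, len(s.m))
	for k := range s.m {
		out = append(out, k)
	}
	return out
}

func main() {
	s := NewSet("Foo", "Bar", "Foo")
	fmt.Println(s.Len(), s.Contains("Foo")) // 2 true
	s.Remove("Foo")
	items := s.Items()
	sort.Strings(items) // sort only to get a stable printout
	fmt.Println(items)  // [Bar]
}
```

Keeping the map unexported means callers cannot accidentally store meaningful values in it, and the comma-ok check stays in one place.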
```go
package main

import (
	"fmt"

	mapset "github.com/deckarep/golang-set"
)

func main() {
	// NewSet creates a thread-safe set by default. If thread safety is not
	// required, NewThreadUnsafeSet can be used in the same way.
	s1 := mapset.NewSet(1, 2, 3, 4)
	fmt.Println("s1 contains 3:", s1.Contains(3))
	fmt.Println("s1 contains 5:", s1.Contains(5))

	// Elements are interface{} values, so values of any comparable type can be added.
	s1.Add("poloxue")
	fmt.Println("s1 contains poloxue:", s1.Contains("poloxue"))

	s1.Remove(3)
	fmt.Println("s1 contains 3:", s1.Contains(3))

	s2 := mapset.NewSet(1, 3, 4, 5)
	// Union
	fmt.Println(s1.Union(s2))
}
```
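The output is as follows (the element order in the last line can differ between runs, because the set is backed by a map):
```
s1 contains 3: true
s1 contains 5: false
s1 contains poloxue: true
s1 contains 3: false
Set{4, poloxue, 1, 2, 3, 5}
```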
The example covers the basic usage. If anything is unclear, the source code is easy to follow: the methods carry the usual set-operation names, such as Intersect for intersection and Difference for set difference, so their meaning is clear at a glance.
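To show roughly what such methods do under the hood, here is a sketch of union, intersection and difference on plain map[T]struct{} sets. It assumes Go 1.18+ for generics, and the function names union, intersect, difference, of and sorted are illustrative; they are not golang-set's API or its actual implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// union returns a new set with every element of a or b.
func union[T comparable](a, b map[T]struct{}) map[T]struct{} {
	out := make(map[T]struct{}, len(a)+len(b))
	for k := range a {
		out[k] = struct{}{}
	}
	for k := range b {
		out[k] = struct{}{}
	}
	return out
}

// intersect returns a new set with the elements present in both a and b.
func intersect[T comparable](a, b map[T]struct{}) map[T]struct{} {
	if len(a) > len(b) {
		a, b = b, a // iterate over the smaller set, probe the larger one
	}
	out := make(map[T]struct{})
	for k := range a {
		if _, ok := b[k]; ok {
			out[k] = struct{}{}
		}
	}
	return out
}

// difference returns a new set with the elements of a that are not in b.
func difference[T comparable](a, b map[T]struct{}) map[T]struct{} {
	out := make(map[T]struct{})
	for k := range a {
		if _, ok := b[k]; !ok {
			out[k] = struct{}{}
		}
	}
	return out
}

// of builds an int set from its arguments.
func of(xs ...int) map[int]struct{} {
	m := make(map[int]struct{}, len(xs))
	for _, x := range xs {
		m[x] = struct{}{}
	}
	return m
}

// sorted lists the members in ascending order for a stable printout.
func sorted(m map[int]struct{}) []int {
	out := make([]int, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	sort.Ints(out)
	return out
}

func main() {
	a, b := of(1, 2, 3, 4), of(1, 3, 4, 5)
	fmt.Println(sorted(union(a, b)))      // [1 2 3 4 5]
	fmt.Println(sorted(intersect(a, b)))  // [1 3 4]
	fmt.Println(sorted(difference(a, b))) // [2]
}
```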
bitset
Now on to bitset. In a bitset, each number is represented by a single bit, so one 8-bit integer such as a uint8 can record the presence or absence of eight numbers. This can save a great deal of storage.
The most common applications of bitsets are bitmaps and flags. Let's first use one to represent flags for some operations. Suppose we need three flags for permission 1, permission 2 and permission 3, and several permissions can be held at the same time. We can define three constants, F0, F1 and F2, as bit masks.
The sample code is as follows (quoted from the article "Bitmasks, bitsets and flags"):
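Here is a minimal sketch of that idea, assuming the numbers to store are in the range 0 to 7 so they fit in one uint8. The helpers encode and decode are hypothetical names: number n is stored by setting bit n, so {1, 3, 5} becomes 0b00101010, which is 42.

```go
package main

import "fmt"

// encode packs numbers in the range 0..7 into a single uint8, one bit per number.
// Values outside that range are ignored in this sketch.
func encode(nums ...int) uint8 {
	var b uint8
	for _, n := range nums {
		if n >= 0 && n < 8 {
			b |= 1 << uint(n)
		}
	}
	return b
}

// decode lists the numbers whose bits are set, in ascending order.
func decode(b uint8) []int {
	var out []int
	for n := 0; n < 8; n++ {
		if b&(1<<uint(n)) != 0 {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	b := encode(1, 3, 5)
	fmt.Printf("%08b %d\n", b, b) // 00101010 42
	fmt.Println(decode(b))        // [1 3 5]
}
```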
```go
package main

import "fmt"

type Bits uint8

const (
	F0 Bits = 1 << iota
	F1
	F2
)

func Set(b, flag Bits) Bits    { return b | flag }
func Clear(b, flag Bits) Bits  { return b &^ flag }
func Toggle(b, flag Bits) Bits { return b ^ flag }
func Has(b, flag Bits) bool    { return b&flag != 0 }

func main() {
	var b Bits
	b = Set(b, F0)
	b = Toggle(b, F2)
	for i, flag := range []Bits{F0, F1, F2} {
		fmt.Println(i, Has(b, flag)) // 0 true, 1 false, 2 true
	}
}
```
Without a bitset we would need three separate variables for the three flags; here a single uint8 holds them all. The basic bitset operations, setting (Set), clearing (Clear), toggling (Toggle) and checking (Has), are each a single bitwise operation, so they are very efficient.
A bitset also has a natural advantage for set operations, which map directly onto bitwise operators. Intersection, union and difference look like this:
- Intersection: A & B
- Union: A | B
- Difference: A & ^B (Go writes bitwise NOT as unary ^ and also has the AND NOT operator, so this can be written A &^ B)
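The operators above can be checked on two small sets. This sketch assumes Go 1.13 or later for the 0b binary literals; the helper names intersect, union, difference and size are illustrative, and size counts members with bits.OnesCount8 from the standard math/bits package.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Each helper is a single bitwise operator; the names are illustrative only.
func intersect(a, b uint8) uint8  { return a & b }
func union(a, b uint8) uint8      { return a | b }
func difference(a, b uint8) uint8 { return a &^ b } // same result as a & ^b
func size(a uint8) int            { return bits.OnesCount8(a) }

func main() {
	a := uint8(0b00001111) // {0, 1, 2, 3}
	b := uint8(0b00111100) // {2, 3, 4, 5}
	fmt.Printf("%08b\n", intersect(a, b))  // 00001100 = {2, 3}
	fmt.Printf("%08b\n", union(a, b))      // 00111111 = {0, 1, 2, 3, 4, 5}
	fmt.Printf("%08b\n", difference(a, b)) // 00000011 = {0, 1}
	fmt.Println(size(union(a, b)))         // 6
}
```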
Low-level languages, libraries and frameworks often use this approach for flags.
The example above handles only a small amount of data: a uint8 occupies 8 bits and can therefore represent only 8 numbers. Can the same idea scale to large data sets?
We can combine the bitset idea with a Go slice and define a new Bitset type as follows:
```go
type Bitset struct {
	data []int64
}
```
This raises a question: when setting a bit, how do we find where it lives? The location has two parts: the index of the slice element that holds the bit, and the position of the bit inside that element. We call them index and position. How do we compute them?
The index can be obtained by division. For example, to find which slice element holds bit 65, compute 65 / 64 = 1. For efficiency, the division can be replaced by a right shift: 65 >> 6, where the shift amount is 6 because 2^6 = 64.
The position is the remainder of that division, obtained with the modulo operation: 65 % 64 = 1. Again there is a faster bitwise equivalent: 65 & 0b111111, i.e. 65 & 63. (These shift and mask shortcuts match division and modulo only for non-negative numbers.)
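The short program below checks that the shift and mask shortcuts agree with plain division and modulo for a few sample bit numbers. The function name locate is hypothetical, and it assumes n is non-negative: for n = -1, n >> 6 is -1 while n / 64 is 0, so the two forms differ.

```go
package main

import "fmt"

const (
	shift = 6              // 2^6 = 64 bits per word
	mask  = 1<<shift - 1   // 63, i.e. 0b111111
)

// locate returns which slice element holds bit n (index) and which bit
// inside that element it is (position). It assumes n >= 0.
func locate(n int) (index, position int) {
	return n >> shift, n & mask
}

func main() {
	for _, n := range []int{0, 63, 64, 65, 130} {
		i, p := locate(n)
		// The last column confirms the shortcut matches n/64 and n%64.
		fmt.Println(n, i, p, i == n/64 && p == n%64)
	}
}
```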
A simple example is as follows:
```go
package main

import "fmt"

const (
	shift = 6
	mask  = 0x3f // That is 0b111111, i.e. 63
)

type Bitset struct {
	data []int64
}

func NewBitSet(n int) *Bitset {
	// Getting location information
	index := n >> shift
	set := &Bitset{
		data: make([]int64, index+1),
	}
	// Setting bitset according to n
	set.data[index] |= 1 << uint(n&mask)
	return set
}

func (set *Bitset) Contains(n int) bool {
	// Getting location information
	index := n >> shift
	// Negative numbers and numbers beyond the allocated words are not members
	if n < 0 || index >= len(set.data) {
		return false
	}
	return set.data[index]&(1<<uint(n&mask)) != 0
}

func main() {
	set := NewBitSet(65)
	fmt.Println("set contains 65", set.Contains(65))
	fmt.Println("set contains 64", set.Contains(64))
}
```
Output:
```
set contains 65 true
set contains 64 false
```
The example above is deliberately simple and only for demonstration. It has just two functions: creating a bitset and checking membership. Other operations, such as adding and deleting members or computing the intersection, union and difference of two bitsets, are not implemented; interested readers can try them.
In fact, a ready-made bitset package already exists: bit (github.com/yourbasic/bit). You can read its source code; the implementation idea is similar to the one described above.
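As one possible way to fill in those missing operations, here is a sketch. It switches the words to []uint64 (the example above uses []int64; unsigned words avoid sign-bit surprises when printing), grows the slice on demand, and assumes members are non-negative. The method names Add, Remove, Union, Intersect and Difference are hypothetical and do not come from the article or any package.

```go
package main

import "fmt"

const (
	shift = 6    // 2^6 = 64 bits per word
	mask  = 0x3f // 63, i.e. 0b111111
)

// Bitset is a sketch that stores bits in []uint64 words.
type Bitset struct {
	data []uint64
}

// grow appends zero words until data[index] exists.
func (s *Bitset) grow(index int) {
	for len(s.data) <= index {
		s.data = append(s.data, 0)
	}
}

// Add sets bit n; n is assumed to be non-negative.
func (s *Bitset) Add(n int) {
	index := n >> shift
	s.grow(index)
	s.data[index] |= 1 << uint(n&mask)
}

// Remove clears bit n; bits beyond the slice are already clear.
func (s *Bitset) Remove(n int) {
	index := n >> shift
	if n >= 0 && index < len(s.data) {
		s.data[index] &^= 1 << uint(n&mask)
	}
}

// Contains reports whether bit n is set; out-of-range n yields false.
func (s *Bitset) Contains(n int) bool {
	index := n >> shift
	return n >= 0 && index < len(s.data) && s.data[index]&(1<<uint(n&mask)) != 0
}

// Union returns a new bitset holding the bits set in s or t.
func (s *Bitset) Union(t *Bitset) *Bitset {
	long, short := s.data, t.data
	if len(long) < len(short) {
		long, short = short, long
	}
	out := &Bitset{data: append([]uint64(nil), long...)}
	for i, w := range short {
		out.data[i] |= w
	}
	return out
}

// Intersect returns a new bitset holding the bits set in both s and t.
func (s *Bitset) Intersect(t *Bitset) *Bitset {
	n := len(s.data)
	if len(t.data) < n {
		n = len(t.data)
	}
	out := &Bitset{data: make([]uint64, n)}
	for i := range out.data {
		out.data[i] = s.data[i] & t.data[i]
	}
	return out
}

// Difference returns a new bitset holding the bits set in s but not in t.
func (s *Bitset) Difference(t *Bitset) *Bitset {
	out := &Bitset{data: append([]uint64(nil), s.data...)}
	for i := 0; i < len(out.data) && i < len(t.data); i++ {
		out.data[i] &^= t.data[i]
	}
	return out
}

func main() {
	a, b := &Bitset{}, &Bitset{}
	for _, n := range []int{1, 65, 130} {
		a.Add(n)
	}
	for _, n := range []int{65, 200} {
		b.Add(n)
	}
	u, inter, diff := a.Union(b), a.Intersect(b), a.Difference(b)
	fmt.Println(u.Contains(200), inter.Contains(65), inter.Contains(1), diff.Contains(65))
	// true true false false
}
```

Each set operation works word by word, so combining two bitsets costs one bitwise operation per 64 members.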
Here is a use case.
```go
package main

import (
	"fmt"

	"github.com/yourbasic/bit"
)

func main() {
	s := bit.New(2, 3, 4, 65, 128)
	fmt.Println("s contains 65", s.Contains(65))
	fmt.Println("s contains 15", s.Contains(15))

	s.Add(15)
	fmt.Println("s contains 15", s.Contains(15))

	fmt.Println("next 20 is", s.Next(20))
	fmt.Println("prev 20 is", s.Prev(20))

	s2 := bit.New(10, 22, 30)
	s3 := s.Or(s2)
	fmt.Println("next 20 is", s3.Next(20))

	s3.Visit(func(n int) bool {
		fmt.Println(n)
		return false // Returning true stops the traversal
	})
}
```
Output:
```
s contains 65 true
s contains 15 false
s contains 15 true
next 20 is 65
prev 20 is 15
next 20 is 22
2
3
4
10
15
22
30
65
128
```
The code is easy to follow: it adds elements, checks membership, looks up neighbors with Next and Prev, computes a union with Or, and traverses the set with Visit. Note the difference from the map-based set: a bitset's members can only be integers (non-negative ones, since each value maps to a bit position), so it is less flexible than set. It is also used less often, mainly where efficiency and storage space matter most.
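Because members are integers stored as bits, ordered traversal can also be fast. The sketch below shows one common way to visit only the set bits, using bits.TrailingZeros64 from the standard math/bits package to jump straight to the lowest set bit of each word. The function visit is a hypothetical stand-in loosely modeled on the Visit method used above; it is not that package's source code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// visit calls do for every set bit in words, in ascending order, and stops
// early (returning true) as soon as do returns true.
func visit(words []uint64, do func(n int) bool) bool {
	for i, w := range words {
		for w != 0 {
			tz := bits.TrailingZeros64(w) // position of the lowest set bit
			if do(i*64 + tz) {
				return true
			}
			w &= w - 1 // clear the lowest set bit
		}
	}
	return false
}

func main() {
	words := []uint64{1<<2 | 1<<3, 1 << 1} // members 2, 3 and 65
	visit(words, func(n int) bool {
		fmt.Println(n) // prints 2, 3, 65 on separate lines
		return false
	})
}
```

The inner loop runs once per member rather than once per bit, so sparse words are skipped cheaply.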
Summary
This article introduced the implementation principles behind two kinds of sets in Go and, building on them, the basic use of a package for each. With this, you should be able to handle most uses of sets in Go.
Besides these two packages, two more are worth a look: zoumo/goset and github.com/willf/bitset.