1. Background
The business requirement is to map each user ID (a number >= 10000000) to a unique, non-repeating invitation code of 6 characters, which is used to invite new users to register. It does not need to be possible to derive the user ID back from the invitation code.
2. My ideas
First, choose the character set for the invitation code: the 62 characters made up of the digits and the uppercase and lowercase English letters. With a code length of 6, the space holds 62 ^ 6 = 56800235584 codes, which is far more than enough for 100 million users.

    // AlphanumericSet is the alphanumeric character set
    var AlphanumericSet = []rune{
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
        'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
        'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
    }

Of course, to improve the user experience, you can remove the five confusing characters o, O, 0, I and 1.
Then hash the UID with MD5 and XOR the first 8 bytes of the digest with the last 8 bytes to obtain the random number seed.

    // GetSeedByUID obtains the corresponding seed from the user ID
    // (uses crypto/md5 and encoding/binary)
    func GetSeedByUID(uid string) int64 {
        sum := md5.Sum([]byte(uid))
        pre := binary.BigEndian.Uint64(sum[:8])
        suf := binary.BigEndian.Uint64(sum[8:])
        return int64(pre ^ suf)
    }

Finally, use the seed to create a random number generator and draw one random index into the alphanumeric set for each character of the invitation code, 6 draws in total.

    // GetInvCodeByUID gets the invitation code of the specified length according to the user ID
    // (uses math/rand)
    func GetInvCodeByUID(uid string, l int) string {
        seed := GetSeedByUID(uid)
        r := rand.New(rand.NewSource(seed))
        var code []rune
        for i := 0; i < l; i++ {
            idx := r.Intn(len(AlphanumericSet))
            code = append(code, AlphanumericSet[idx])
        }
        return string(code)
    }

    func main() {
        fmt.Println(GetInvCodeByUID("100000000", 6)) // i0jLVz
        fmt.Println(GetInvCodeByUID("100000001", 6)) // fhTeiE
        fmt.Println(GetInvCodeByUID("100000002", 6)) // K5R5OP
    }

3. A hidden pitfall
Everything seems beautiful, but is it really so?
If the random sequences produced by different seeds were independent, the probability that two given invitation codes collide would be (1 / 62) ^ 6. That is so small that it seems safe to treat a collision as impossible, so the approach appears to meet our requirements.
Write a unit test to verify it.

    func TestGetInvCodeByUID(t *testing.T) {
        sUID, eUID := 10000000, 11000000
        var seedConCnt int // Number of seed conflicts
        var codeConCnt int // Number of invitation code conflicts
        mSeed := make(map[int64]struct{})
        mCode := make(map[string]struct{})
        // Count the collisions among invitation codes generated for 1,000,000 user IDs
        t.Run("getConflictNumTestCase", func(t *testing.T) {
            for uid := sUID; uid < eUID; uid++ {
                seed := GetSeedByUID(strconv.Itoa(uid))
                if _, ok := mSeed[seed]; ok {
                    seedConCnt++
                    codeConCnt++
                    continue
                }
                mSeed[seed] = struct{}{}
                code := GetInvCodeByUID(strconv.Itoa(uid), 6)
                if _, ok := mCode[code]; ok {
                    codeConCnt++
                    continue
                }
                mCode[code] = struct{}{}
            }
            if seedConCnt != 0 || codeConCnt != 0 {
                t.Errorf("seedConCnt=%v, codeConCnt=%v conRate=%v",
                    seedConCnt, codeConCnt, float64(codeConCnt)/float64(eUID-sUID))
            }
        })
    }

go test run output:

    --- FAIL: TestGetInvCodeByUID (10.53s)
        --- FAIL: TestGetInvCodeByUID/getConflictNumTestCase (10.53s)
            main_test.go:33: seedConCnt=0, codeConCnt=246 conRate=0.000246
    FAIL
    exit status 1
    FAIL    test    11.294s

The test fails: among 1,000,000 user IDs, 246 invitation codes collide with an earlier one. That is a conflict rate of about 2.5 in ten thousand, far above the expected (1 / 62) ^ 6 and completely unacceptable. Why does this happen, when all the seeds are different?
This is because we overlooked the birthday problem. (1 / 62) ^ 6 is only the probability that two specific codes collide. Assuming the first 1,000,000 codes are all distinct, the probability that the next one repeats any of them is (1 / 62) ^ 6 * 1,000,000 ≈ 1 / 56,800, much greater than the intuitive (1 / 62) ^ 6, and it keeps growing as more invitation codes are generated. Even so, a uniformly random space of 62 ^ 6 codes would produce only about 9 colliding pairs among 1,000,000 codes, not 246. The rest appears to come from the generator itself: the source returned by rand.NewSource in Go's math/rand reduces its seed modulo 2^31 - 1, so different int64 seeds can yield the same sequence. In an effective space of about 2.1 billion seeds, roughly 233 colliding pairs are expected, which matches the measurement.
4. Solutions
Back to the original requirement: I only need to map the UID uniquely to an invitation code of the given length. I can simply use the first 6 bytes of the MD5 value as indices into the character set, without any random number generator. Seen this way, the random generator above was superfluous.

    // GetInvCodeByUID gets the invitation code of the specified length
    func GetInvCodeByUID(uid string, l int) string {
        // Because the md5 value is 16 bytes
        if l > 16 {
            return ""
        }
        sum := md5.Sum([]byte(uid))
        var code []rune
        for i := 0; i < l; i++ {
            idx := sum[i] % byte(len(AlphanumericSet))
            code = append(code, AlphanumericSet[idx])
        }
        return string(code)
    }

Modify the unit test so that it also counts collisions of the first 6 bytes of the MD5 value, alongside the invitation code collisions.

    func TestGetInvCodeByUID(t *testing.T) {
        sUID, eUID := 10000000, 11000000
        var md5ConCnt int  // Number of collisions of the first 6 bytes of the md5 value
        var codeConCnt int // Number of invitation code conflicts
        mSeed := make(map[uint64]struct{})
        mCode := make(map[string]struct{})
        // Count the collisions among invitation codes generated for 1,000,000 user IDs
        t.Run("getConflictNumTestCase", func(t *testing.T) {
            for uid := sUID; uid < eUID; uid++ {
                sum := md5.Sum([]byte(strconv.Itoa(uid)))
                md5Value := uint64(sum[5]) | uint64(sum[4])<<8 | uint64(sum[3])<<16 |
                    uint64(sum[2])<<24 | uint64(sum[1])<<32 | uint64(sum[0])<<40
                if _, ok := mSeed[md5Value]; ok {
                    md5ConCnt++
                    codeConCnt++
                    continue
                }
                mSeed[md5Value] = struct{}{}
                code := GetInvCodeByUID(strconv.Itoa(uid), 6)
                if _, ok := mCode[code]; ok {
                    codeConCnt++
                    continue
                }
                mCode[code] = struct{}{}
            }
            if md5ConCnt != 0 || codeConCnt != 0 {
                t.Errorf("md5ConCnt=%v, codeConCnt=%v conRate=%v",
                    md5ConCnt, codeConCnt, float64(codeConCnt)/float64(eUID-sUID))
            }
        })
    }

Unit test output:

    --- FAIL: TestGetInvCodeByUID (1.26s)
        --- FAIL: TestGetInvCodeByUID/getConflictNumTestCase (1.26s)
            main_test.go:35: md5ConCnt=0, codeConCnt=5 conRate=5e-06
    FAIL
    exit status 1
    FAIL    test    1.424s

The conflict rate has dropped to 5 in a million. Because the target space is 62 ^ 6 = 56800235584 codes, the collision probability still grows as more invitation codes are generated. By the birthday estimate, a handful of collisions among 1,000,000 codes is expected in a space of this size, so this rate is normal.
Collisions occur here because different byte values can map to the same index after taking the modulus by the character set size (for example, bytes 1 and 63 both give index 1), so two different 6-byte prefixes can produce the same code. To resolve collisions, we can use a store (such as Redis) to check whether a code is already taken. If it is, we can hash again and take the modulus to generate a new invitation code, or use other bytes of the hash value to generate it.
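The check-and-rehash idea can be sketched as follows. This is only an illustration under stated assumptions: an in-memory map stands in for the external store (with Redis, the check and the write would have to be a single atomic operation), the retry input `uid + "#" + attempt` is one arbitrary way to rehash, and the names `codeFromHash` and `assignCode` are hypothetical.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"strconv"
)

const alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// codeFromHash maps the first l bytes of the MD5 digest of input onto the
// alphabet. l must not exceed 16, the MD5 digest length.
func codeFromHash(input string, l int) string {
	sum := md5.Sum([]byte(input))
	code := make([]byte, l)
	for i := 0; i < l; i++ {
		code[i] = alphabet[int(sum[i])%len(alphabet)]
	}
	return string(code)
}

// assignCode (hypothetical) returns a code that no other uid owns in store.
// store maps code -> owning uid and stands in for an external store.
func assignCode(store map[string]string, uid string, l, maxAttempts int) (string, error) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		input := uid
		if attempt > 0 {
			// Rehash with a retry counter appended to get a different digest.
			input = uid + "#" + strconv.Itoa(attempt)
		}
		code := codeFromHash(input, l)
		owner, taken := store[code]
		if !taken {
			store[code] = uid
			return code, nil
		}
		if owner == uid {
			return code, nil // this uid was already given this code
		}
	}
	return "", fmt.Errorf("no free code for uid %s after %d attempts", uid, maxAttempts)
}

func main() {
	store := make(map[string]string)
	for uid := 10000000; uid < 10000003; uid++ {
		code, err := assignCode(store, strconv.Itoa(uid), 6, 5)
		fmt.Println(uid, code, err)
	}
}
```

Because the store records the owner of each code, calling `assignCode` again for the same user returns the same code, even when that code came from a retry.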
5. Other solutions
Is there a generation method with a collision rate of 0? After all, the user ID is unique, so it should be possible to generate a unique invitation code from it.
Because the user ID is a number, we can write it in base 62, where each digit ranges from 0 to 61, just as each decimal digit ranges from 0 to 9. Each base-62 digit is then used as an index into the character set. The digits are obtained with the usual base conversion method (repeated division and modulus). Six digits represent every value below 62 ^ 6 = 56800235584 exactly once, so the codes are unique as long as the user ID stays below that bound.

    // GetInvCodeByUIDUnique gets the invitation code of the specified length
    func GetInvCodeByUIDUnique(uid uint64, l int) string {
        var code []rune
        for i := 0; i < l; i++ {
            idx := uid % uint64(len(AlphanumericSet))
            code = append(code, AlphanumericSet[idx])
            uid = uid / uint64(len(AlphanumericSet)) // Equivalent to shifting one digit to the right (base 62)
        }
        return string(code)
    }

    // Example
    func main() {
        fmt.Println(GetInvCodeByUIDUnique(100000000, 6)) // ezAL60
        fmt.Println(GetInvCodeByUIDUnique(100000001, 6)) // fzAL60
        fmt.Println(GetInvCodeByUIDUnique(100000002, 6)) // gzAL60
    }

In theory, there will be no conflict. Let's test it.

    func TestGetInvCodeByUIDUnique(t *testing.T) {
        sUID, eUID := 10000000, 20000000
        var codeConCnt int // Number of invitation code conflicts
        mCode := make(map[string]struct{})
        // Count the collisions among invitation codes generated for 10,000,000 user IDs
        t.Run("getConflictNumTestCaseUnique", func(t *testing.T) {
            for uid := sUID; uid < eUID; uid++ {
                code := GetInvCodeByUIDUnique(uint64(uid), 6)
                if _, ok := mCode[code]; ok {
                    codeConCnt++
                    continue
                }
                mCode[code] = struct{}{}
            }
            if codeConCnt != 0 {
                t.Errorf("codeConCnt=%v conRate=%v",
                    codeConCnt, float64(codeConCnt)/float64(eUID-sUID))
            }
        })
    }

Run the unit test with go test -run "TestGetInvCodeByUIDUnique"; the results are as follows.

    === RUN   TestGetInvCodeByUIDUnique
    === RUN   TestGetInvCodeByUIDUnique/getConflictNumTestCaseUnique
    --- PASS: TestGetInvCodeByUIDUnique (7.20s)
        --- PASS: TestGetInvCodeByUIDUnique/getConflictNumTestCaseUnique (7.20s)
    PASS
    ok      test    7.389s

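Since the base-62 mapping is a plain positional encoding with the least significant digit first, it can also be reversed. The sketch below pairs an encoder that mirrors GetInvCodeByUIDUnique with a decoder; `encodeBase62` and `decodeBase62` are hypothetical names, and the round trip only holds for user IDs below 62 ^ 6.

```go
package main

import (
	"fmt"
	"strings"
)

const alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// encodeBase62 writes uid as l base-62 digits, least significant digit first.
// Digits beyond position l are dropped, so uid must be below 62^l to be unique.
func encodeBase62(uid uint64, l int) string {
	base := uint64(len(alphabet))
	code := make([]byte, l)
	for i := 0; i < l; i++ {
		code[i] = alphabet[uid%base]
		uid /= base
	}
	return string(code)
}

// decodeBase62 (hypothetical helper) reverses encodeBase62 by reading the
// digits from the most significant (last) character down to the first.
func decodeBase62(code string) (uint64, error) {
	base := uint64(len(alphabet))
	var uid uint64
	for i := len(code) - 1; i >= 0; i-- {
		idx := strings.IndexByte(alphabet, code[i])
		if idx < 0 {
			return 0, fmt.Errorf("invalid character %q in code", code[i])
		}
		uid = uid*base + uint64(idx)
	}
	return uid, nil
}

func main() {
	code := encodeBase62(100000000, 6)
	uid, err := decodeBase62(code)
	fmt.Println(code, uid, err) // ezAL60 100000000 <nil>
}
```

The same property that makes the codes unique also makes them trivially reversible and sequential for consecutive user IDs.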
If the business scenario does not allow the user ID to be derived from the invitation code, consider applying diffusion and confusion to the user ID to raise the cost of recovering it. For details, see Several methods of generating unique invitation code from user ID.
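One simple way to illustrate the idea, without claiming it is the method of the referenced article, is to pass the user ID through a one-to-one scrambling step before the base-62 encoding. The sketch below uses (uid * multiplier + salt) mod 62 ^ 6, which is a bijection on [0, 62 ^ 6) as long as the multiplier shares no factor with 62. The constants and the names `scramble` and `scrambledCode` are assumptions chosen for illustration. Anyone who learns the constants can still invert the mapping, so this only raises the cost of recovery.

```go
package main

import "fmt"

const alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

const (
	codeLen    = 6
	space      = 56800235584 // 62^6, the number of distinct 6-character codes
	multiplier = 1000003     // assumed constant: odd and not a multiple of 31, so coprime with 62
	salt       = 123456789   // assumed constant offset
)

// scramble (hypothetical) maps uid one-to-one onto another value in [0, space).
// The product stays far below 2^64, so the arithmetic does not overflow.
func scramble(uid uint64) uint64 {
	return (uid%space*multiplier + salt) % space
}

// encode writes v as codeLen base-62 digits, least significant digit first.
func encode(v uint64) string {
	base := uint64(len(alphabet))
	code := make([]byte, codeLen)
	for i := range code {
		code[i] = alphabet[v%base]
		v /= base
	}
	return string(code)
}

// scrambledCode (hypothetical) returns a unique code for every uid below 62^6.
func scrambledCode(uid uint64) string {
	return encode(scramble(uid))
}

func main() {
	for uid := uint64(100000000); uid < 100000003; uid++ {
		fmt.Println(uid, scrambledCode(uid))
	}
}
```

With these constants, consecutive user IDs no longer produce codes that differ only in the first character, while uniqueness is preserved because the scrambling step is a bijection.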
References
CSDN. Fully master Go math/rand
CSDN. Several methods of generating unique invitation code from user ID
Wikipedia. Birthday problem