In the previous two lessons, we relied heavily on regular expressions to match the city list, the individual cities and the user information. Regular expressions are not the only option: third-party libraries such as goquery and xpath can also extract the useful information. Here, though, I stick with regular expressions and try to write them more elegantly, so let's take a closer look at them.

For example, when we parse the city list, we want to extract the URL of every city.

Each city URL ends with a run of lowercase letters and digits, so it can be extracted with the following pattern:

<a href="(http://www.zhenai.com/zhenghun/[0-9a-z]+)"[^>]*>([^<]+)</a>

[0-9a-z]+ matches one or more lowercase letters or digits, [^>]* matches any number of characters other than > (including none), and [^<]+ matches one or more characters other than <. We need both the URL and the name of each city, so each is wrapped in parentheses as a capture group.

The URL and the city name can then be obtained as follows:

const cityListReg = `<a href="(http://www.zhenai.com/zhenghun/[0-9a-z]+)"[^>]*>([^<]+)</a>`

// contents is the page body as a []byte
compile := regexp.MustCompile(cityListReg)

// Each element of submatch holds the whole match followed by the two groups
submatch := compile.FindAllSubmatch(contents, -1)

for _, m := range submatch {
    fmt.Println("url:", string(m[1]), "city:", string(m[2]))
}
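To try the pattern without fetching the real page, here is a minimal, self-contained sketch. The sample HTML fragment, the City type and the extractCities helper are illustrative assumptions, not part of the crawler from the earlier lessons.

```go
package main

import (
	"fmt"
	"regexp"
)

// cityListReg is the pattern from the text above.
const cityListReg = `<a href="(http://www.zhenai.com/zhenghun/[0-9a-z]+)"[^>]*>([^<]+)</a>`

// Compile once at package level; MustCompile panics on an invalid pattern.
var cityListRe = regexp.MustCompile(cityListReg)

// City is a hypothetical holder for one extracted link.
type City struct {
	URL  string
	Name string
}

// extractCities returns every (URL, name) pair found in contents.
func extractCities(contents []byte) []City {
	var cities []City
	for _, m := range cityListRe.FindAllSubmatch(contents, -1) {
		// m[0] is the whole match, m[1] the URL group, m[2] the name group.
		cities = append(cities, City{URL: string(m[1]), Name: string(m[2])})
	}
	return cities
}

func main() {
	// Made-up markup, for illustration only.
	sample := []byte(`<a href="http://www.zhenai.com/zhenghun/aba" class="x">Aba</a>
<a href="http://www.zhenai.com/zhenghun/beijing1">Beijing</a>
<a href="http://www.zhenai.com/about">About</a>`)
	for _, c := range extractCities(sample) {
		fmt.Println("url:", c.URL, "city:", c.Name)
	}
}
```

The third link is skipped because its path does not contain /zhenghun/, which shows that the pattern only picks up city links.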

The following checks whether a string contains two g's with at least one lowercase letter between them:

// Matches g, then one or more lowercase letters, then g
 match, _ := regexp.MatchString("g([a-z]+)g", "11golang11")
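As a small runnable sketch of the same check, the snippet below wraps the call in a helper and prints the result for a matching and a non-matching input. The helper name containsGG and the second input are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// containsGG reports whether s contains g, one or more lowercase letters, then g.
func containsGG(s string) bool {
	// The pattern is a fixed, valid literal, so the error is ignored here.
	match, _ := regexp.MatchString("g([a-z]+)g", s)
	return match
}

func main() {
	fmt.Println(containsGG("11golang11")) // true: "golang" starts and ends with g
	fmt.Println(containsGG("11gg11"))     // false: nothing sits between the two g's
}
```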

Above we matched directly against a pattern string. For other regular expression tasks, we first compile the pattern into an optimized Regexp object:

compile, err := regexp.Compile("smallsoup@gmail.com")

if err != nil {
    //... the pattern has a syntax error and the error needs to be handled
}

regexp.Compile returns a compiled regular expression object and an error. When the pattern does not conform to the regular expression syntax, the error is non-nil. For example, regexp.Compile("[smallsoup@gmail.com") reports: missing closing ]

Generally, the error only needs handling when the pattern comes from user input; a pattern we write ourselves in the source code should not be wrong. In that case you can use compile := regexp.MustCompile("smallsoup@gmail.com"), which panics if the syntax is invalid.
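The difference between the two functions can be sketched as follows. The helper compileUserPattern and the variable emailRe are hypothetical names, and the dot in the fixed pattern is escaped here so that it matches a literal dot, which is a small deviation from the pattern used in the text.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileUserPattern is a hypothetical helper for patterns that come from
// user input: it returns the error instead of panicking.
func compileUserPattern(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return re, nil
}

// emailRe is written by hand in the source, so MustCompile is acceptable:
// a mistake here would show up as a panic at program start.
var emailRe = regexp.MustCompile(`smallsoup@gmail\.com`)

func main() {
	// The unclosed bracket makes this pattern invalid.
	if _, err := compileUserPattern("[smallsoup@gmail.com"); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(emailRe.MatchString("contact: smallsoup@gmail.com")) // true
}
```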

text1 := `my email is aa@qq.com
  aa email is aa@gmail.com
  bb email is bb@qq.com
  cc email is cc@qq.com.cn`
// To extract A, B and C from A@B.C, use the submatch (capture group) functions.
comp := regexp.MustCompile(`([a-zA-Z0-9]+)@([a-zA-Z0-9.]+)\.([a-zA-Z0-9]+)`)

// Submatching returns the content captured by each pair of parentheses in the pattern
submatchs := comp.FindAllStringSubmatch(text1, -1)

// submatchs is a two-dimensional slice: one inner slice per match
fmt.Println(submatchs)

// Iterate over the matches; each submatch is a slice holding the whole match and its groups
for _, submatch := range submatchs {
    fmt.Println(submatch)
}

The output is as follows:

[[aa@qq.com aa qq com] [aa@gmail.com aa gmail com] [bb@qq.com bb qq com] [cc@qq.com.cn cc qq.com cn]]
[aa@qq.com aa qq com]
[aa@gmail.com aa gmail com]
[bb@qq.com bb qq com]
[cc@qq.com.cn cc qq.com cn]
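When only one address is needed, the groups can be read by index from a single match. The following is a minimal sketch using the same pattern; the helper splitEmail and its return names are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var emailPartsRe = regexp.MustCompile(`([a-zA-Z0-9]+)@([a-zA-Z0-9.]+)\.([a-zA-Z0-9]+)`)

// splitEmail is a hypothetical helper that returns A, B and C from the
// first A@B.C found in s. ok is false when nothing matches.
func splitEmail(s string) (user, domain, suffix string, ok bool) {
	m := emailPartsRe.FindStringSubmatch(s)
	if m == nil {
		return "", "", "", false
	}
	// m[0] is the whole match; m[1], m[2], m[3] are the three groups.
	return m[1], m[2], m[3], true
}

func main() {
	user, domain, suffix, ok := splitEmail("cc email is cc@qq.com.cn")
	fmt.Println(user, domain, suffix, ok) // cc qq.com cn true
}
```

The middle group is greedy, so for cc@qq.com.cn it takes qq.com and leaves only cn for the last group, which agrees with the output above.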

The regexp package can also replace matched text. The snippet below shows that, followed by a few common patterns:

// Needs the bytes, fmt and regexp packages.
r := regexp.MustCompile("p([a-z]+)ch")
fmt.Println(r) //----->p([a-z]+)ch
// The regexp package can also be used to replace the matched part of a string with another value.
fmt.Println(r.ReplaceAllString("a peach", "<smallsoup>")) //----->a <smallsoup>
// The Func variants pass each match to a given function and use its result as the replacement.
in := []byte("a smallsoup")
out := r.ReplaceAllFunc(in, bytes.ToUpper)
fmt.Println(string(out)) //----->a smallsoup (nothing matches, so the input is unchanged)
/*#######################Common expressions###########################*/
// Find Chinese characters
testText := "Hello 你好吗, I like golang!"
reg := regexp.MustCompile(`[\p{Han}]+`)
fmt.Println(reg.FindAllString(testText, -1)) // ----->[你好吗]
// Find everything except Chinese characters
reg = regexp.MustCompile(`[\P{Han}]+`)
fmt.Println(reg.FindAllString(testText, -1))        // ----->[Hello  , I like golang!]
fmt.Printf("%q\n", reg.FindAllString(testText, -1)) // ----->["Hello " ", I like golang!"]
// Email address:
reg = regexp.MustCompile(`\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*`)
// User name and password:
reg = regexp.MustCompile(`[a-zA-Z]|\w{6,18}`)

The output is as follows:

p([a-z]+)ch
a <smallsoup>
a smallsoup
[你好吗]
[Hello  , I like golang!]
["Hello " ", I like golang!"]
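Because the input "a smallsoup" contains nothing that matches p([a-z]+)ch, the ReplaceAllFunc call above returns it unchanged. As a small sketch of what happens when there is a match, the snippet below runs the same call on "a peach" as well; the helper upperMatches is a hypothetical name.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

var peachRe = regexp.MustCompile("p([a-z]+)ch")

// upperMatches upper-cases every part of in that matches peachRe
// and leaves the rest of the string untouched.
func upperMatches(in string) string {
	return string(peachRe.ReplaceAllFunc([]byte(in), bytes.ToUpper))
}

func main() {
	fmt.Println(upperMatches("a peach"))     // a PEACH
	fmt.Println(upperMatches("a smallsoup")) // a smallsoup (no match, unchanged)
}
```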


