ElasticSearch query DSL full-text search (match_all, match, match_phrase, match_phrase_prefix, multi_match)

Posted by optikalefx on Thu, 13 Jan 2022 04:21:34 +0100

Full text retrieval

match_all

match_all retrieves every document, without applying any conditions.

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  }
}

match(Match query)

match is the basic full-text query. ES analyzes (tokenizes) text when it is indexed; a match query analyzes the query string in the same way and then looks up the resulting terms in the inverted index to find the matching documents. match supports the following parameters:

query: the query text
operator: how the analyzed terms are combined (AND, or OR, which is the default)
minimum_should_match: the minimum number of query terms a document must contain in order to match
fuzziness: the maximum edit distance; for details, see the fuzzy query section of Term level query.
prefix_length: the number of leading characters that are left unchanged (they must match exactly) in a fuzzy query. The default value is 0. An example follows later in this section
fuzzy_transpositions: boolean value, true by default. It indicates whether swapping two adjacent characters (ab to ba) counts as a fuzzy edit
fuzzy_rewrite: the method used to rewrite the query. I have not tried it in practice yet; for more details, refer to: rewrite parameter
analyzer: the analyzer used to tokenize the query text. If not specified, the analyzer mapped for the field is used
max_expansions: see the fuzzy query section of Term level query.
zero_terms_query: real documents contain many very common words, such as or, and, is, and do in English (other languages, Chinese included, have their own). Such words rarely help a search, so we call them stop words. ES has a filter for them, the Stop token filter; if the query contains stop words and the analyzer uses this filter, they are removed. If every term in the query is removed, nothing is left to match. Should the user then get an empty result? ES offers two strategies, selected by this parameter:
- none (default): no documents are returned
- all: all documents are returned, which is equivalent to match_all
lenient: lenient means tolerant. It indicates whether to ignore format-based errors, such as supplying a text query value for a numeric field. If set to true, such errors are ignored; the default value is false
auto_generate_synonyms_phrase_query: sometimes one meaning can be written in more than one way. For example, ElasticSearch may be written as es; the spelling differs, but both describe the same thing, so when we search for ElasticSearch we also want content about es. We can call this a synonym query. How does ES know which words are synonyms? Lucene has a concept called Synonym Graph Token Filter, which is also available in ES, and configuring it enables synonym queries. This parameter controls whether match phrase queries are created automatically for multi-term synonyms; it is true by default, that is, enabled. For the configuration method, refer to: Synonym graph token filter
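The effect of zero_terms_query can be sketched with a toy model in Python. This is only an illustration of the behaviour described above, not how ES is implemented; the stop-word list and the function names analyze and match are assumptions made for the example.

```python
# Toy model of zero_terms_query. STOP_WORDS, analyze and match are
# illustrative assumptions, not Elasticsearch internals.
STOP_WORDS = {"or", "and", "is", "do"}

def analyze(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]

def match(query, docs, zero_terms_query="none"):
    """Return the docs that share at least one term with the query."""
    terms = set(analyze(query))
    if not terms:
        # Every query term was removed by the analyzer.
        return list(docs) if zero_terms_query == "all" else []
    return [d for d in docs if terms & set(analyze(d))]

docs = ["Eddie Underwood", "Mary Bailey"]
print(match("and or is", docs))                          # nothing is returned
print(match("and or is", docs, zero_terms_query="all"))  # behaves like match_all
print(match("is Eddie", docs))                           # only the Eddie document
```

With "none" the all-stop-word query returns nothing, while "all" falls back to returning every document.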

# In a match query you specify the query text for a field
# The following request tokenizes Eddie Underwood; a document matches if customer_full_name contains either Eddie or Underwood
GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "customer_full_name": "Eddie Underwood"
    }
  }
}

# In the query above, a document matches if either Eddie or Underwood appears. If the searched content is large, documents containing only one of the words are returned as well
# Their relevance may be low, so we can use minimum_should_match to specify the minimum number of terms that must match
GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "customer_full_name": {
        "query": "Eddie Underwood",
        "minimum_should_match": 2
      }
    }
  },
  "_source":["customer_full_name","currency"]
}

# match also supports specifying how the terms are combined through operator
# The following request hits only documents whose customer_full_name field contains both Eddie and Underwood
GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "customer_full_name": {
        "query": "Eddie Underwood",
        "operator": "and"
      }
    }
  }
}
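The three requests above can be mimicked with a small inverted index in Python. This is a minimal sketch under simplifying assumptions: tokenize stands in for the analyzer, the three sample names are made up, and scoring is ignored entirely.

```python
from collections import defaultdict

def tokenize(text):
    # Stand-in for an analyzer: lowercase and split on whitespace.
    return text.lower().split()

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def match(index, query, operator="or", minimum_should_match=1):
    """Return ids of documents containing enough of the query terms."""
    terms = set(tokenize(query))
    # With "and", every query term is required.
    required = len(terms) if operator == "and" else minimum_should_match
    hits = defaultdict(int)
    for term in terms:
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    return sorted(d for d, n in hits.items() if n >= required)

# Hypothetical sample data.
docs = {1: "Eddie Underwood", 2: "Eddie Weber", 3: "Mary Underwood"}
index = build_index(docs)
print(match(index, "Eddie Underwood"))                          # default OR
print(match(index, "Eddie Underwood", minimum_should_match=2))  # at least two terms
print(match(index, "Eddie Underwood", operator="and"))          # all terms
```

For a two-term query, minimum_should_match set to 2 and operator set to and select the same documents.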

# Typos are rarely made at the start of a word; they tend to creep in further along. Setting prefix_length sensibly therefore reduces the number of fuzzy expansions and improves performance
GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "customer_full_name": {
        "query": "Eadie Underwood",
        "fuzziness": 2,  # Limit the maximum edit distance to 2
        "prefix_length": 2, # With the default of 0, Eadie would match Eddie; with 2, the first two characters must match exactly, so it does not
        "operator": "and"   # Both Eadie and Underwood must match; the index contains a document for Eddie Underwood
      }
    }
  }, 
  "_source": "customer_full_name"
}
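To see why Eadie stops matching Eddie once prefix_length is 2, here is a rough Python sketch of the check. It assumes the terms are lowercased by the analyzer and uses the optimal string alignment distance (Levenshtein plus adjacent swaps) as a stand-in for the edit distance; edit_distance and fuzzy_match are hypothetical helper names.

```python
def edit_distance(a, b, transpositions=True):
    """Optimal string alignment distance: insert, delete, substitute, swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]

def fuzzy_match(query_term, indexed_term, fuzziness=2, prefix_length=0):
    """True if the terms share the required prefix and are close enough."""
    q, t = query_term.lower(), indexed_term.lower()
    if q[:prefix_length] != t[:prefix_length]:
        return False  # the protected prefix differs, so no fuzzy match
    return edit_distance(q, t) <= fuzziness

print(fuzzy_match("Eadie", "Eddie"))                   # prefix_length 0
print(fuzzy_match("Eadie", "Eddie", prefix_length=2))  # "ea" vs "ed"
```

The first call succeeds because one substitution turns eadie into eddie; the second fails before any distance is computed because the first two characters differ.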

match_phrase(Match phrase query)

match_phrase analyzes the input, but requires every resulting term to appear in the document, in the same order. This condition is rather strict. Sometimes I may remember only two words of a phrase: what if I can't remember the third? ES provides the slop parameter to help with this:

slop (default 0): the maximum number of positions allowed between matching terms, so a phrase with extra words in between can still be hit.

# match_phrase treats the query text as one phrase. With a slop of 0, only Eddie Underwood can be hit
GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_phrase": {
      "customer_full_name": {
        "query": "Eddie Underwood", 
        "slop": 1  # Eddie test Underwood can also be hit
      }
    }
  }
}
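The role of slop can be illustrated with a simplified position check in Python. This is a sketch under assumptions, not Lucene's actual sloppy-phrase algorithm: text is split on whitespace, and the function phrase_match is a name invented for the example.

```python
from itertools import product

def phrase_match(query, text, slop=0):
    """Simplified sloppy-phrase check based on token positions."""
    q_terms = query.lower().split()
    tokens = text.lower().split()
    if not q_terms:
        return False
    # Positions at which each query term occurs in the text.
    positions = [[i for i, tok in enumerate(tokens) if tok == term]
                 for term in q_terms]
    if any(not p for p in positions):
        return False  # a query term is missing entirely
    for combo in product(*positions):
        if len(set(combo)) != len(combo):
            continue  # two query terms cannot share one position
        # How far each term sits from where an exact phrase would put it.
        shifts = [pos - i for i, pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False

print(phrase_match("Eddie Underwood", "Eddie test Underwood"))          # slop 0
print(phrase_match("Eddie Underwood", "Eddie test Underwood", slop=1))  # slop 1
```

In this model a single extra word between the terms needs a slop of 1, while two terms in reversed order need a slop of 2.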

match_phrase_prefix(Match phrase prefix query)

match_phrase_prefix can be regarded as an extension of match_phrase. It analyzes the query text, treats the last term as a prefix, and matches every indexed term that starts with that prefix.

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_phrase_prefix": {
      "customer_full_name":{
        "query": "Eddie"  # It can be matched to Eddie Underwood and Eddie Utest
      }
    }
  }
}
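A minimal Python sketch of the idea, assuming whitespace tokenization and a slop of 0: all terms but the last must match exactly and in order, and the last one only needs to be a prefix. The function name phrase_prefix_match and the sample names are made up for the illustration.

```python
def phrase_prefix_match(query, text):
    """In-order phrase match where the last query term is a prefix."""
    q = query.lower().split()
    tokens = text.lower().split()
    if not q:
        return False
    *head, last = q
    # Slide a window of the phrase length over the document tokens.
    for start in range(len(tokens) - len(q) + 1):
        window = tokens[start:start + len(q)]
        if window[:-1] == head and window[-1].startswith(last):
            return True
    return False

for name in ["Eddie Underwood", "Eddie Utest", "Eddie Weber"]:
    print(name, phrase_prefix_match("Eddie U", name))
```

Here the query Eddie U hits Eddie Underwood and Eddie Utest but not Eddie Weber, because only the last term is expanded as a prefix.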

multi_match(Multi-match query)

multi_match builds on match and supports querying several fields at once. Field names may also contain wildcards.

# A document matches if either customer_full_name or customer_first_name contains the query text
GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "multi_match": {
      "query": "Eddie",
      "fields": ["customer_full_name","customer_first_name"]  # It can also be written as ["*_name"]
    }
  },
  "_source":["customer_full_name","currency","customer_first_name"]
}

With multi_match the following scenario can come up. Suppose you search for blog posts on CSDN with the keywords multi and match. Both keywords may appear in the title of an article as well as in its content, but articles that contain both keywords in the title are probably the ones we care about most, that is, they are more relevant. So how do we make such articles stand out and rank at the top of the search results?

In fact, ES computes a score for every document it finds, exposed in the _score field, and sorts the results by it: the higher the score, the nearer the top the document appears. In the scenario above, we can give the title field a greater weight in scoring. The syntax looks like this:

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "multi_match": {
      "query": "Eddie",
      "fields": ["customer_full_name^3","customer_first_name"]  # ^3 sets the boost of this field. The larger the number, the more this field contributes to the score relative to the other fields
    }
  },
  "_source":["customer_full_name","currency","customer_first_name"]
}
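As a rough illustration of what the ^3 suffix means, the Python sketch below splits a field spec into name and boost and multiplies a raw per-field score by that boost. The raw scores are invented numbers, and parse_field and boosted_scores are hypothetical helpers; real ES scores come from its similarity model.

```python
def parse_field(spec):
    """Split "name^boost" into (name, boost); the boost defaults to 1.0."""
    name, sep, boost = spec.partition("^")
    return name, (float(boost) if sep else 1.0)

def boosted_scores(fields, raw_scores):
    """Multiply each field's raw score by its boost."""
    result = {}
    for spec in fields:
        name, boost = parse_field(spec)
        result[name] = raw_scores.get(name, 0.0) * boost
    return result

fields = ["customer_full_name^3", "customer_first_name"]
# Made-up raw scores for one document.
raw = {"customer_full_name": 1.5, "customer_first_name": 2.0}
print(boosted_scores(fields, raw))
```

With these made-up numbers the boosted customer_full_name score overtakes customer_first_name even though its raw score was lower.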

multi_match also has a particularly important parameter, type. Every multi_match query depends on it; we did not specify it above only because it has a default value, best_fields. Next, let's look at which types exist besides best_fields and what each one is for.

best_fields: this builds on the dis_max example discussed earlier; refer to the introduction in Combined query of DSL. Suppose we search for brown fox in the title and body fields of a document. A document in which both words appear together in one field is more meaningful than one in which they appear separately in different fields. Scoring a document by the single field that matches best is what best_fields means. In this case multi_match is wrapped as a dis_max query. dis_max supports the tie_breaker parameter, so multi_match supports it here as well.
```
GET baike/_search
{
  "query": {
    "multi_match": {
      "query": "Brown fox",
      "fields": ["title","body"],
      "type": "best_fields",
      "tie_breaker": 0.1
    }
  }
}

# The above request will eventually be executed in this way
POST /baike/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {"match": {"title": "Brown fox"}},
        {"match": {"body": "Brown fox"}}
      ],
      "tie_breaker": 0.1
    }
  }
}
```
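The way dis_max combines the per-field scores can be written down in a few lines of Python. This is a sketch of the scoring rule only (best score plus tie_breaker times the scores of the other matching clauses); the per-field scores passed in are made-up values.

```python
def dis_max_score(scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the other matching scores."""
    matching = [s for s in scores if s > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)

# Hypothetical scores of the title and body clauses for one document.
print(dis_max_score([2.0, 1.0]))                   # only the best field counts
print(dis_max_score([2.0, 1.0], tie_breaker=0.5))  # other fields add a little
```

A tie_breaker of 0 ignores every field but the best one; a value of 1 would simply add all field scores together.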

most_fields: best_fields takes the highest sub-query score as the final score, while most_fields takes the sum of the sub-query scores. In other words, the more fields the keywords appear in, the higher the score. See the discussion of dis_max in the previous article.

GET baike/_search
{
  "query": {
    "multi_match": {
      "query": "Brown fox",
      "fields": ["title","body"],
      "type": "most_fields",
      "tie_breaker": 0.1
    }
  }
}

# The above request will eventually be executed in this way
POST baike/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"title":"Brown fox"}},
        {"match": {"body":"Brown fox"}}
      ]
    }
  }
}
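The difference between the two strategies shows up when one document matches strongly in a single field and another matches moderately in both. The Python sketch below uses invented per-field scores and a tie_breaker of 0 for best_fields; the function names are assumptions for the example.

```python
def best_fields_score(scores):
    """best_fields with tie_breaker 0: the highest field score wins."""
    return max(scores)

def most_fields_score(scores):
    """most_fields: the field scores are added together."""
    return sum(scores)

# Made-up (title, body) scores for two documents.
doc_a = [2.0, 0.0]  # strong match in title only
doc_b = [1.2, 1.2]  # moderate match in both fields

for name, scorer in [("best_fields", best_fields_score),
                     ("most_fields", most_fields_score)]:
    winner = "A" if scorer(doc_a) > scorer(doc_b) else "B"
    print(name, "ranks document", winner, "first")
```

With these numbers best_fields prefers document A, while most_fields prefers document B because its two moderate scores add up.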

phrase and phrase_prefix: both types use the same scoring strategy as best_fields. In the best_fields example code we saw that the query is translated into dis_max plus match; these two types use match_phrase and match_phrase_prefix in place of match.

GET baike/_search
{
  "query": {
    "multi_match": {
      "query": "Brow",
      "fields": ["title","body"],
      "type": "phrase_prefix",
      "tie_breaker": 0.1
    }
  }
}

# The above request will eventually be executed in this way
GET baike/_search
{
  "query": {
    "dis_max": {
      "tie_breaker": 0.1,
      "queries": [
        {"match_phrase_prefix": {"title": "Brow"}},
        {"match_phrase_prefix": {"body": "Brow"}}
      ]
    }
  }
}

bool_prefix: this type uses the same scoring strategy as most_fields, but it uses match_bool_prefix in place of match.

GET baike/_search
{
  "query": {
    "multi_match": {
      "query": "Brow",
      "fields": ["title","body"],
      "type": "bool_prefix"
    }
  }
}

# The above request will eventually be executed in this way
GET baike/_search
{
  "query": {
    "bool": {
      "should": [
        {"match_bool_prefix": {"title": "Brow"}},
        {"match_bool_prefix": {"body": "Brow"}}
      ]
    }
  }
}
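The rewrites shown for each type can be summarized in one small Python function that builds the equivalent query body as a dict. This is a sketch that mirrors the examples in this article, not ES source code; rewrite_multi_match is a hypothetical helper, and field boosts and cross_fields are not handled.

```python
def rewrite_multi_match(query, fields, match_type="best_fields", tie_breaker=0.0):
    """Build the dis_max or bool query that a multi_match type corresponds to."""
    leaf = {
        "best_fields": "match",
        "most_fields": "match",
        "phrase": "match_phrase",
        "phrase_prefix": "match_phrase_prefix",
        "bool_prefix": "match_bool_prefix",
    }[match_type]
    clauses = [{leaf: {field: query}} for field in fields]
    if match_type in ("most_fields", "bool_prefix"):
        # Scores of the clauses are added together.
        return {"bool": {"should": clauses}}
    # The best clause wins, with tie_breaker applied to the rest.
    return {"dis_max": {"queries": clauses, "tie_breaker": tie_breaker}}

print(rewrite_multi_match("Brow", ["title", "body"], "phrase_prefix", 0.1))
print(rewrite_multi_match("Brow", ["title", "body"], "bool_prefix"))
```

The returned dicts have the same shape as the "query" part of the rewritten requests shown above.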

cross_fields: refer to the introduction in Cross-field query cross_fields.
