Elasticsearch Chinese character completion and spelling correction

Posted by KindredHyperion on Thu, 20 Jan 2022 02:17:31 +0100

1 Effects achieved using ES

Chinese character completion

Spelling correction

2 Product search and automatic completion


Term suggester: tokenizes the input text and offers suggestions for each individual term
Phrase suggester: builds on the term suggester and also considers the relationship between multiple terms
Completion suggester: aimed mainly at the "auto completion" scenario
Context suggester: completion suggestions filtered or boosted by a context, such as a category

GET product_completion_index/_search
{
  "from": 0,
  "size": 100,
  "suggest": {
    "czbk-suggest": {
      "prefix": "小米",
      "completion": {
        "field": "searchkey",
        "size": 20,
        "skip_duplicates": true
      }
    }
  }
}
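
To make the `prefix`, `size` and `skip_duplicates` parameters of the request above more concrete, here is a minimal in-memory sketch of prefix completion. It is not how Elasticsearch implements the completion suggester (which keeps a dedicated in-memory structure and orders options by weight); the class `PrefixCompletionSketch` and its methods are hypothetical names used only for illustration, and results here come back in plain lexicographic order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative only: a sorted set stands in for the "searchkey" completion field.
final class PrefixCompletionSketch {
    // A set drops repeated inputs, which mimics skip_duplicates = true.
    private final TreeSet<String> inputs = new TreeSet<>();

    void index(String input) {
        inputs.add(input);
    }

    // Returns at most `size` indexed inputs that start with `prefix`.
    List<String> suggest(String prefix, int size) {
        List<String> result = new ArrayList<>();
        // tailSet jumps to the first entry >= prefix; all matches are contiguous from there.
        for (String candidate : inputs.tailSet(prefix, true)) {
            if (!candidate.startsWith(prefix) || result.size() >= size) {
                break;
            }
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        PrefixCompletionSketch sketch = new PrefixCompletionSketch();
        for (String s : new String[] {"Xiaomi 10", "Xiaomi 10 Pro", "Xiaomi 8", "Xiaomi 10", "Huawei P40"}) {
            sketch.index(s);
        }
        System.out.println(sketch.suggest("Xiaomi", 20));
    }
}
```

The duplicate "Xiaomi 10" is indexed twice but returned once, and lowering `size` simply truncates the list.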

2.1 Chinese character completion OpenAPI

2.1.1 Define the automatic completion interface

package com.oldlu.service;

import com.oldlu.commons.pojo.CommonEntity;
import java.util.List;

/**
 * @Class: ElasticsearchDocumentService
 * @Package com.oldlu.service
 * @Description: Document operation interface
 * @Company: http://www.oldlu.com/
 */
public interface ElasticsearchDocumentService {
  //Automatic completion (completion suggester)
  List<String> cSuggest(CommonEntity commonEntity) throws Exception;
}
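
The `CommonEntity` class is not shown in this post. As a rough sketch, assuming it is a plain holder for the request parameters, the fields that the suggestion code reads could look like the following. The class name `CommonEntitySketch` and the helper `hasRequiredSuggestParams` are hypothetical; the field names (including the spelling `suggestFileld`) follow the getters and JSON parameters used in this article. The no-argument constructor and setters that JSON binding would normally need are omitted for brevity.

```java
// Hypothetical stand-in for CommonEntity, limited to the suggestion-related fields.
final class CommonEntitySketch {
    private final String indexName;      // index to search
    private final String suggestFileld;  // field used by the suggester (spelling as in the article)
    private final String suggestValue;   // text typed by the user
    private final Integer suggestCount;  // maximum number of suggestions

    CommonEntitySketch(String indexName, String suggestFileld, String suggestValue, Integer suggestCount) {
        this.indexName = indexName;
        this.suggestFileld = suggestFileld;
        this.suggestValue = suggestValue;
        this.suggestCount = suggestCount;
    }

    String getIndexName() { return indexName; }
    String getSuggestFileld() { return suggestFileld; }
    String getSuggestValue() { return suggestValue; }
    Integer getSuggestCount() { return suggestCount; }

    // True when the three mandatory string parameters are all non-empty.
    boolean hasRequiredSuggestParams() {
        return notEmpty(indexName) && notEmpty(suggestFileld) && notEmpty(suggestValue);
    }

    private static boolean notEmpty(String s) {
        return s != null && !s.isEmpty();
    }
}
```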

2.1.2 Implement automatic completion

// The Elasticsearch classes used below live in org.elasticsearch.action.search,
// org.elasticsearch.client, org.elasticsearch.search.builder, org.elasticsearch.search.sort,
// org.elasticsearch.search.suggest and org.elasticsearch.search.suggest.completion.
/*
  * @Description: Automatic completion: suggests possible words or phrases based on the user's input
  * @Method: cSuggest
  * @Param: [commonEntity]
  * @Update:
  * @since: 1.0.0
  * @Return: java.util.List<java.lang.String>
  *
  */
  @Override
  public List<String> cSuggest(CommonEntity commonEntity) throws Exception {
    //Define the return value
    List<String> suggestList = new ArrayList<>();
    //Build the search request
    SearchRequest searchRequest = new SearchRequest(commonEntity.getIndexName());
    //Sort by score, descending, through the query builder
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.sort(new ScoreSortBuilder().order(SortOrder.DESC));
    //Build the completion suggestion on the field to search
    CompletionSuggestionBuilder completionSuggestionBuilder =
        new CompletionSuggestionBuilder(commonEntity.getSuggestFileld());
    //Prefix typed by the user
    completionSuggestionBuilder.prefix(commonEntity.getSuggestValue());
    //Remove duplicates
    completionSuggestionBuilder.skipDuplicates(true);
    //Maximum number of suggestions
    completionSuggestionBuilder.size(commonEntity.getSuggestCount());
    //"czbk-suggest" is the name under which all suggestions are returned; it can be hard-coded
    searchSourceBuilder.suggest(
        new SuggestBuilder().addSuggestion("czbk-suggest", completionSuggestionBuilder));
    searchRequest.source(searchSourceBuilder);
    //Execute the search
    SearchResponse suggestResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    //Read the completion suggestion by name
    CompletionSuggestion completionSuggestion =
        suggestResponse.getSuggest().getSuggestion("czbk-suggest");
    //Guard against an empty entry list before reading the options
    if (completionSuggestion == null || completionSuggestion.getEntries().isEmpty()) {
      return suggestList;
    }
    List<CompletionSuggestion.Entry.Option> optionsList =
        completionSuggestion.getEntries().get(0).getOptions();
    //Collect the suggestion texts from optionsList
    if (!CollectionUtils.isEmpty(optionsList)) {
      optionsList.forEach(item -> suggestList.add(item.getText().toString()));
    }
    return suggestList;
  }

2.1.3 Define the automatic completion controller

/*
  * @Description: Automatic completion
  * @Method: cSuggest
  * @Param: [commonEntity]
  * @Update:
  * @since: 1.0.0
  * @Return: com.oldlu.commons.result.ResponseData
  *
  */
  @GetMapping(value = "/csuggest")
  public ResponseData cSuggest(@RequestBody CommonEntity commonEntity) {
    //Construct the return data
    ResponseData rData = new ResponseData();
    //Reject the request if a required parameter is missing
    if (StringUtils.isEmpty(commonEntity.getIndexName())
        || StringUtils.isEmpty(commonEntity.getSuggestFileld())
        || StringUtils.isEmpty(commonEntity.getSuggestValue())) {
      rData.setResultEnum(ResultEnum.PARAM_ISNULL);
      return rData;
    }
    //Completion suggestions to return
    List<String> result = null;
    try {
      //Fetch the suggestions through the service (high-level REST client)
      result = elasticsearchDocumentService.cSuggest(commonEntity);
      //Wrap the data, the status and the result count in the response
      rData.setResultEnum(result, ResultEnum.SUCCESS, result.size());
      //Logging
      logger.info(TipsEnum.CSUGGEST_GET_DOC_SUCCESS.getMessage());
    } catch (Exception e) {
      //Logging
      logger.error(TipsEnum.CSUGGEST_GET_DOC_FAIL.getMessage(), e);
      //Build the error response
      rData.setResultEnum(ResultEnum.ERROR);
    }
    return rData;
  }

2.1.4 Verify the automatic completion call

http://localhost:8888/v1/docs/csuggest

Parameters

{
 "indexName": "product_completion_index",
 "suggestFileld": "searchkey",
 "suggestValue": "小米",
 "suggestCount": 13
}

indexName: index name
suggestFileld: field used for the completion lookup
suggestValue: keyword the user has typed so far
suggestCount: number of suggestions to return (JD returns 13)

Response

{
 "code": "200",
 "desc": "Operation succeeded!",
 "data": [
   "Xiaomi 10",
   "Xiaomi 10 Pro",
   "Xiaomi 8",
   "Xiaomi 9",
   "Xiaomi power bank",
   "Xiaomi phones",
   "Xiaomi camera",
   "Xiaomi TV",
   "Xiaomi rice cooker",
   "Xiaomi notebook",
   "Xiaomi Earrings",
   "Xiaomi router"
 ],
 "count": 12
}

Tip: automatic completion also removes duplicates automatically
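
For readers who want to call the endpoint from Java rather than from a REST tool, the following is a minimal sketch using the JDK's `java.net.http` API (Java 11+). The class `CSuggestRequestSketch` and its `build` method are hypothetical; the URL path and JSON field names are the ones used in this section. Because the controller is mapped with `@GetMapping` and reads a request body, the sketch sends a GET that carries a JSON body, which not every HTTP client or proxy supports.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative only: builds (but does not send) the csuggest request.
final class CSuggestRequestSketch {

    static HttpRequest build(String baseUrl, String prefix, int count) {
        String body = "{"
            + "\"indexName\":\"product_completion_index\","
            + "\"suggestFileld\":\"searchkey\","
            + "\"suggestValue\":\"" + escape(prefix) + "\","
            + "\"suggestCount\":" + count
            + "}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/docs/csuggest"))
            .header("Content-Type", "application/json")
            // method(...) lets a GET request carry a body
            .method("GET", HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    // Minimal JSON string escaping for backslashes and double quotes.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:8888", "xiaomi", 13);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` once the service is running.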

2.2 Pinyin completion OpenAPI

Use pinyin to look up [Xiaomi]

http://localhost:8888/v1/docs/csuggest

Full pinyin
{
 "indexName": "product_completion_index",
 "suggestFileld": "searchkey",
 "suggestValue": "xiaomi",
 "suggestCount": 13
}
Full pinyin (separated)
{
 "indexName": "product_completion_index",
 "suggestFileld": "searchkey",
 "suggestValue": "xiao mi",
 "suggestCount": 13
}
First letters (initials)
{
 "indexName": "product_completion_index",
 "suggestFileld": "searchkey",
 "suggestValue": "xm",
 "suggestCount": 13
}

2.2.1 Download the pinyin plugin

wget https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.4.0/elasticsearch-analysis-pinyin-7.4.0.zip
or download it from
https://github.com/medcl/elasticsearch-analysis-pinyin/releases/tag/v7.4.0

When creating an index, we can define a custom analyzer and apply it to a field through the mapping

{
 "indexName": "product_completion_index",
 "map": {
   "settings": {
     "number_of_shards": 1,
     "number_of_replicas": 2,
     "analysis": {
       "analyzer": {
         "ik_pinyin_analyzer": {
           "type": "custom",
           "tokenizer": "ik_smart",
           "filter": "pinyin_filter"
         }
       },
       "filter": {
         "pinyin_filter": {
           "type": "pinyin",
           "keep_first_letter": true,
           "keep_separate_first_letter": false,
           "keep_full_pinyin": true,
           "keep_original": true,
           "limit_first_letter_length": 16,
           "lowercase": true,
           "remove_duplicated_term": true
         }
       }
     }
   },
   "mapping": {
     "properties": {
       "name": {
         "type": "text"
       },
       "searchkey": {
         "type": "completion",
         "analyzer": "ik_pinyin_analyzer"
       }
     }
   }
 }
}
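
To give a feel for what the `pinyin_filter` options in the settings above do, here is a deliberately simplified sketch. It is not the plugin's code: the class `PinyinTermsSketch`, its `terms` method and the two-character lookup table are hypothetical, and token positions, `lowercase` and `keep_separate_first_letter` are not modelled. It only shows which kinds of terms `keep_full_pinyin`, `keep_first_letter`, `keep_original` and `limit_first_letter_length` contribute for one input token, which is why inputs such as "xiao mi" or "xm" can find the same product.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: a toy model of the terms a pinyin token filter could emit.
final class PinyinTermsSketch {
    // Hypothetical lookup table for U+5C0F ("xiao") and U+7C73 ("mi");
    // the real plugin ships a full dictionary.
    private static final Map<Character, String> PINYIN =
        Map.of('\u5c0f', "xiao", '\u7c73', "mi");

    static Set<String> terms(String text, boolean keepFirstLetter, boolean keepFullPinyin,
                             boolean keepOriginal, int limitFirstLetterLength) {
        // Set semantics roughly correspond to remove_duplicated_term = true.
        Set<String> out = new LinkedHashSet<>();
        StringBuilder initials = new StringBuilder();
        for (char c : text.toCharArray()) {
            String pinyin = PINYIN.get(c);
            if (pinyin == null) {
                continue; // characters outside the toy table are skipped
            }
            if (keepFullPinyin) {
                out.add(pinyin); // one term per character, e.g. "xiao", "mi"
            }
            if (initials.length() < limitFirstLetterLength) {
                initials.append(pinyin.charAt(0));
            }
        }
        if (keepFirstLetter && initials.length() > 0) {
            out.add(initials.toString()); // joined first letters, e.g. "xm"
        }
        if (keepOriginal) {
            out.add(text); // the original Chinese token
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(terms("\u5c0f\u7c73", true, true, true, 16));
    }
}
```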

Call the [add document] API to index the data

Then try pinyin completion

3 What is language processing (spelling correction)

Scenario

For example, the misspelled input [adidaas official flagship store] can be corrected to [adidas official flagship store]
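
The idea behind this kind of correction can be sketched with plain edit distance: each input token is replaced by the closest known term if it is within a small number of edits. This is only an illustration with a hypothetical class `SpellingCorrectionSketch` and a hand-written vocabulary; the Elasticsearch phrase suggester additionally scores whole candidate phrases with a language model, which is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: token-by-token correction using Levenshtein distance.
final class SpellingCorrectionSketch {

    // Classic dynamic-programming Levenshtein distance (insert, delete, substitute).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the vocabulary term closest to `token`, or `token` itself if none is within maxEdits.
    static String closest(String token, List<String> vocabulary, int maxEdits) {
        String best = token;
        int bestDistance = maxEdits + 1;
        for (String term : vocabulary) {
            int d = editDistance(token, term);
            if (d < bestDistance) {
                bestDistance = d;
                best = term;
            }
        }
        return best;
    }

    // Corrects a space-separated phrase one token at a time.
    static String correct(String phrase, List<String> vocabulary, int maxEdits) {
        List<String> corrected = new ArrayList<>();
        for (String token : phrase.split(" ")) {
            corrected.add(closest(token, vocabulary, maxEdits));
        }
        return String.join(" ", corrected);
    }

    public static void main(String[] args) {
        List<String> vocabulary = List.of("adidas", "official", "flagship", "store");
        System.out.println(correct("adidaas official flagship store", vocabulary, 2));
    }
}
```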

3.1 Language processing OpenAPI

GET product_completion_index/_search
{
"suggest": {
 "czbk-suggestion": {
  "text": "adidaas Official flagship store",
  "phrase": {
   "field": "name",
   "size": 13
  }
 }
}
}

3.1.1 Define the spelling correction interface

//Spelling correction (phrase suggester)
 String pSuggest(CommonEntity commonEntity) throws Exception;

3.1.2 Implement spelling correction

/*
  * @Description: Spelling correction
  * @Method: pSuggest
  * @Param: [commonEntity]
  * @Update:
  * @since: 1.0.0
  * @Return: java.lang.String
  *
  */
  @Override
  public String pSuggest(CommonEntity commonEntity) throws Exception {
    //Define the return value
    String pSuggestString = "";
    //Build the search request
    SearchRequest searchRequest = new SearchRequest(commonEntity.getIndexName());
    //Define the query builder
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    //Sort by score, descending
    searchSourceBuilder.sort(new ScoreSortBuilder().order(SortOrder.DESC));
    //Build the phrase suggester (the parameter is the field to match against)
    PhraseSuggestionBuilder pSuggestionBuilder =
        new PhraseSuggestionBuilder(commonEntity.getSuggestFileld());
    //Text to be corrected
    pSuggestionBuilder.text(commonEntity.getSuggestValue());
    //Only the best correction is needed
    pSuggestionBuilder.size(1);
    searchSourceBuilder.suggest(
        new SuggestBuilder().addSuggestion("czbk-suggest", pSuggestionBuilder));
    searchRequest.source(searchSourceBuilder);
    //Execute the search
    SearchResponse suggestResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    //Read the phrase suggestion by name
    PhraseSuggestion phraseSuggestion =
        suggestResponse.getSuggest().getSuggestion("czbk-suggest");
    //Guard against an empty entry list before reading the options
    if (phraseSuggestion == null || phraseSuggestion.getEntries().isEmpty()) {
      return pSuggestString;
    }
    List<PhraseSuggestion.Entry.Option> optionsList =
        phraseSuggestion.getEntries().get(0).getOptions();
    //Take the first option and remove the spaces between its tokens (suits unsegmented Chinese text)
    if (!CollectionUtils.isEmpty(optionsList) && optionsList.get(0).getText() != null) {
      pSuggestString = optionsList.get(0).getText().string().replaceAll(" ", "");
    }
    return pSuggestString;
  }

3.1.3 Define the spelling correction controller

/*
  * @Description: Spelling correction
  * @Method: pSuggest
  * @Param: [commonEntity]
  * @Update:
  * @since: 1.0.0
  * @Return: com.oldlu.commons.result.ResponseData
  *
  */
  @GetMapping(value = "/psuggest")
  public ResponseData pSuggest(@RequestBody CommonEntity commonEntity) {
    //Construct the return data
    ResponseData rData = new ResponseData();
    //Reject the request if a required parameter is missing
    if (StringUtils.isEmpty(commonEntity.getIndexName())
        || StringUtils.isEmpty(commonEntity.getSuggestFileld())
        || StringUtils.isEmpty(commonEntity.getSuggestValue())) {
      rData.setResultEnum(ResultEnum.PARAM_ISNULL);
      return rData;
    }
    //Corrected text to return
    String result = null;
    try {
      //Fetch the correction through the service (high-level REST client)
      result = elasticsearchDocumentService.pSuggest(commonEntity);
      //Wrap the data and the status in the response (no count for a single string)
      rData.setResultEnum(result, ResultEnum.SUCCESS, null);
      //Logging
      logger.info(TipsEnum.PSUGGEST_GET_DOC_SUCCESS.getMessage());
    } catch (Exception e) {
      //Logging
      logger.error(TipsEnum.PSUGGEST_GET_DOC_FAIL.getMessage(), e);
      //Build the error response
      rData.setResultEnum(ResultEnum.ERROR);
    }
    return rData;
  }

3.1.4 Verify the language processing call

http://localhost:8888/v1/docs/psuggest

Parameters

{
 "indexName": "product_completion_index",
 "suggestFileld": "name",
 "suggestValue": "adidaas Official flagship store"
}

indexName: index name
suggestFileld: field used for spelling correction
suggestValue: text to be corrected

Response

{
 "code": "200",
 "desc": "Operation succeeded!",
 "data": "adidas Official flagship store"
}

4 Summary

  1. Keep the search thesaurus / corpus in its own index, separate from the business index, so that the corpus is easier to maintain and upgrade
  2. Query the corpus using word segmentation and the other search conditions, and return a fixed number of records (JD returns 13, Taobao 10 and Baidu 4)
  3. To improve accuracy, prefix search is usually used

Topics: Big Data ElasticSearch search engine