1 Features implemented with ES
Chinese character completion
Spelling correction
2 Product search and auto-completion
Term suggester: tokenizes the input text and provides term suggestions for each token.
Phrase suggester: builds on the term suggester and also considers the relationship between multiple terms.
Completion suggester: mainly aimed at the "auto-completion" scenario.
Context suggester: completion suggestions that can be filtered or boosted by context.
GET product_completion_index/_search
{
  "from": 0,
  "size": 100,
  "suggest": {
    "czbk-suggest": {
      "prefix": "millet",
      "completion": {
        "field": "searchkey",
        "size": 20,
        "skip_duplicates": true
      }
    }
  }
}
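To make the roles of "prefix", "size" and "skip_duplicates" in the query above concrete, here is a minimal in-memory sketch of prefix completion. It is only a toy model under stated assumptions: Elasticsearch does not scan a list like this, and the class name PrefixCompletionSketch, the method complete and the sample corpus are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PrefixCompletionSketch {

    /**
     * Returns up to {@code size} corpus entries that start with {@code prefix},
     * in corpus order. When skipDuplicates is true, a repeated entry is
     * returned only once (a rough analogue of "skip_duplicates").
     */
    static List<String> complete(List<String> corpus, String prefix, int size, boolean skipDuplicates) {
        List<String> matches = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String entry : corpus) {
            if (matches.size() >= size) {
                break; // honour the requested number of suggestions
            }
            if (!entry.startsWith(prefix)) {
                continue; // prefix match only, as in a completion query
            }
            if (skipDuplicates && !seen.add(entry)) {
                continue; // already suggested this exact text
            }
            matches.add(entry);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical corpus with one duplicated entry
        List<String> corpus = List.of("Xiaomi 10", "Xiaomi 10", "Xiaomi 10 Pro", "Xiaomi router", "Huawei P40");
        System.out.println(complete(corpus, "Xiaomi", 20, true));
    }
}
```

A linear scan is enough to show the semantics; a real completion index would use a structure built for fast prefix lookups.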
2.1 Chinese character completion OpenAPI
2.1.1 Define the auto-completion interface
package com.oldlu.service;

import com.oldlu.commons.pojo.CommonEntity;
import org.elasticsearch.action.DocWriteResponse;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.suggest.completion.CompletionSuggestion;

import java.util.List;
import java.util.Map;

/**
 * @Class: ElasticsearchDocumentService
 * @Package com.oldlu.service
 * @Description: Document operation interface
 * @Company: http://www.oldlu.com/
 */
public interface ElasticsearchDocumentService {
    // Auto-completion (completion suggester)
    public List<String> cSuggest(CommonEntity commonEntity) throws Exception;
}
2.1.2 Implement auto-completion
/*
 * @Description: Auto-completion: suggests likely words or phrases based on the user's input
 * @Method: cSuggest
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: java.util.List<java.lang.String>
 */
@Override
public List<String> cSuggest(CommonEntity commonEntity) throws Exception {
    // Result list
    List<String> suggestList = new ArrayList<>();
    // Build the search request
    SearchRequest searchRequest = new SearchRequest(commonEntity.getIndexName());
    // Sort by score through the source builder
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.sort(new ScoreSortBuilder().order(SortOrder.DESC));
    // Build the completion suggester on the configured field
    CompletionSuggestionBuilder completionSuggestionBuilder =
            new CompletionSuggestionBuilder(commonEntity.getSuggestFileld());
    // Prefix typed by the user
    completionSuggestionBuilder.prefix(commonEntity.getSuggestValue());
    // Remove duplicate suggestions
    completionSuggestionBuilder.skipDuplicates(true);
    // Number of suggestions to return
    completionSuggestionBuilder.size(commonEntity.getSuggestCount());
    // "czbk-suggest" is the suggestion name; all results come back under it, so it can be hard-coded.
    // It must match the name passed to getSuggestion() below exactly (no stray spaces).
    searchSourceBuilder.suggest(new SuggestBuilder().addSuggestion("czbk-suggest", completionSuggestionBuilder));
    searchRequest.source(searchSourceBuilder);
    // Execute the search
    SearchResponse suggestResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    // Read the completion suggestion from the response
    CompletionSuggestion completionSuggestion = suggestResponse.getSuggest().getSuggestion("czbk-suggest");
    List<CompletionSuggestion.Entry.Option> optionsList = completionSuggestion.getEntries().get(0).getOptions();
    // Collect the suggestion texts
    if (!CollectionUtils.isEmpty(optionsList)) {
        optionsList.forEach(item -> suggestList.add(item.getText().toString()));
    }
    return suggestList;
}
2.1.3 Define the auto-completion controller
/*
 * @Description: Auto-completion
 * @Method: cSuggest
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: com.oldlu.commons.result.ResponseData
 */
@GetMapping(value = "/csuggest")
public ResponseData cSuggest(@RequestBody CommonEntity commonEntity) {
    // Build the response object
    ResponseData rData = new ResponseData();
    // Reject requests that lack the index name, suggest field or suggest value
    if (StringUtils.isEmpty(commonEntity.getIndexName())
            || StringUtils.isEmpty(commonEntity.getSuggestFileld())
            || StringUtils.isEmpty(commonEntity.getSuggestValue())) {
        rData.setResultEnum(ResultEnum.PARAM_ISNULL);
        return rData;
    }
    // Completion results
    List<String> result = null;
    try {
        // Run the completion suggestion through the high-level API
        result = elasticsearchDocumentService.cSuggest(commonEntity);
        // Wrap the result, the status and the count in the response
        rData.setResultEnum(result, ResultEnum.SUCCESS, result.size());
        // Log success
        logger.info(TipsEnum.CSUGGEST_GET_DOC_SUCCESS.getMessage());
    } catch (Exception e) {
        // Log failure
        logger.error(TipsEnum.CSUGGEST_GET_DOC_FAIL.getMessage(), e);
        // Build the error response
        rData.setResultEnum(ResultEnum.ERROR);
    }
    return rData;
}
2.1.4 Verify the auto-completion call
http://localhost:8888/v1/docs/csuggest
Parameters
{
  "indexName": "product_completion_index",
  "suggestFileld": "searchkey",
  "suggestValue": "millet",
  "suggestCount": 13
}
indexName: name of the index
suggestFileld: field used for the auto-completion lookup
suggestValue: keyword typed by the user that should be completed
suggestCount: number of suggestions to return (JD shows 13)
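As a rough illustration of how a client might assemble this call, the sketch below builds the JSON body from the four parameters and wraps it in a GET request using the JDK's java.net.http API. The class name CSuggestRequestSketch and its two helper methods are hypothetical, the body builder assumes values that need no JSON escaping, and whether a GET body survives every proxy between client and server is an assumption worth checking in your environment.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class CSuggestRequestSketch {

    /** Builds the request body; assumes the values contain no characters that need JSON escaping. */
    static String body(String indexName, String suggestField, String suggestValue, int suggestCount) {
        return String.format(
                "{\"indexName\":\"%s\",\"suggestFileld\":\"%s\",\"suggestValue\":\"%s\",\"suggestCount\":%d}",
                indexName, suggestField, suggestValue, suggestCount);
    }

    /** Wraps the body in a GET request, matching the @GetMapping + @RequestBody controller. */
    static HttpRequest request(String baseUrl, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/v1/docs/csuggest"))
                .header("Content-Type", "application/json")
                .method("GET", HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        String json = body("product_completion_index", "searchkey", "millet", 13);
        HttpRequest req = request("http://localhost:8888", json);
        System.out.println(req.method() + " " + req.uri());
        System.out.println(json);
        // To send it: HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())
    }
}
```

The request is only built here, not sent, so the sketch runs without the service being up.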
Response
{
  "code": "200",
  "desc": "Operation succeeded!",
  "data": [
    "Millet 10",
    "Millet 10 Pro",
    "Xiaomi 8",
    "Xiaomi 9",
    "Xiaomi power bank",
    "Mi phones",
    "Millet camera",
    "Mi TV",
    "Millet rice cooker",
    "Xiaomi notebook",
    "Millet Earrings",
    "Xiaomi router"
  ],
  "count": 12
}
Tip: auto-completion removes duplicate suggestions automatically
2.2 Pinyin completion OpenAPI
Use pinyin to look up 小米 (Xiaomi)
http://localhost:8888/v1/docs/csuggest
Full pinyin
{
  "indexName": "product_completion_index",
  "suggestFileld": "searchkey",
  "suggestValue": "xiaomi",
  "suggestCount": 13
}
Full pinyin (syllables separated by a space)
{
  "indexName": "product_completion_index",
  "suggestFileld": "searchkey",
  "suggestValue": "xiao mi",
  "suggestCount": 13
}
First letters only
{
  "indexName": "product_completion_index",
  "suggestFileld": "searchkey",
  "suggestValue": "xm",
  "suggestCount": 13
}
2.2.1 Download the pinyin plugin
wget https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.4.0/elasticsearch-analysis-pinyin-7.4.0.zip
or
https://github.com/medcl/elasticsearch-analysis-pinyin/releases/tag/v7.4.0
When we create an index, we can define a custom analyzer in the settings and apply it to a field through the mapping
{
  "indexName": "product_completion_index",
  "map": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 2,
      "analysis": {
        "analyzer": {
          "ik_pinyin_analyzer": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": "pinyin_filter"
          }
        },
        "filter": {
          "pinyin_filter": {
            "type": "pinyin",
            "keep_first_letter": true,
            "keep_separate_first_letter": false,
            "keep_full_pinyin": true,
            "keep_original": true,
            "limit_first_letter_length": 16,
            "lowercase": true,
            "remove_duplicated_term": true
          }
        }
      }
    },
    "mapping": {
      "properties": {
        "name": {
          "type": "text"
        },
        "searchkey": {
          "type": "completion",
          "analyzer": "ik_pinyin_analyzer"
        }
      }
    }
  }
}
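To show roughly what the keep_full_pinyin, keep_first_letter and keep_original options in the filter above mean, here is a small sketch that derives the corresponding tokens for 小米. It is not the plugin's code: the class PinyinTokenSketch, its two-character lookup table and the order of the emitted tokens are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PinyinTokenSketch {

    // Tiny hypothetical lookup table (U+5C0F -> "xiao", U+7C73 -> "mi");
    // the real plugin ships a full dictionary.
    static final Map<Character, String> PINYIN = Map.of('\u5c0f', "xiao", '\u7c73', "mi");

    /** Emits the tokens enabled by the three flags; a set drops repeated terms. */
    static List<String> tokens(String text, boolean keepFullPinyin, boolean keepFirstLetter, boolean keepOriginal) {
        Set<String> out = new LinkedHashSet<>();
        StringBuilder initials = new StringBuilder();
        for (char c : text.toCharArray()) {
            String syllable = PINYIN.get(c);
            if (syllable == null) {
                continue; // characters outside the toy table are skipped in this sketch
            }
            if (keepFullPinyin) {
                out.add(syllable); // one token per syllable, e.g. "xiao", "mi"
            }
            initials.append(syllable.charAt(0));
        }
        if (keepFirstLetter && initials.length() > 0) {
            out.add(initials.toString()); // joined first letters, e.g. "xm"
        }
        if (keepOriginal) {
            out.add(text); // the untouched input
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        // Same flag values as the filter definition: full pinyin, first letters and original all kept
        System.out.println(tokens("\u5c0f\u7c73", true, true, true));
    }
}
```

With these tokens indexed, input such as "xiao" or "xm" has something to match against, which is the idea behind the three access styles listed in 2.2.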
Call the add-document API to index the data
Then try pinyin completion
3 What is language processing (spelling correction)
Scenario
For example, the misspelled input [adidaas official flagship store] can be corrected to [adidas official flagship store]
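The idea behind this kind of correction can be sketched with edit distance: each input term is replaced by the closest dictionary term if one is near enough. This is only an illustration of candidate selection and not how the phrase suggester works internally (it also scores whole phrases); the class SpellingCorrectionSketch, the dictionary and the maxEdits threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SpellingCorrectionSketch {

    /** Levenshtein distance computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the closest dictionary term within maxEdits, or the term unchanged. */
    static String correct(String term, List<String> dictionary, int maxEdits) {
        String best = term;
        int bestDistance = maxEdits + 1;
        for (String candidate : dictionary) {
            int d = editDistance(term, candidate);
            if (d < bestDistance) {
                best = candidate;
                bestDistance = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary built from indexed terms
        List<String> dictionary = List.of("adidas", "official", "flagship", "store");
        List<String> corrected = new ArrayList<>();
        for (String term : "adidaas official flagship store".split(" ")) {
            corrected.add(correct(term, dictionary, 2));
        }
        System.out.println(String.join(" ", corrected));
    }
}
```

Correcting term by term ignores context; taking neighbouring terms into account is what distinguishes a phrase-level suggester from this sketch.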
3.1 Language processing OpenAPI
GET product_completion_index/_search
{
  "suggest": {
    "czbk-suggestion": {
      "text": "adidaas Official flagship store",
      "phrase": {
        "field": "name",
        "size": 13
      }
    }
  }
}
return
3.1.1 Define the spelling correction interface
// Spelling correction
public String pSuggest(CommonEntity commonEntity) throws Exception;
3.1.2 Implement spelling correction
/*
 * @Description: Spelling correction
 * @Method: pSuggest
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: java.lang.String
 */
@Override
public String pSuggest(CommonEntity commonEntity) throws Exception {
    // Return value; stays empty when no correction is suggested
    String pSuggestString = "";
    // Build the search request
    SearchRequest searchRequest = new SearchRequest(commonEntity.getIndexName());
    // Build the search source
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Sort by score
    searchSourceBuilder.sort(new ScoreSortBuilder().order(SortOrder.DESC));
    // Build the phrase suggester (the argument is the field to match against)
    PhraseSuggestionBuilder pSuggestionBuilder = new PhraseSuggestionBuilder(commonEntity.getSuggestFileld());
    // Text to be corrected
    pSuggestionBuilder.text(commonEntity.getSuggestValue());
    // Return only the single best correction
    pSuggestionBuilder.size(1);
    // The suggestion name must match the one passed to getSuggestion() below exactly (no stray spaces)
    searchSourceBuilder.suggest(new SuggestBuilder().addSuggestion("czbk-suggest", pSuggestionBuilder));
    searchRequest.source(searchSourceBuilder);
    // Execute the search
    SearchResponse suggestResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    // Read the phrase suggestion from the response
    PhraseSuggestion phraseSuggestion = suggestResponse.getSuggest().getSuggestion("czbk-suggest");
    // Get the returned options
    List<PhraseSuggestion.Entry.Option> optionsList = phraseSuggestion.getEntries().get(0).getOptions();
    // Take the first option, if any
    if (!CollectionUtils.isEmpty(optionsList) && optionsList.get(0).getText() != null) {
        // Strip the spaces between terms (intended for Chinese text, where terms are not space-separated)
        pSuggestString = optionsList.get(0).getText().string().replaceAll(" ", "");
    }
    return pSuggestString;
}
3.1.3 Define the spelling correction controller
/*
 * @Description: Spelling correction
 * @Method: pSuggest
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: com.oldlu.commons.result.ResponseData
 */
@GetMapping(value = "/psuggest")
public ResponseData pSuggest(@RequestBody CommonEntity commonEntity) {
    // Build the response object
    ResponseData rData = new ResponseData();
    // Reject requests that lack the index name, suggest field or suggest value
    if (StringUtils.isEmpty(commonEntity.getIndexName())
            || StringUtils.isEmpty(commonEntity.getSuggestFileld())
            || StringUtils.isEmpty(commonEntity.getSuggestValue())) {
        rData.setResultEnum(ResultEnum.PARAM_ISNULL);
        return rData;
    }
    // Corrected text
    String result = null;
    try {
        // Run the spelling correction through the high-level API
        result = elasticsearchDocumentService.pSuggest(commonEntity);
        // Wrap the result and the status in the response (no count for a single string)
        rData.setResultEnum(result, ResultEnum.SUCCESS, null);
        // Log success
        logger.info(TipsEnum.PSUGGEST_GET_DOC_SUCCESS.getMessage());
    } catch (Exception e) {
        // Log failure
        logger.error(TipsEnum.PSUGGEST_GET_DOC_FAIL.getMessage(), e);
        // Build the error response
        rData.setResultEnum(ResultEnum.ERROR);
    }
    return rData;
}
3.1.4 Verify the language processing call
http://localhost:8888/v1/docs/psuggest
Parameters
{
  "indexName": "product_completion_index",
  "suggestFileld": "name",
  "suggestValue": "adidaas Official flagship store"
}
indexName: name of the index
suggestFileld: field used for the spelling correction lookup
suggestValue: input text to be corrected
Response
{
  "code": "200",
  "desc": "Operation succeeded!",
  "data": "adidas Official flagship store"
}
4 Summary
- A dedicated search thesaurus / corpus is needed, kept separate from the business index, so that the corpus is easier to maintain and upgrade
- Based on word segmentation and other search conditions, query the corpus and return a handful of records (13 on JD, 10 on Taobao and 4 on Baidu)
- To improve accuracy, prefix search is usually used