For Elasticsearch installation, refer to: Installing Elasticsearch
Install the ik plug-in online (slow)
```shell
# Enter the container
docker exec -it elasticsearch /bin/bash
# Download and install the plug-in online
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
# Exit the container
exit
# Restart the container
docker restart elasticsearch
```
Install the ik plug-in offline (recommended)
View data volume directory
To install the plug-in, you need to know where Elasticsearch's plugins directory is. It is mounted as a data volume, so look it up with the following command:

```shell
docker inspect es
```
Display results:
[Image: output of `docker inspect es` (/images/es/02.png)]
Note that the plugins directory is mounted at `/root/docker/ES/es-plugins`.
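If the full `docker inspect` output is hard to scan, the mount can also be pulled out programmatically. Below is a minimal Python sketch; the JSON is hand-written stand-in data imitating the shape of the `Mounts` section, and the host path is this tutorial's assumed volume location, not real output:

```python
import json

# Stand-in sample of the Mounts section that `docker inspect es` prints;
# the Source path is an assumption taken from this tutorial.
inspect_output = json.loads("""
[{"Mounts": [
    {"Type": "bind",
     "Source": "/root/docker/ES/es-plugins",
     "Destination": "/usr/share/elasticsearch/plugins"}
]}]
""")

# Find the host directory mounted at the container's plugins path
plugins_mount = next(
    m["Source"]
    for m in inspect_output[0]["Mounts"]
    if m["Destination"].endswith("/plugins")
)
print(plugins_mount)  # /root/docker/ES/es-plugins
```

In practice you would feed this script the real output of `docker inspect es` instead of the embedded sample.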
Unzip the tokenizer installation package
Unzip the ik tokenizer archive and rename the directory to ik (download address: Click download directly)
Upload it to the plug-in data volume of the es container, i.e. `/root/docker/ES/es-plugins`
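The unzip-and-rename step can be sketched with Python's zipfile module. This is only an illustration: a dummy archive stands in for the real plugin download, and everything happens in a temporary directory rather than the actual data volume:

```python
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()

# Dummy archive standing in for elasticsearch-analysis-ik-7.4.2.zip
zip_path = os.path.join(workdir, "elasticsearch-analysis-ik-7.4.2.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("plugin-descriptor.properties", "name=analysis-ik\n")

# Extract the archive into a directory named ik; in the real setup this
# directory is then placed in the mounted volume (/root/docker/ES/es-plugins)
ik_dir = os.path.join(workdir, "ik")
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(ik_dir)

print(sorted(os.listdir(ik_dir)))  # ['plugin-descriptor.properties']
```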
Restart container
```shell
# Restart the container
docker restart es
# View the es log
docker logs -f es
```
Test:

```json
GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "我是一个新时代农名工"
}
```
result:
```json
{
  "tokens" : [
    { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0 },
    { "token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1 },
    { "token" : "一个", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 2 },
    { "token" : "一", "start_offset" : 2, "end_offset" : 3, "type" : "TYPE_CNUM", "position" : 3 },
    { "token" : "个", "start_offset" : 3, "end_offset" : 4, "type" : "COUNT", "position" : 4 },
    { "token" : "新时代", "start_offset" : 4, "end_offset" : 7, "type" : "CN_WORD", "position" : 5 },
    { "token" : "时代", "start_offset" : 5, "end_offset" : 7, "type" : "CN_WORD", "position" : 6 },
    { "token" : "农", "start_offset" : 7, "end_offset" : 8, "type" : "CN_CHAR", "position" : 7 },
    { "token" : "名", "start_offset" : 8, "end_offset" : 9, "type" : "CN_CHAR", "position" : 8 },
    { "token" : "工", "start_offset" : 9, "end_offset" : 10, "type" : "CN_CHAR", "position" : 9 }
  ]
}
```
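Responses from `_analyze` are plain JSON, so the token list is easy to post-process in a script. A small sketch that pulls out just the token strings; the embedded sample is an illustrative subset of a response, not live output:

```python
import json

# Illustrative subset of an _analyze response
response = json.loads("""
{"tokens": [
  {"token": "我",    "start_offset": 0, "end_offset": 1, "type": "CN_CHAR", "position": 0},
  {"token": "一个",  "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 2},
  {"token": "新时代", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 5}
]}
""")

# Collect just the token strings, in order
tokens = [t["token"] for t in response["tokens"]]
print(tokens)  # ['我', '一个', '新时代']
```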
Summary:

- ik_smart: coarsest segmentation (fewest tokens)
- ik_max_word: finest-grained segmentation (most tokens)
Extended word dictionary
With the development of the Internet, new words are coined more and more frequently. Many of them, such as "aoligai" and "jujuezi", do not exist in the default vocabulary, so the vocabulary needs to be updated continually. The IK tokenizer provides a way to extend its vocabulary.
1) Open the config directory of the IK tokenizer
2) In the IKAnalyzer.cfg.xml configuration file, add:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer Extended configuration</comment>
	<!-- Users can configure their own extended dictionary here -->
	<entry key="ext_dict">ext.dic</entry>
	<!-- Users can configure their own extended stop word dictionary here -->
	<entry key="ext_stopwords">stopword.dic</entry>
</properties>
```
3) Create a new ext.dic file. You can copy an existing file in the config directory and modify it. Add one word per line, for example:

```
绝绝子
奥力给
```
4) Restart elasticsearch
```shell
# Restart the container
docker restart es
# View the log
docker logs -f es
```
The log should show that the ext.dic configuration file was loaded successfully.
5) Test the effect:

```json
GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "It's a unique son to eliminate the underworld and evil"
}
```
Note: the file must be saved in UTF-8 encoding. Do not edit it with Windows Notepad.
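One way to sidestep the encoding pitfall is to write the dictionary from a script with an explicit encoding. A minimal sketch; the entries are the example words from this section, and the file is written to a temporary directory rather than the real config directory:

```python
import os
import tempfile

# Example dictionary entries, one word per line
entries = ["绝绝子", "奥力给"]

# Write with an explicit UTF-8 encoding so the file never ends up in a
# Notepad-style ANSI encoding that IK cannot read.
path = os.path.join(tempfile.mkdtemp(), "ext.dic")
with open(path, "w", encoding="utf-8") as f:
    f.write("\n".join(entries) + "\n")

# Read it back to confirm the encoding round-trips
with open(path, encoding="utf-8") as f:
    print(f.read().splitlines())  # ['绝绝子', '奥力给']
```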
Stop word dictionary
In Internet projects, content spreads very quickly, and certain words, such as sensitive religious or political terms, are not allowed to be transmitted online. Such words should also be ignored when searching.

The IK tokenizer provides a powerful stop word feature that lets us ignore the words in the stop word list when building the index.
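Conceptually, stop word filtering just drops any token found in the stop list before it reaches the index. A toy sketch of the idea in Python (this mimics the concept only, not IK's actual implementation; the words are the examples used below):

```python
# Toy illustration: tokens present in the stop list are discarded
# before indexing. English tokens are used for readability.
stop_words = {"heroin", "narcotics"}

tokens = ["prohibition", "of", "heroin", "and", "narcotics"]
kept = [t for t in tokens if t not in stop_words]
print(kept)  # ['prohibition', 'of', 'and']
```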
1) Add to the IKAnalyzer.cfg.xml configuration file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer Extended configuration</comment>
	<!-- Users can configure their own extended dictionary here -->
	<entry key="ext_dict">ext.dic</entry>
	<!-- Users can configure their own extended stop word dictionary here -->
	<entry key="ext_stopwords">stopword.dic</entry>
</properties>
```
3) Add stop words in stopword.dic, one per line:

```
heroin
narcotics
```
4) Restart elasticsearch

```shell
# Restart the containers
docker restart es
docker restart kibana
# View the log
docker logs -f es
```
The log should show that the stopword.dic configuration file was loaded successfully.
5) Test the effect:

```json
GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "Prohibition of drugs"
}
```
Note: this file must also be saved in UTF-8 encoding. Do not edit it with Windows Notepad.