Elasticsearch 7.1 in-depth study notes

Posted by oskom on Mon, 04 May 2020 14:05:17 +0200

1. Elastic Certified Engineer exam

If you are up to it, take the exam. It helps with promotion and salary increases.

2. ES architecture



JVM configuration

Installing plug-ins

Use elasticsearch-plugin in elasticsearch-7.1.0/bin to install plug-ins. ES downloads and installs the plug-in over the network.

Run ./elasticsearch-plugin list to view the installed plug-ins.

Install the ICU (international) analysis plug-in: ./elasticsearch-plugin install analysis-icu

Starting ES

If startup fails with: can not run elasticsearch as root,

ES cannot be started as root. For solutions, refer to https://www.cnblogs.com/gcgc/p/10297563.html

To start in the background (daemon mode): ./elasticsearch -d

If you can open ES in a browser through the local address,

but cannot reach it through the machine's own IP address, don't be alarmed.

By default ES binds only to the loopback address. To use another IP, you need to configure it.

Set network.host in config/elasticsearch.yml. Once it covers the machine's network interface, ES can be reached through the machine IP as well as the local address. In a test environment it can be set to bind every interface.

Or configure the specific IP address you want.


However, an error is reported:
the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

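This is because once network.host is changed, ES expects discovery to be configured, for example discovery.seed_hosts with the addresses of the other cluster machines on the same intranet.
For a stand-alone ES it is enough to name the node itself as the initial master. The sample elasticsearch.yml uses the node name node-1 (node.name must be set to match; if it is unset, it defaults to the hostname).
So just configure:
cluster.initial_master_nodes: ["node-1"]
and that will do.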
Refer to http://www.freedom.com/article/620549947/


network.host also accepts some special values, such as _local_, _site_ and _global_, which can be limited to IPv4 or IPv6 with the :ipv4 and :ipv6 specifiers. For more details, please refer to "Special values for network.host".

Once you have customized network.host, Elasticsearch assumes that you are moving from development mode to production mode, and upgrades many system startup checks from warnings to exceptions. For more information, see "Development mode vs production mode".

If network.host is configured without any discovery settings, ES fails to start and reports this error:

[2020-03-17T20:29:25,386][ERROR][o.e.b.Bootstrap          ] [DESKTOP-UINEBG0] node validation exception
[1] bootstrap checks failed
[1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

Because ES now runs in cluster (production) mode, the cluster parameters must be configured.


Multi-instance ES

The above installs multiple ES instances on one computer.


Cluster configuration parameters

# Cluster name, default is elasticsearch
cluster.name: wali
# Name of this node
node.name: slave1

# Network address of the ES machine
# HTTP port
http.port: 8200

# Cluster discovery: the IP addresses or domain names of most of the machines in the cluster (if all three ES instances run on one machine, one IP address is enough)
discovery.seed_hosts: [""]
# Don't configure it like this with the HTTP port appended. The setting expects hosts (IP address or domain name); any port given is treated as a transport port
#discovery.seed_hosts: ["",""]

# Names of the nodes used to bootstrap the cluster when it is first built
# The nodes listed here must be started before the other nodes, and the whole ES cluster, become available.
# Two nodes are enough to elect the master node.
# Listing slave1 here does not mean that slave1 becomes the master; that depends on the election
cluster.initial_master_nodes: ["slave1"]

# Does the same job as discovery.seed_hosts, but is deprecated in ES 7 and may be removed later
# Do not configure discovery.zen.ping.unicast.hosts if discovery.seed_hosts and cluster.initial_master_nodes are configured
#discovery.zen.ping.unicast.hosts: [""]

cluster.name: wali
node.name: slave2

http.port: 8300

# discovery.seed_hosts: [""]
discovery.seed_hosts: ["",""]
cluster.initial_master_nodes: ["slave1"]

#discovery.zen.ping.unicast.hosts: [""]

A multi-node ES cluster needs only 2 machines at minimum. This is unlike ZooKeeper (zk), which needs at least three.




Adding i18n.locale: "zh-CN" to kibana.yml makes Kibana display its interface in Chinese.

ES monitoring tool: cerebro

This tool gives a more direct view of how the ES cluster is running.


Run bin/logstash -f logstash.conf

On Linux, use the corresponding logstash tar.gz package instead of the zip package.

Basic concepts of ES


Abstraction and analogy

In other words, from ES 7 on every index has a single type, "_doc".

Index-related APIs

#View information about the index
GET kibana_sample_data_ecommerce

#View the total number of documents in the index
GET kibana_sample_data_ecommerce/_count

#View the first 10 documents to learn the document format
POST kibana_sample_data_ecommerce/_search

#_cat indices API
#View indexes whose names start with kibana, sorted by name
GET /_cat/indices/kibana*?v&s=index

#View indexes whose health is green
GET /_cat/indices?v&health=green

#Sort by number of documents, descending
GET /_cat/indices?v&s=docs.count:desc

#View specific columns
GET /_cat/indices/kibana*?pri&v&h=health,index,pri,rep,docs.count,mt

#How much memory is used per index?
GET /_cat/indices?v&h=i,tm&s=tm:desc

ES nodes

Master election

Node types

If a production ES cluster has only a few servers, set it up the same way as in development, without separating node roles in detail.


Document operations





An update can also add new fields.

Bulk API: batch operations

Batching operations improves efficiency; otherwise every single request pays its own network overhead.
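To make the saving concrete, here is a minimal Python sketch that packs several index actions into one newline-delimited _bulk body, so that a single HTTP request replaces one request per document. The helper name build_bulk_body and the sample documents are assumptions for illustration; actually sending the body to ES is left out.

```python
import json

def build_bulk_body(docs):
    """Build an NDJSON body for the _bulk API from (doc_id, source) pairs.

    Each document contributes two lines: an action line and a source line.
    The body must end with a newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    (1, {"content": "Apple Mac"}),
    (2, {"content": "Apple iPad"}),
])
print(body, end="")
```

The resulting string would be posted to an endpoint such as /news/_bulk in one round trip.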

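Common error responses

Inverted index

A forward index is like the table of contents of a book: it leads from a chapter to its content.

An inverted index is like the index at the back of the book: it leads from a term to the places that contain it.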
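The idea can be sketched in a few lines of Python. This is only an illustration of the data structure, not how Lucene stores it: the function name build_inverted_index and the sample documents are assumptions, and terms are produced by naive lowercasing and whitespace splitting.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "Apple Mac",
    2: "Apple iPad",
    3: "Apple Pie and Apple Juice",
}
index = build_inverted_index(docs)
print(index["apple"])  # documents that contain "apple"
print(index["pie"])    # documents that contain "pie"
```

Looking up a term is then a single dictionary access instead of a scan over every document.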

Analyzers and tokenization

Using the _analyze API


Tokenization demo

#Simple Analyzer - splits on non-letters (symbols are dropped), lowercases
#Stop Analyzer - lowercases and removes stop words (the, a, is)
#Whitespace Analyzer - splits on whitespace, does not lowercase
#Keyword Analyzer - no tokenization, outputs the input as a single token
#Pattern Analyzer - splits by regular expression, default \W+ (non-word characters)
#Language - analyzers for more than 30 common languages
#2 running Quick brown-foxes leap over lazy dogs in the summer evening
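The behaviour described in the list above can be approximated in plain Python to see how the token streams differ. These functions are rough stand-ins written for illustration, not the Elasticsearch implementations, and the stop-word set is a shortened assumption (the built-in English list is longer).

```python
import re

# A few English stop words; Elasticsearch's built-in list is longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def whitespace_analyzer(text):
    """Split on whitespace only; keep case and punctuation."""
    return text.split()

def keyword_analyzer(text):
    """Emit the whole input as a single token."""
    return [text]

def simple_analyzer(text):
    """Split on anything that is not a letter, then lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

def stop_analyzer(text):
    """Like simple_analyzer, with stop words removed."""
    return [t for t in simple_analyzer(text) if t not in STOP_WORDS]

text = "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
for analyzer in (whitespace_analyzer, keyword_analyzer, simple_analyzer, stop_analyzer):
    print(analyzer.__name__, analyzer(text))
```

Note how the simple variant drops the digit and splits brown-foxes, while the whitespace variant keeps both untouched.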

#See the effects of different analyzers
GET _analyze
{
  "analyzer": "standard",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

GET _analyze
{
  "analyzer": "simple",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

GET _analyze
{
  "analyzer": "stop",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

GET _analyze
{
  "analyzer": "whitespace",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

GET _analyze
{
  "analyzer": "keyword",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

GET _analyze
{
  "analyzer": "pattern",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

GET _analyze
{
  "analyzer": "english",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

POST _analyze
{
  "analyzer": "icu_analyzer",
  "text": "What he said is true."
}

POST _analyze
{
  "analyzer": "standard",
  "text": "What he said is true."
}

POST _analyze
{
  "analyzer": "icu_analyzer",
  "text": "This apple is not very delicious"
}

Chinese word segmentation

Search API


Search relevance

Measuring relevance

Request Body and Query DSL


_source filtering

Script fields

Phrase Search

With slop, a phrase query also matches documents in which other terms sit between the words of the phrase, such as "song a last Chris"


Querying title with the AND operator means every result must contain both Last and Christmas

Query string


Automatic type recognition

Null values are ignored. They can be replaced by an empty string, or you can configure null_value for the field.

Updating a mapping field type

dynamic defaults to true

Mapping settings


copy_to


Mapping multi-field features and configuring a custom analyzer

Exact values vs. full text

Custom analyzers

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/master/analysis-stop-tokenfilter.html

Index template

Aggregation analysis

Aggregation categories


ES summary

Drill down

Term query


Full text query

Structured search

Date range

Relevance score of search

Term frequency (TF)


Multi string and multi field query, filter

bool query


boost affects the relevance score, and therefore the order, of the returned results

Not quite not

POST /news/_bulk
{ "index": { "_id": 1 }}
{ "content":"Apple Mac" }
{ "index": { "_id": 2 }}
{ "content":"Apple iPad" }
{ "index": { "_id": 3 }}
{ "content":"Apple employee like Apple Pie and Apple Juice" }

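POST news/_search
{
  "query": {
    "bool": {
      "must": { "match": { "content": "apple" } }
    }
  }
}

POST news/_search
{
  "query": {
    "bool": {
      "must": { "match": { "content": "apple" } },
      "must_not": { "match": { "content": "pie" } }
    }
  }
}

POST news/_search
{
  "query": {
    "boosting": {
      "positive": { "match": { "content": "apple" } },
      "negative": { "match": { "content": "pie" } },
      "negative_boost": 0.5
    }
  }
}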

Taken together, the queries above show two options: must_not filters out the documents that contain pie, while negative keeps them but ranks them last. negative therefore gives finer control over the precision of the query.

Single-string multi-field query

Single-string multi-field query: Multi Match


Here title^10 sets the weight (boost) of the title field to 10.
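As a small illustration of the field^boost notation, the following Python sketch builds a multi_match request body from a mapping of field names to weights. The helper name multi_match_query, the second field name body and the query text are assumptions made for the example.

```python
import json

def multi_match_query(text, field_boosts):
    """Build a multi_match request body; a boost of 1 leaves the field name unchanged."""
    fields = [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in field_boosts.items()
    ]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}

body = multi_match_query("Last Christmas", {"title": 10, "body": 1})
print(json.dumps(body, indent=2))
```

With these weights a match in title counts ten times as much toward the score as the same match in body.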

Multilingual and Chinese word segmentation

Test relevance

Querying with Search Template and Index Alias

Index alias

When an index is renamed or rebuilt, an alias lets the front end keep accessing it without any modification.
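A rebuilt index is typically put behind the alias with a single _aliases request that removes the old binding and adds the new one. The Python sketch below only builds that request body; the helper name swap_alias_actions and the index names news_v1 and news_v2 are hypothetical.

```python
def swap_alias_actions(alias, old_index, new_index):
    """Build an _aliases request body that moves `alias` from old_index to new_index.

    Both actions travel in one request, so the switch is applied atomically.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

body = swap_alias_actions("news", "news_v1", "news_v2")
print(body)
```

Clients that query the alias news never see the moment of the switch.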

Comprehensive sorting: optimizing scores with Function Score Query

That is, as long as the seed value is the same, the random ordering is the same.
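The same property can be seen with any seeded pseudo-random generator. The Python sketch below is an analogy using the standard random module, not the algorithm behind Elasticsearch's random scoring; the function name shuffled and the seed value are arbitrary choices for the example.

```python
import random

def shuffled(doc_ids, seed):
    """Return doc_ids in a pseudo-random order that depends only on the seed."""
    rng = random.Random(seed)
    result = list(doc_ids)
    rng.shuffle(result)
    return result

doc_ids = [1, 2, 3, 4, 5]
print(shuffled(doc_ids, seed=911119))
print(shuffled(doc_ids, seed=911119))  # same seed, same order
```

This is why a per-user seed can give each user a stable but individual ordering.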

Term & Phrase Suggester

Autocomplete and context-based suggestions


Context suggester:

Configuring cross-cluster search

The example above shows cross-cluster search

Sorting, Doc Values & Fielddata

Paging and traversal

By default, a request whose from + size exceeds 10000 (the index.max_result_window setting) is rejected with an error.
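A client can check this limit before sending the request. The following Python sketch converts a page number into from and size and refuses pages beyond the window; the helper name page_params and the 1-based page convention are assumptions for the example.

```python
MAX_RESULT_WINDOW = 10000  # default value of index.max_result_window

def page_params(page, size, max_window=MAX_RESULT_WINDOW):
    """Return from/size for a 1-based page number, or raise if the window is exceeded."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    start = (page - 1) * size
    if start + size > max_window:
        raise ValueError(
            f"from + size = {start + size} exceeds the limit of {max_window}"
        )
    return {"from": start, "size": size}

print(page_params(page=1, size=10))
print(page_params(page=1000, size=10))
```

Deeper traversal than this needs a different mechanism, such as the scroll or search_after approaches covered under this heading.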

Handling concurrent read and write operations

On concurrent updates, a write that carries a stale version is rejected with an error.

Updates can also be controlled through the version parameter.
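The mechanism is optimistic concurrency control: a writer states which version it read, and the write is rejected if the document has changed since. The Python sketch below models this with an in-memory store; the class names VersionedStore and VersionConflict and the sample document are assumptions. Elasticsearch 7 itself expresses the check with the if_seq_no and if_primary_term request parameters.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version that is no longer current."""

class VersionedStore:
    """In-memory stand-in for an index with per-document versions."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, source, expected_version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current:
            raise VersionConflict(f"expected {expected_version}, current {current}")
        self._docs[doc_id] = (current + 1, source)
        return current + 1

store = VersionedStore()
v1 = store.put("1", {"title": "iphone", "count": 100})
store.put("1", {"title": "iphone", "count": 99}, expected_version=v1)  # succeeds
try:
    # A second writer that still holds v1 is now stale and gets rejected.
    store.put("1", {"title": "iphone", "count": 98}, expected_version=v1)
except VersionConflict as exc:
    print("conflict:", exc)
```

The rejected writer is expected to re-read the document and retry with the fresh version.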

In depth aggregation analysis

Bucket & metric aggregation analysis and nested aggregation

Aggregation usage


This setting can be enabled when aggregation queries are frequent, index data is updated or added frequently, and performance requirements are high

Pipeline aggregation analysis: aggregating over aggregation results


Aggregation scope and ordering

Buckets can be sorted by a value from a stats sub-aggregation, such as the minimum

Principle and accuracy of aggregate analysis

ES modeling

Data modeling best practices


Segment merging, merge optimization






Improve cluster read performance

Cluster write performance

HTTP 429 Too Many Requests (this happens when too much data is written at once, or when writes arrive too frequently)
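A common client-side reaction to a 429 is to wait and retry with a growing delay. The Python sketch below shows that pattern under stated assumptions: send_with_backoff is a hypothetical helper, send stands for any callable that returns an HTTP status code, and the fake sender only simulates a busy cluster.

```python
import time

def send_with_backoff(send, payload, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Call send(payload) until it stops returning HTTP 429, doubling the wait each time.

    `send` is any callable returning an HTTP status code.
    """
    for attempt in range(max_retries + 1):
        status = send(payload)
        if status != 429:
            return status
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)
    return status

# Fake sender that rejects the first two attempts.
responses = iter([429, 429, 200])
delays = []
status = send_with_backoff(lambda _: next(responses), "bulk body", sleep=delays.append)
print(status, delays)
```

Passing sleep as a parameter keeps the example fast to run; real code would leave the default time.sleep in place.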

By default, a string is mapped as text with a corresponding keyword sub-field. If the keyword sub-field is never used, this is wasteful.

Caches, and using circuit breakers to limit memory usage



Circuit breaker

Cluster backup

Index lifecycle management

The Shrink API performs better than reindex.

The Rollover API can make an alias point to a new index.

That is, when the amount of data in an index grows too large, this method lets new data be written into a new index.
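A rollover request names the write alias and the conditions under which a new index should be created. The Python sketch below only assembles the request path and body; the helper name rollover_request, the alias nginx-logs-write and the threshold values are assumptions for the example.

```python
def rollover_request(alias, max_age=None, max_docs=None, max_size=None):
    """Build the path and body of a rollover request, keeping only the conditions given."""
    conditions = {
        name: value
        for name, value in (
            ("max_age", max_age),
            ("max_docs", max_docs),
            ("max_size", max_size),
        )
        if value is not None
    }
    return f"/{alias}/_rollover", {"conditions": conditions}

path, body = rollover_request("nginx-logs-write", max_age="1d", max_docs=5)
print("POST", path, body)
```

The rollover takes place when at least one of the listed conditions is met at the time the request is made.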


Management tools

Elasticsearch Curator


Cluster data backup

Do not back up ES by simply copying the files in its data directory; this is officially not recommended.



For the material above, refer to: https://github.com/geektime-geekbang/geektime-ELK
