Elasticsearch 7.1 in-depth study notes

Posted by oskom on Mon, 04 May 2020 14:05:17 +0200

1. Elastic Certified Engineer exam

If you are up to it, take the exam. The certification can help with promotion and salary increases.

2. ES architecture

Ecosystem

Installation

JVM configuration

Install plug-ins

Use elasticsearch-plugin in elasticsearch-7.1.0/bin to install plug-ins. ES downloads and installs the plug-in over the network.

Run ./elasticsearch-plugin list to view the installed plug-ins.

Install the ICU (international) analysis plug-in: ./elasticsearch-plugin install analysis-icu


Start ES

./elasticsearch
If startup fails with the error "can not run elasticsearch as root":

ES cannot be started as root. For solutions, refer to https://www.cnblogs.com/gcgc/p/10297563.html

To start in the background (daemon mode): ./elasticsearch -d

If you can open 127.0.0.1:9200 in a browser

but cannot reach the node through the machine's IP address, such as 192.168.0.101:9200, don't worry.

By default ES binds only to 127.0.0.1. To make it reachable on another IP, you need to configure network.host in config/elasticsearch.yml.

Setting network.host to 0.0.0.0 makes the node reachable through both 127.0.0.1 and the machine's IP. This is convenient for testing.

Or configure the specific IP address you want.

For example:

network.host: 192.168.0.103

However, startup then reports an error:
the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

This is because once network.host is changed, ES expects discovery to be configured: discovery.seed_hosts tells the node where to find the other cluster machines on the same intranet, and cluster.initial_master_nodes names the nodes that bootstrap the cluster.
For a stand-alone ES whose node.name is node-1 (the name used in the sample elasticsearch.yml; if node.name is unset it defaults to the hostname), that node is the master node,
so it is enough to configure:
cluster.initial_master_nodes: ["node-1"]
Refer to http://www.freedom.com/article/620549947/

network.host

network.host also accepts special values such as _local_, _site_ and _global_, optionally restricted with :ipv4 or :ipv6. For more details, refer to "Special values for network.host".

Once you have customized network.host, Elasticsearch assumes that you are moving from development mode to production mode, and upgrades many system startup checks from warnings to exceptions. For more information, see "Development mode vs production mode".

So if only network.host is configured, without any discovery settings, ES can fail to start and report the following error:

[2020-03-17T20:29:25,386][ERROR][o.e.b.Bootstrap          ] [DESKTOP-UINEBG0] node validation exception
[1] bootstrap checks failed
[1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

Because the node is now treated as running in production (cluster) mode, the cluster discovery parameters must be configured.

 

Multiple ES instances

This means running multiple ES instances on one computer.

 

Cluster configuration parameters




# Cluster name; the default is elasticsearch
cluster.name: wali
# Name of this node
node.name: slave1

# Address this ES node binds to
network.host: 127.0.0.1
# HTTP port
http.port: 8200



# Seed hosts for discovery: the IP addresses or domain names of the machines in the cluster (if all three ES nodes run on one machine, a single IP address is enough)
discovery.seed_hosts: ["127.0.0.1"] 
# Don't configure it like this: 8300 and 8200 are HTTP ports. If a port is given here, it must be the transport port (9300 by default), not the HTTP port
#discovery.seed_hosts: ["127.0.0.1:8300","127.0.0.1:8200"]



# Names of the master-eligible nodes used to bootstrap the cluster the first time it is built
# The nodes listed here must be started before the cluster becomes available;
# other nodes can only join once these nodes are up and a master has been elected.
# Listing slave1 here does not mean slave1 will become the master; that depends on the election
cluster.initial_master_nodes: ["slave1"]



# This setting does the same job as discovery.seed_hosts, but it is deprecated in ES 7 and may be removed later
# Do not configure discovery.zen.ping.unicast.hosts if discovery.seed_hosts and cluster.initial_master_nodes are configured
#discovery.zen.ping.unicast.hosts: ["127.0.0.1"]

# Configuration of the second node (slave2)
cluster.name: wali
node.name: slave2

network.host: 127.0.0.1
http.port: 8300

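# Seed with the host only. The commented-out form below uses the HTTP ports 8300 and 8200, which discovery cannot use; a port given here must be a transport port (9300 by default)
discovery.seed_hosts: ["127.0.0.1"]
# discovery.seed_hosts: ["127.0.0.1:8300","127.0.0.1:8200"]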
cluster.initial_master_nodes: ["slave1"]

#discovery.zen.ping.unicast.hosts: ["127.0.0.1"]

An ES cluster can be built from as few as 2 nodes. This differs from ZooKeeper (zk), where an ensemble needs at least three.

 

Kibana

Chinese localization

Adding
i18n.locale: "zh-CN" to kibana.yml enables the Chinese interface.


ES monitoring tool: cerebro

Use this tool to get a more direct view of how ES is running

Logstash

Run bin/logstash -f logstash.conf

On Linux, use the corresponding logstash tar.gz package instead of the zip package

Basic concepts of ES

 

Abstraction and analogy

That is to say, from ES 7 on every index has a single type, "_doc"

Index related API

#View information about the index
GET kibana_sample_data_ecommerce

#View the total number of documents in the index
GET kibana_sample_data_ecommerce/_count

#View the first 10 documents to learn the document format
POST kibana_sample_data_ecommerce/_search
{
}

#_cat indices API
#View indexes
GET /_cat/indices/kibana*?v&s=index

#View indexes whose health is green
GET /_cat/indices?v&health=green

#Sort by number of documents
GET /_cat/indices?v&s=docs.count:desc

#View specific fields
GET /_cat/indices/kibana*?pri&v&h=health,index,pri,rep,docs.count,mt

#How much memory is used per index?
GET /_cat/indices?v&h=i,tm&s=tm:desc

ES nodes

Master election

Node types

If a production ES cluster has only a few servers, it can be set up like a development cluster, without splitting node roles in detail

Shards

Document operations

create

get

index

update

update can also add fields

Bulk API: batch operations

Batching operations improves efficiency; sending them one by one incurs a large network overhead
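
As a rough illustration of why batching saves round trips, here is a minimal Python sketch that packs several index operations into one _bulk request body. It only builds the newline-delimited JSON text; sending it to a cluster is left out. The helper name build_bulk_body and the sample index and documents are assumptions made for this example.

```python
import json


def build_bulk_body(index, docs):
    """Build the NDJSON body for POST /<index>/_bulk.

    docs maps a document id to its source. Every document becomes two
    lines: an action line followed by the source line. The body must
    end with a newline.
    """
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


# Hypothetical sample data: two documents travel in a single request.
body = build_bulk_body("news", {1: {"content": "Apple Mac"}, 2: {"content": "Apple iPad"}})
print(body, end="")
```

However many documents are passed in, they share one HTTP request, which is where the saving in network overhead comes from.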

Common error responses

Inverted index

A forward index is like the table of contents of a book

An inverted index is like the index pages at the back of the book: it maps each term to the places where the term occurs
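
The idea can be sketched in a few lines of Python. This is a toy model, assuming whitespace tokenization and lowercasing only; a real ES inverted index also stores positions, frequencies and more. The function name and sample documents are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map every lowercase term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical documents keyed by id (this mapping is the "forward" direction).
docs = {1: "Apple Mac", 2: "Apple iPad", 3: "Apple Pie and Apple Juice"}
inverted = build_inverted_index(docs)
print(inverted["apple"])  # ids of all documents containing "apple"
print(inverted["pie"])    # ids of all documents containing "pie"
```

Looking up a term is then a single dictionary access instead of a scan over every document.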

Analyzers and tokenization

Using the analyzer API

standard

Tokenization demo

#Simple Analyzer – splits on non-letter characters (symbols are dropped), lowercases
#Stop Analyzer – lowercases and removes stop words (the, a, is)
#Whitespace Analyzer – splits on whitespace, does not lowercase
#Keyword Analyzer – no tokenization; the input is output as a single term
#Pattern Analyzer – splits by regular expression, default \W+ (non-word characters)
#Language – analyzers for more than 30 common languages
#2 running Quick brown-foxes leap over lazy dogs in the summer evening

#See the effects of different analyzers
#standard
GET _analyze
{
  "analyzer": "standard",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

#simple
GET _analyze
{
  "analyzer": "simple",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}


#stop
GET _analyze
{
  "analyzer": "stop",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}


#whitespace
GET _analyze
{
  "analyzer": "whitespace",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

#keyword
GET _analyze
{
  "analyzer": "keyword",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

#pattern
GET _analyze
{
  "analyzer": "pattern",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}


#english
GET _analyze
{
  "analyzer": "english",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}


#icu_analyzer (provided by the analysis-icu plug-in)
POST _analyze
{
  "analyzer": "icu_analyzer",
  "text": "What he said is true."
}


POST _analyze
{
  "analyzer": "standard",
  "text": "What he said is true."
}


POST _analyze
{
  "analyzer": "icu_analyzer",
  "text": "This apple is not very delicious"
}
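
To get a feel for what the simpler built-in analyzers do without a running cluster, here is a rough Python approximation. It is only a sketch: it handles ASCII letters only, the stop-word set is a small illustrative subset, and the function names are invented for this example. The real analyzers are implemented in Lucene and differ in detail.

```python
import re

TEXT = "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
STOP_WORDS = {"a", "in", "is", "the"}  # illustrative subset, not the full English list


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def keyword_analyzer(text):
    """No tokenization: the whole input is a single term."""
    return [text]


def simple_analyzer(text):
    """Lowercase, then keep runs of letters; digits and symbols are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def stop_analyzer(text):
    """Like the simple analyzer, plus stop-word removal."""
    return [t for t in simple_analyzer(text) if t not in STOP_WORDS]


def pattern_analyzer(text):
    """Lowercase, then split on the regular expression \\W+ (non-word characters)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


for analyzer in (whitespace_analyzer, keyword_analyzer, simple_analyzer,
                 stop_analyzer, pattern_analyzer):
    print(analyzer.__name__, analyzer(TEXT))
```

Comparing the printed token lists with the _analyze responses is a quick way to see where each analyzer splits and what it drops.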

Chinese word segmentation

Search API

Response

Search relevance

Measuring relevance

Request Body and Query DSL

Sort

_source filtering

Script fields

Phrase Search

With slop, a phrase query still matches when other words sit between the phrase terms, such as "song a last Chris"

 

With the operator set to and, every title in the result must contain both Last and Christmas

Query string

Mapping

Automatic type recognition

Null values are ignored during automatic type recognition; they can be replaced by an empty string, or null_value can be configured for the field

Changing a mapping's field type

dynamic defaults to true

Mapping settings

null_value

copy_to

Array

Multi-field mappings and configuring a custom analyzer

Exact values vs. full text

Custom analyzer

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/master/analysis-stop-tokenfilter.html

index template

Aggregation analysis

Aggregation types

Nesting

ES summary

In-depth search

term query

 

Full text query

Structured search

Date range

Search relevance scoring

Term frequency (TF)

Boosting 

Multi-string, multi-field queries and filters

bool query

 

boost changes the relevance scores, and therefore the order, of the returned results

Not Quite Not

DELETE news
POST /news/_bulk
{ "index": { "_id": 1 }}
{ "content":"Apple Mac" }
{ "index": { "_id": 2 }}
{ "content":"Apple iPad" }
{ "index": { "_id": 3 }}
{ "content":"Apple employee like Apple Pie and Apple Juice" }


POST news/_search
{
  "query": {
    "bool": {
      "must": {
        "match":{"content":"apple"}
      }
    }
  }
}

POST news/_search
{
  "query": {
    "bool": {
      "must": {
        "match":{"content":"apple"}
      },
      "must_not": {
        "match":{"content":"pie"}
      }
    }
  }
}

POST news/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "content": "apple"
        }
      },
      "negative": {
        "match": {
          "content": "pie"
        }
      },
      "negative_boost": 0.5
    }
  }
}

Taken together: must_not filters out the documents that contain pie, while the negative clause of a boosting query keeps them but ranks them last. negative_boost controls how strongly those documents are demoted.
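
The difference between the two behaviours can be mimicked with a small Python sketch. It is a toy model: the score is simply 1.0 for any document containing the positive term, whereas ES computes a BM25 score. The function names are made up for this example.

```python
def contains(text, term):
    """True if the lowercase, whitespace-split text contains the term."""
    return term in text.lower().split()


def must_not_search(docs, must, must_not):
    """bool query analogue: documents matching must_not are removed entirely."""
    return [doc_id for doc_id, text in docs.items()
            if contains(text, must) and not contains(text, must_not)]


def boosting_search(docs, positive, negative, negative_boost):
    """boosting query analogue: negative matches stay, with a reduced score."""
    hits = []
    for doc_id, text in docs.items():
        if not contains(text, positive):
            continue
        score = 1.0  # toy score; ES would use BM25 here
        if contains(text, negative):
            score *= negative_boost
        hits.append((doc_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


docs = {
    1: "Apple Mac",
    2: "Apple iPad",
    3: "Apple employee like Apple Pie and Apple Juice",
}
print(must_not_search(docs, "apple", "pie"))
print(boosting_search(docs, "apple", "pie", 0.5))
```

In the toy model document 3 disappears from the first result but only drops to the bottom of the second, which is the behaviour the note above describes.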

Single string multi field query

Single string multi field query: Multi Match

 

Here title^10 sets the weight: matches on the title field are boosted by a factor of 10

Multilingual and Chinese word segmentation

Test relevance

Querying with Search Template and Index Alias

Index alias

When an index is renamed or rebuilt, an alias lets the front end keep accessing the data without any modification.
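
As a sketch of how such a switch can be expressed, the following Python snippet builds the body for a POST /_aliases request that removes the alias from the old index and adds it to the new one in a single request. The alias and index names (news, news_v1, news_v2) and the helper name are hypothetical.

```python
import json


def swap_alias_actions(alias, old_index, new_index):
    """Body for POST /_aliases that moves an alias from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: the front end only ever queries the alias "news".
body = swap_alias_actions("news", "news_v1", "news_v2")
print(json.dumps(body, indent=2))
```

Because both actions travel in one request, clients querying the alias never see a moment where it points at nothing.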

Comprehensive sorting: optimizing scores with Function Score Query

That is, as long as the seed value is the same, the random ordering stays consistent
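
The effect of a fixed seed can be illustrated with a small Python sketch. The hashing below is purely illustrative and is not how Elasticsearch computes random_score; it only shows that a score derived from the seed and the document id is repeatable.

```python
import hashlib


def random_score(seed, doc_id):
    """Deterministic pseudo-random score between 0 and 1 derived from seed and id."""
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def ranked(seed, doc_ids):
    """Order the documents by their pseudo-random score for the given seed."""
    return sorted(doc_ids, key=lambda doc_id: random_score(seed, doc_id), reverse=True)


ids = [1, 2, 3, 4, 5]
print(ranked(42, ids))
print(ranked(42, ids))  # same seed: same order as the line above
print(ranked(7, ids))   # another seed generally gives another order
```

This is why a per-user seed gives each user a stable but individual ordering.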

Term & Phrase Suggester

Autocomplete and context-based suggestions

 

Context suggestions:

Configure cross-cluster search

The example above shows cross-cluster search

Sorting, Doc Values & Fielddata

Paging and traversal

By default, if from + size exceeds 10000 (the index.max_result_window setting), an error is reported.
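
The limit applies to the sum of from and size, which the following Python sketch makes explicit. It is a client-side check written for illustration only; the function name is an assumption, and 10000 is the default of index.max_result_window.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def check_result_window(from_, size, max_result_window=MAX_RESULT_WINDOW):
    """Return True if the page fits in the result window, else raise ValueError."""
    if from_ < 0 or size < 0:
        raise ValueError("from and size must be non-negative")
    if from_ + size > max_result_window:
        raise ValueError(
            f"result window too large: from + size = {from_ + size} "
            f"exceeds {max_result_window}"
        )
    return True


print(check_result_window(9990, 10))  # the last page that still fits
try:
    check_result_window(9991, 10)     # one document past the window
except ValueError as err:
    print(err)
```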

Handling concurrent reads and writes

On a concurrent update, if the version does not match, an error is reported

A document can also be updated by passing a version explicitly
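
The mechanism is optimistic concurrency control, which can be sketched with an in-memory stand-in written in Python. The ToyStore class and its single version counter are assumptions for illustration; in ES 7 the same check is expressed with if_seq_no and if_primary_term, or with version plus version_type=external.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the document."""


class ToyStore:
    """In-memory stand-in for an index; keeps a version counter per document."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def index(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, current version is {current_version}"
            )
        self._docs[doc_id] = (current_version + 1, source)
        return current_version + 1


store = ToyStore()
store.index(1, {"stock": 10})   # creates the document at version 1
seen_by_a, _ = store.get(1)     # client A reads version 1
seen_by_b, _ = store.get(1)     # client B reads version 1
store.index(1, {"stock": 9}, expected_version=seen_by_a)      # A wins, version 2
try:
    store.index(1, {"stock": 8}, expected_version=seen_by_b)  # B is now stale
    conflict = False
except VersionConflict:
    conflict = True
print(conflict)
```

Client B has to re-read the document and retry, which is how a lost update is avoided without taking locks.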

In-depth aggregation analysis

Bucket & metric aggregation analysis and nested aggregation

Aggregation usage

 

This setting can be enabled when aggregation queries are frequent, index data is updated or added frequently, and performance requirements are high

Pipeline aggregation analysis: aggregating over aggregation results

 

Aggregation scope and ordering

Buckets can be sorted by a value from the corresponding stats aggregation, such as the minimum

Principles and accuracy of aggregation analysis

ES modeling

Data modeling best practices

 

Segment merging, merge optimization

 

 

 

 

 

Improve cluster read performance

Cluster write performance

HTTP 429 Too Many Requests (this happens when write requests arrive faster than the cluster can process them)

By default a text string field also gets a corresponding keyword sub-field. If the keyword sub-field is never used, this is wasteful

Caching, and using circuit breakers to limit memory usage

 

 

Circuit breaker

Cluster backup

Index lifecycle management

The Shrink API performs better than reindex

The Rollover API can point an alias at a new index

That is, when an index holds too much data, rollover lets new data be written to a new index.

 

Management tools

Elasticsearch Curator

 

Cluster data backup

Do not simply back up the files in the ES data directory; this is not officially recommended.

 

 

For the material above, refer to: https://github.com/geektime-geekbang/geektime-ELK

Topics: Big Data network ElasticSearch jvm Linux