Part 4: ElasticSearch environment
4.1 Related concepts
4.1.1 Single machine & cluster
- A single Elasticsearch server can only handle a certain maximum load. Once that threshold is exceeded, performance drops sharply or the server becomes unavailable altogether. For this reason, Elasticsearch usually runs on a cluster of servers in production.
- Besides load capacity, a single-node server has other problems
- The storage capacity of a single machine is limited
- A single server is a single point of failure, so high availability cannot be achieved
- The concurrent processing capability of a single server is limited
- There is no limit on the number of nodes in a cluster; two or more nodes already count as a cluster. In practice, for both performance and high availability, a cluster usually has three or more nodes
4.1.2 Cluster
- A cluster is a group of one or more server nodes that together hold the entire data set and jointly provide indexing and search capabilities. An Elasticsearch cluster is identified by a unique name, which is "elasticsearch" by default. This name is important because a node can only join a cluster by specifying that cluster's name.
4.1.3 Node
- A cluster contains many servers, and a node is one of them. As part of the cluster, it stores data and takes part in the cluster's indexing and search functions.
- A node is also identified by a name. By default this is the name of a random Marvel comic character, assigned to the node at startup. The name matters for administration, because it is how you work out which servers in the network correspond to which nodes in the Elasticsearch cluster.
- A node joins a specific cluster by configuring that cluster's name. By default, if no other Elasticsearch nodes are running in your network, any node you start will join a cluster called "elasticsearch"; this means that if you start several nodes in the network and they can discover each other, they will automatically form and join a cluster called "elasticsearch".
- In a cluster, you can have as many nodes as you want.
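- As a minimal illustration of the two names discussed above (the values here are examples only, not part of this setup), both are set in config/elasticsearch.yml, and a node joins whichever cluster matches its cluster.name:
cluster.name: elasticsearch   # example: defaults to "elasticsearch" if not set
node.name: node-1             # example: this node's own name inside the cluster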
4.2 Linux standalone
- A VMware 15 Pro virtual machine is used
- Link: https://pan.baidu.com/s/1FYuUxFIYo7ARUUYWMyv1AA Extraction code: gqru
- A CentOS 7 image is used
- Link: https://pan.baidu.com/s/16mKfHgby5jlPk9GMxnCnhw Extraction code: egaf
4.2.1 Software download
- ES address: https://www.elastic.co/cn/downloads/past-releases/elasticsearch-7-8-0
4.2.2 Software installation
- Decompress the software
- Upload the downloaded package to Linux and unzip it
tar -zxvf elasticsearch-7.8.0-linux-x86_64.tar.gz
- Rename the directory
mv elasticsearch-7.8.0 es
- Create a user
- For security reasons, Elasticsearch cannot be run directly as the Linux root user, so a new user has to be created; do this while logged in as root
useradd es                                  # Create the es user (the name can be customized)
passwd es                                   # Set a password for es
# If something was set wrong, the user can be deleted and created again
userdel -r es                               # Delete the user
chown -R es:es /opt/module/software/es      # Give the es user ownership of this directory
- Modify the configuration files
- ./config/elasticsearch.yml (the settings below can be appended directly at the end)
cluster.name: test-elasticsearch-cluster   # Cluster name (has a default value)
node.name: node01                          # Name of the current node
network.host: 0.0.0.0                      # Bind address
http.port: 9200                            # HTTP port
cluster.initial_master_nodes: ["node01"]   # Initial set of master-eligible node names for cluster bootstrap
- Modify /etc/security/limits.conf
# Limit on the number of files each process can open
es soft nofile 65536
es hard nofile 65536
- Modify /etc/security/limits.d/20-nproc.conf
# Limit on the number of files each process can open
es soft nofile 65536
es hard nofile 65536
# Operating-system-level limit on the number of processes each user can create
* hard nproc 4096
# * stands for all Linux user names
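- A quick way to check that the limits took effect (a simple sketch, not part of the original steps; run it after logging in again as the es user, since the limits apply per login session):
ulimit -Hn   # hard limit on open files, should now show 65536
ulimit -Hu   # hard limit on user processes, should now show 4096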
- Modify /etc/sysctl.conf
# The number of VMAs (virtual memory areas) a process can own; the default is 65536
vm.max_map_count=655360
- Reload (run directly from the command line)
sysctl -p
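- Optionally, confirm that the kernel parameter is now active (a simple check, not from the original steps):
sysctl vm.max_map_count   # should print: vm.max_map_count = 655360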
4.2.3 Start the software
- Start with the es user
- During startup, some files are generated dynamically; if their owner does not match, an error occurs and the owner and group have to be corrected again
- ES does not allow the root user to start it
- Use the su command to switch to the previously created es user and start again
- Another problem appears: an AccessDeniedException
- This happens because ES generates some files dynamically during startup, and the es user has no permission to operate on them, so the authorization command has to be executed again
- Execute the authorization command (switch to the root user to run it)
chown -R es:es /opt/module/software/es
- Start again (switch back to the es user)
- Start successful
- Once started, ES occupies the current terminal window; it can also be started in the background:
- bin/elasticsearch -d
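- To confirm the background instance is actually running, a couple of simple checks (run from the es directory; the log file is named after cluster.name, so with the configuration above it should be logs/test-elasticsearch-cluster.log):
ps -ef | grep elasticsearch                        # the java process should be listed
tail -n 20 logs/test-elasticsearch-cluster.log     # look for a "started" message near the end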
4.2.4 Test the software
- Use a browser to access: http://<your Linux IP address>:9200/
- The browser gets no response because the Linux firewall is blocking the request
- Simple approach
- Either turn off the firewall or open the corresponding port; the firewall is turned off here
# Temporarily turn off the firewall (it comes back automatically after the next boot)
systemctl stop firewalld
# Turn it off permanently (it stays off after a reboot)
systemctl disable firewalld.service
# Turn it back on permanently
systemctl enable firewalld.service
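- Alternatively (not the approach chosen here), the firewall can stay on and only the ES HTTP port can be opened; a sketch using firewalld on CentOS 7, run as root:
firewall-cmd --zone=public --add-port=9200/tcp --permanent   # open port 9200
firewall-cmd --reload                                        # apply the change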
- If the firewall is stopped without switching to the root user, the password has to be entered several times
- Test again in the browser
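- The same check can be done from the server itself with curl (this sidesteps the firewall question entirely):
curl http://localhost:9200/
# A JSON response containing "name", "cluster_name" and "version" means ES is up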
4.3 Linux Cluster
- Environment preparation: three Linux machines. Then write an xsync script to distribute files across the machines (a sketch is given below); alternatively, simply copy everything to the three machines by hand without a script
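- A minimal xsync sketch based on rsync (assumptions: the other two machines are 192.168.111.11 and 192.168.111.12 as in the cluster configuration further below, ssh access works between the nodes, and rsync is installed everywhere; adjust hosts and paths to your own environment):
#!/bin/bash
# xsync: copy the given files/directories to the same absolute path on every other node
if [ $# -lt 1 ]; then
  echo "Usage: xsync <file_or_dir>..."
  exit 1
fi
for host in 192.168.111.11 192.168.111.12; do    # assumed addresses of the other nodes
  echo "==== syncing to $host ===="
  for file in "$@"; do
    if [ -e "$file" ]; then
      dir=$(cd -P "$(dirname "$file")" && pwd)   # resolve the absolute parent directory
      name=$(basename "$file")
      ssh "$host" "mkdir -p $dir"                # make sure the target directory exists
      rsync -av "$dir/$name" "$host:$dir"        # copy the file or directory
    else
      echo "$file does not exist"
    fi
  done
done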
4.3.1 Software download
- ES address: https://www.elastic.co/cn/downloads/past-releases/elasticsearch-7-8-0
4.3.2 Software installation
4.3.2.1 Decompress the software
- Unzip, rename, and distribute to the other machines (02 and 03)
# Decompress
tar -zxvf elasticsearch-7.8.0-linux-x86_64.tar.gz
# Rename
mv elasticsearch-7.8.0 es-cluster
# Distribute the files (just enter yes to confirm and the password when prompted)
xsync es-cluster
4.3.2.2 Create a user
- For security reasons, ES does not allow the root user to run it directly. Create a new user on the other two nodes as well; it is named es here
useradd es                                       # Create the es user
passwd es                                        # Set a password
userdel -r es                                    # If something was set wrong, the user can be deleted
chown -R es:es /opt/module/software/es-cluster   # Give the es user access to the files; adjust the path to your own
4.3.2.3 Modify the configuration files
- Modify the files on one machine, distribute them, and then make the small per-node changes on the other nodes
- Modify es-cluster/config/elasticsearch.yml, and then distribute the file
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
# Cluster name
cluster.name: test-cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
# Node name; it must be unique within the cluster
node.name: node01
# Whether this node is eligible to act as a master node and as a data node
node.master: true
node.data: true
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
# IP address; it is different on each node
network.host: 192.168.111.10
#
# Set a custom port for HTTP:
#
#http.port: 9200
# HTTP port
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# New setting since es7.x: node discovery
discovery.seed_hosts: ["192.168.111.10:9300", "192.168.111.11:9300", "192.168.111.12:9300"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
# Added in es7.x; required to elect a master when a brand-new cluster is initialized
cluster.initial_master_nodes: ["node01"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
gateway.recover_after_nodes: 2
network.tcp.keep_alive: true
network.tcp.no_delay: true
transport.tcp.compress: true
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
# The head plug-in needs these two settings enabled
http.cors.allow-origin: "*"
http.cors.enabled: true
http.max_content_length: 200mb
# Number of concurrent shard rebalancing tasks in the cluster; 2 by default
cluster.routing.allocation.cluster_concurrent_rebalance: 16
# Number of concurrent recovery threads when nodes are added or removed and during load balancing; 4 by default
cluster.routing.allocation.node_concurrent_recoveries: 16
# Number of concurrent threads for initial primary-shard recovery; 4 by default
cluster.routing.allocation.node_initial_primaries_recoveries: 16
- Then run xsync to distribute the file to the other machines, and modify node.name and network.host (the IP address) on each of them
- Modify /etc/security/limits.conf, and then distribute the file
es soft nofile 65536
es hard nofile 65536
- Modify /etc/security/limits.d/20-nproc.conf, and then distribute the file
# Limit on the number of files each process can open
es soft nofile 65536
es hard nofile 65536
# Operating-system-level limit on the number of processes each user can create
* hard nproc 4096
# * stands for all Linux user names
- Modify /etc/sysctl.conf, and then distribute the file
# The number of VMAs (virtual memory areas) a process can own; the default is 65536
vm.max_map_count=655360
- Reload on each machine (switch to the root user first)
sysctl -p
4.3.3 Start the software
- Start ES on each of the three machines
- Problems encountered
- There is a problem in elasticsearch.yml: when writing a key: value pair, there must be a space after the colon (see the small example below)
- Start successful
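- A tiny illustration of the colon rule mentioned above (YAML requires a space between the key and the value):
node.name:node01    # wrong: no space after the colon, ES fails to parse the file
node.name: node01   # right: one space after the colon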
4.3.4 Test the cluster
- Browser access: http://192.168.111.10:9200/_cat/nodes
- The IP depends on your own Linux machines
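- The same check can be done from the command line with curl (replace the IP with one of your own nodes):
curl "http://192.168.111.10:9200/_cat/nodes?v"            # lists every node that has joined the cluster
curl "http://192.168.111.10:9200/_cluster/health?pretty"  # overall cluster status (green / yellow / red)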