ElasticSearch Linux single point and cluster configuration

Posted by ShaolinWood on Tue, 08 Feb 2022 04:33:54 +0100

Part 4: ElasticSearch environment

4.1 Related concepts

4.1.1 Single machine & cluster

  • A single ElasticSearch server has a maximum load capacity; once that threshold is exceeded, performance degrades sharply or the server becomes unavailable. In a production environment, ElasticSearch is therefore usually run on a cluster of servers.
  • Besides load capacity, a single-server deployment has other problems
    • The storage capacity of a single machine is limited
    • A single server is a single point of failure and cannot provide high availability
    • The concurrent processing capability of a single server is limited
  • When configuring a server cluster, there is no limit on the number of nodes: two or more nodes can already be regarded as a cluster. In practice, for high performance and high availability, a cluster usually has at least 3 nodes

4.1.2 Cluster

  • A cluster is organized from one or more server nodes that jointly hold the entire data set and provide indexing and search capabilities together. An ElasticSearch cluster is identified by a unique name, which is "elasticsearch" by default. This name is very important, because a node can only join a cluster by specifying that cluster's name (a quick way to check the cluster name of a running node is shown below).
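  • A quick way to see which cluster a running node belongs to is its root HTTP endpoint. A minimal sketch, assuming a node is already listening on localhost:9200:

    # The JSON response contains a "cluster_name" field
    # ("elasticsearch" unless cluster.name was changed).
    curl http://localhost:9200/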

4.1.3 Node

  • A cluster contains one or more servers, and a node is one of them. As part of the cluster, it stores data and participates in the cluster's indexing and search functions.
  • A node is also identified by a name. By default this is the host name of the machine (early versions of ElasticSearch assigned a random Marvel comic character name instead), and it is assigned to the node at startup. The name matters for administration, because it lets you determine which servers in the network correspond to which nodes in the ElasticSearch cluster.
  • A node joins a specific cluster by configuring that cluster's name. By default, if no other ElasticSearch nodes are running in your network, a newly started node joins a cluster called "elasticsearch". This means that if you start several nodes in your network and they can discover each other, they will automatically form and join a cluster called "elasticsearch".
  • In a cluster, you can have as many nodes as you want.
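  • The node names of a running cluster can be listed with the _cat/nodes API. A minimal sketch, assuming a node is reachable on localhost:9200:

    # The "name" column shows each node's node.name; the elected master
    # is marked with * in the "master" column.
    curl "http://localhost:9200/_cat/nodes?v"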

4.2 Linux standalone

  • Using VMware 15 Pro virtual machine
    • Link: https://pan.baidu.com/s/1FYuUxFIYo7ARUUYWMyv1AA Extraction code: gqru
  • Using CentOS7 mirroring
    • Link: https://pan.baidu.com/s/16mKfHgby5jlPk9GMxnCnhw Extraction code: egaf

4.2.1 Software download

  • ES address: https://www.elastic.co/cn/downloads/past-releases/elasticsearch-7-8-0

4.2.2 Software installation

  • Decompress the software

    • Upload the downloaded package to the Linux machine and extract it

      tar -zxvf elasticsearch-7.8.0-linux-x86_64.tar.gz
      
    • Rename

      mv elasticsearch-7.8.0 es
      
  • Create user

    • For security reasons, ElasticSearch cannot be run directly as the root user on Linux, so you need to create a new user. Do this as the root user:

      useradd es # New es user (name can be customized)
      passwd es # Set password for es
      
      # If something went wrong, delete the user and start over
      userdel -r es
      # Give the es user ownership of the installation directory (adjust the path to your own)
      chown -R es:es /opt/module/software/es
      
  • Modify the configuration files

    • ./config/elasticsearch.yml (the following can be appended at the end)

      cluster.name: test-elasticsearch-cluster # Cluster name (has a default value)
      node.name: node01 # Name of the current node
      network.host: 0.0.0.0 # Bind address (0.0.0.0 listens on all interfaces)
      http.port: 9200 # HTTP port
      cluster.initial_master_nodes: ["node01"] # Master-eligible node names used to bootstrap the cluster
      
    • Modify /etc/security/limits.conf

      # Limit on the number of files each process can open
      es    soft    nofile    65536
      es    hard    nofile    65536
      
    • Modify /etc/security/limits.d/20-nproc.conf

      # Limit on the number of files each process can open
      es    soft    nofile    65536
      es    hard    nofile    65536
      # Operating-system-level limit on the number of processes each user can create
      *     hard    nproc     4096
      # * applies to all Linux user names
      
    • Modify /etc/sysctl.conf

      # The number of VMAs (virtual memory areas) a process may own. The default value is 65536
      vm.max_map_count=655360
      
    • Reload the kernel parameters (run directly from the command line as root; a quick verification sketch follows this list)

      sysctl -p
      
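  • Before starting ES, you can check that the new limits are in effect. A minimal sketch, assuming the es user and the values configured above:

    # Kernel parameter from /etc/sysctl.conf
    sysctl vm.max_map_count

    # File-descriptor limits of the es user (both should report 65536)
    su - es
    ulimit -Sn
    ulimit -Hn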

4.2.3 Start the software

  • Start ElasticSearch as the es user

  • During startup, some files are generated dynamically; if their owner does not match the user running ES, an error occurs and the user and group have to be corrected again

    • ES does not allow root user to start
  • Use the su command to switch to the previously created ES user and restart

  • Another problem may appear: an AccessDeniedException

    • This happens because ES dynamically generates some files during startup, and the es user has no permission to operate on them, so the authorization command has to be executed again
  • Execute the authorization command (you need to switch to the root user to execute it)

    chown -R es:es /opt/module/software/es
    
  • Start again (switch to es user)

    • Start successful
    • Note that ES occupies the current terminal after it starts; you can start it in the background instead
      • bin/elasticsearch -d
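  • A few related commands, as a minimal sketch (run from the installation directory as the es user; the log file name follows the cluster.name configured above):

    # Start in the background and record the process ID in a pid file
    bin/elasticsearch -d -p pid

    # Follow the log (the file is named after cluster.name)
    tail -f logs/test-elasticsearch-cluster.log

    # Stop the background node later using the recorded pid
    pkill -F pid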

4.2.4 Software testing

  • Use a browser to access: http://<your Linux IP address>:9200/

    • If the browser gets no response, it is because the Linux firewall is blocking the request
  • Simple way

    • You can either turn the firewall off or open just the corresponding port; here it is turned off (a sketch for opening only port 9200 follows this list)

      # Temporarily turn off the firewall (it starts again on the next boot)
      systemctl stop firewalld
      # Permanently disable the firewall (stays off after reboot)
      systemctl disable firewalld.service
      # Permanently enable it again
      systemctl enable firewalld.service
      
    • If you are not the root user when running these commands, you will be prompted for the root password several times

  • Browser test again
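  • Alternatively, instead of turning the firewall off you can open only port 9200. A minimal sketch using firewalld on CentOS 7 (run as root):

    # Open port 9200 permanently and reload the rules
    firewall-cmd --zone=public --add-port=9200/tcp --permanent
    firewall-cmd --reload
    # Confirm the port is listed
    firewall-cmd --list-ports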

4.3 Linux Cluster

  • Prepare the environment: three Linux machines, plus an xsync script (its job is to distribute files to the other machines). If you prefer not to write the script, you can simply copy the files to each machine by hand. A minimal sketch of such a script follows.
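  • A minimal xsync sketch, assuming SSH and rsync are available between the machines; the host list is an assumption and should be replaced with your own host names or IP addresses (these match the cluster example below):

    #!/bin/bash
    # xsync - distribute files or directories to the other cluster machines
    if [ $# -lt 1 ]; then
      echo "Usage: xsync <file-or-directory>..."
      exit 1
    fi
    for host in 192.168.111.11 192.168.111.12; do
      echo "==== syncing to $host ===="
      for path in "$@"; do
        dir=$(cd -P "$(dirname "$path")" && pwd)  # absolute parent directory
        name=$(basename "$path")
        ssh "$host" "mkdir -p $dir"
        rsync -av "$dir/$name" "$host:$dir"
      done
    done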

4.3.1 Software download

  • ES address: https://www.elastic.co/cn/downloads/past-releases/elasticsearch-7-8-0

4.3.2 Software installation

4.3.2.1 Decompress the software
  • Extract, rename, and distribute to the other machines (02 and 03)

    # Extract
    tar -zxvf elasticsearch-7.8.0-linux-x86_64.tar.gz
    # Rename
    mv elasticsearch-7.8.0 es-cluster
    # Distribute the files (confirm with yes and enter the password when prompted)
    xsync es-cluster
    
4.3.2.2 Create user
  • For security reasons, ES cannot be run directly as the root user. Create a new user (named es here) on the other two nodes as well

    useradd es # New es user
    passwd es # Set password
    userdel -r es # If the setting is wrong, it can be deleted
    chown -R es:es /opt/module/software/es-cluster # Give the es user ownership of the directory (adjust the path to your own)
    
4.3.2.3 Modify configuration files
  • Modify the file on one machine, distribute it, and then make small adjustments on the other nodes

  • Modify es-cluster/config/elasticsearch.yml, and then distribute the file

    # ======================== Elasticsearch Configuration =========================
    #
    # NOTE: Elasticsearch comes with reasonable defaults for most settings.
    #       Before you set out to tweak and tune the configuration, make sure you
    #       understand what are you trying to accomplish and the consequences.
    #
    # The primary way of configuring a node is via this file. This template lists
    # the most important settings you may want to configure for a production cluster.
    #
    # Please consult the documentation for further information on configuration options:
    # https://www.elastic.co/guide/en/elasticsearch/reference/index.html
    #
    # ---------------------------------- Cluster -----------------------------------
    #
    # Use a descriptive name for your cluster:
    #
    #cluster.name: my-application
    # Configure cluster name
    cluster.name: test-cluster
    #
    # ------------------------------------ Node ------------------------------------
    #
    # Use a descriptive name for the node:
    #
    #node.name: node-1
    # Node name; each node's name must be unique within the cluster
    node.name: node01
    # Whether this node is master-eligible and whether it stores data
    node.master: true
    node.data: true
    #
    # Add custom attributes to the node:
    #
    #node.attr.rack: r1
    #
    # ----------------------------------- Paths ------------------------------------
    #
    # Path to directory where to store the data (separate multiple locations by comma):
    #
    #path.data: /path/to/data
    #
    # Path to log files:
    #
    #path.logs: /path/to/logs
    #
    # ----------------------------------- Memory -----------------------------------
    #
    # Lock the memory on startup:
    #
    #bootstrap.memory_lock: true
    #
    # Make sure that the heap size is set to about half the memory available
    # on the system and that the owner of the process is allowed to use this
    # limit.
    #
    # Elasticsearch performs poorly when the system is swapping the memory.
    #
    # ---------------------------------- Network -----------------------------------
    #
    # Set the bind address to a specific IP (IPv4 or IPv6):
    #
    #network.host: 192.168.0.1
    # ip address, each node is different
    network.host: 192.168.111.10
    #
    # Set a custom port for HTTP:
    #
    #http.port: 9200
    # Set port number
    http.port: 9200
    #
    # For more information, consult the network module documentation.
    #
    # --------------------------------- Discovery ----------------------------------
    #
    # Pass an initial list of hosts to perform discovery when this node is started:
    # The default list of hosts is ["127.0.0.1", "[::1]"]
    # New setting in ES 7.x: seed hosts for node discovery
    discovery.seed_hosts: ["192.168.111.10:9300", "192.168.111.11:9300","192.168.111.12:9300"]
    #
    # Bootstrap the cluster using an initial set of master-eligible nodes:
    # New setting in ES 7.x: required so a master can be elected when a brand-new cluster is bootstrapped
    cluster.initial_master_nodes: ["node01"]
    #
    # For more information, consult the discovery and cluster formation module documentation.
    #
    # ---------------------------------- Gateway -----------------------------------
    #
    # Block initial recovery after a full cluster restart until N nodes are started:
    #
    gateway.recover_after_nodes: 2
    network.tcp.keep_alive: true
    network.tcp.no_delay: true
    transport.tcp.compress: true
    #
    # For more information, consult the gateway module documentation.
    #
    # ---------------------------------- Various -----------------------------------
    #
    # Require explicit names when deleting indices:
    #
    #action.destructive_requires_name: true
    
    
    # The elasticsearch-head plugin needs these cross-origin (CORS) settings
    http.cors.allow-origin: "*"
    http.cors.enabled: true
    http.max_content_length: 200mb
    
    # Number of concurrent shard rebalances allowed in the cluster. The default is 2
    cluster.routing.allocation.cluster_concurrent_rebalance: 16
    # Number of concurrent shard recoveries per node when nodes are added or removed. The default is 2
    cluster.routing.allocation.node_concurrent_recoveries: 16
    # Number of concurrent initial primary-shard recoveries per node. The default is 4
    cluster.routing.allocation.node_initial_primaries_recoveries: 16
    
    
    • Then execute xsync to distribute the file to the other machines, and modify node.name and network.host on each of them (see the sketch after this list)
  • Modify /etc/security/limits.conf, and then distribute the file

    es               soft    nofile          65536
    es               hard    nofile          65536
    
  • Modify /etc/security/limits.d/20-nproc.conf, and then distribute the file

    # Limit on the number of files that can be opened per process
    es       soft    nofile          65536
    es       hard    nofile          65536
    # Operating system level limit on the number of processes created by each user
    *        hard    nproc           4096
    # * applies to all Linux user names
    
    
  • Modify /etc/sysctl.conf, and then distribute the file

    # The number of Vmas (virtual memory areas) a process can own. The default value is 65536
    vm.max_map_count=655360
    
  • Reload the kernel parameters on each machine (you need to switch to the root user)

    sysctl -p
    
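  • After distribution, only two values differ between the nodes. A minimal sketch of the per-node overrides in elasticsearch.yml (the names node02 and node03 are assumed; the IP addresses follow the discovery list above):

    # Second machine (192.168.111.11)
    node.name: node02
    network.host: 192.168.111.11

    # Third machine (192.168.111.12)
    node.name: node03
    network.host: 192.168.111.12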

4.3.3 Start the software

  • Start ES on each of the three machines (as the es user)
  • Problems encountered
    • There was a problem with elasticsearch.yml: when configuring a key: value pair, there must be a space after the colon (see the example below)
  • Start successful
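  • A minimal illustration of that YAML rule:

    # Wrong: no space after the colon, so ES cannot parse the setting
    cluster.name:test-cluster
    # Right: one space after the colon
    cluster.name: test-cluster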

4.3.4 Test the cluster

  • Browser access: http://192.168.111.10:9200/_cat/nodes
    • Replace the IP address with that of your own Linux machine
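    • The same check works with curl against any of the three nodes. A minimal sketch (node02 and node03 are the assumed names from the sketch above):

      # All three nodes should be listed; the elected master carries * in the
      # "master" column and the "name" column shows node01, node02, node03.
      curl "http://192.168.111.10:9200/_cat/nodes?v"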

Topics: ElasticSearch