Elasticsearch cluster cross-network snapshot migration

Posted by mattd123 on Mon, 24 Jan 2022 14:09:52 +0100

Elasticsearch Version

  • Source: Elasticsearch 7.2.0
  • Target: Elasticsearch 7.13.2

Cluster structure and server configuration

  • Source: 3-node cluster, one node per server (2 cores, 16 GB RAM, 500 GB disk)
  • Target: 3-node cluster, one node per server (2 cores, 16 GB RAM, 1 TB disk)

Cluster data volume and migration time

  • Primary shards + 1 replica = 500 GB; primary shards alone are 250 GB; snapshots back up only the primary shards;
  • Alibaba Cloud backing up to Alibaba Cloud OSS over the intranet: the first full snapshot took about 30 minutes; subsequent incremental snapshots took about 3 minutes;
  • Google Cloud restoring from Alibaba Cloud OSS across networks: the first full snapshot restore took about 1 hour 30 minutes; subsequent incremental restores took about 5 minutes;

Note: Because the repository-s3 plugin in Elasticsearch 7.2.0 is incompatible with Alibaba Cloud OSS, we purchased Alibaba Cloud Storage Gateway and mounted OSS via NFS:

  • For security reasons, Alibaba Cloud OSS supports only virtual-hosted-style access.
  • In Elasticsearch versions 7.0 through 7.3, the repository-s3 plugin can access the object store only in path style, hence the incompatibility;
  • Starting with Elasticsearch 7.4, the repository-s3 plugin accesses the object store in virtual-hosted style by default; adding the parameter path_style_access: true when creating the repository switches back to path style (see the sketch after this list).
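
For illustration, a minimal sketch of how this setting is passed on Elasticsearch 7.4+ (the endpoint and bucket reuse this post's values; `path_style_access` is a repository-s3 setting, deprecated in later 7.x releases):

# Sketch for ES >= 7.4: virtual-hosted style is the default, so OSS works as-is
PUT _snapshot/<repository_name>
{
  "type": "s3",
  "settings": {
    "endpoint": "oss-cn-shenzhen.aliyuncs.com",
    "bucket": "es",
    "path_style_access": false      # set true only for stores that require path style
  }
}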

Snapshot Migration Scheme

1. Shared File Repository (fs)

Preparation

  • Purchase a shared disk to store the snapshot data (larger than the data being migrated)
  • Or purchase object storage plus a Cloud Storage Gateway that exposes it as a shared file system

Mount the Shared Disk on Each Cluster Node

# Create the mount directory
mkdir /mnt/esdata

# Mount the shared disk
mount.nfs 172.16.0.2:/esdata /mnt/esdata
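
The mount above does not survive a reboot; if persistence is needed, an /etc/fstab entry along these lines could be added (a sketch, assuming the same NFS export as above):

# Assumption: persist the NFS mount across reboots
echo "172.16.0.2:/esdata /mnt/esdata nfs defaults,_netdev 0 0" >> /etc/fstab
mount -a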

Add the Configuration to the elasticsearch.yml File

# Shared Disk Path
path.repo: /mnt/esdata

# This configuration must be set on each node in the cluster, and each node must be restarted after it is set.
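
After the rolling restart below, one way to confirm the setting landed on every node is to filter the node settings (a sketch using the standard filter_path parameter):

# Optional check (assumption): every node should report /mnt/esdata here
GET _nodes?filter_path=nodes.*.name,nodes.*.settings.path.repo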

Restart each node of the cluster (data nodes first, master nodes last)

# Disable shard allocation
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

# Perform a synced flush
POST _flush/synced

# Stop one node
ps aux | grep elasticsearch
kill <pid>

# Start the stopped node
/etc/init.d/elasticsearch start

# Re-enable shard allocation
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}

# Perform a synced flush and wait for all shards to rejoin the cluster
POST _flush/synced

# Repeat the above steps for each node until every node in the cluster has been restarted;

# Note: the ES cluster is accessed through a load balancer, so this rolling restart does not affect online traffic.
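
Between node restarts it is safest to confirm the cluster has fully recovered before moving on; a minimal check using the standard cluster health API might look like this:

# Optional check (assumption): block until the cluster reports green again
GET _cluster/health?wait_for_status=green&timeout=60s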

Create the snapshot repository and verify that every node can access it

# Create a snapshot repository
PUT /_snapshot/<repository_name>
{
  "type": "fs",
  "settings": {
    "location": "/mnt/esdata",                      # The shared-disk (OSS) mount directory from above
    "max_snapshot_bytes_per_sec": "200mb",          # Snapshot creation speed; default 40mb
    "max_restore_bytes_per_sec": "200mb"            # Snapshot restore speed; default unlimited
  }
}

# Verify the snapshot repository
POST /_snapshot/<repository_name>/_verify

# View all repositories
GET _snapshot/_all

# Delete a snapshot repository
DELETE _snapshot/<repository_name>

2. Object Repository (repository-s3)

Preparation

  • Purchase object storage to store the snapshot data (this migration used Alibaba Cloud OSS, which is compatible with AWS S3)

1. Source Cluster Snapshot Backup

Install the repository-s3 Plugin on Each Cluster Node

# Enter the ES installation directory
cd /elastic/elasticsearch

# Switch to elastic user
su elastic

# Install the plugin
bin/elasticsearch-plugin install repository-s3

# The plugin must be installed on each node in the cluster, and each node must be restarted after installation.
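
If a node cannot reach the internet, the plugin can also be installed from a local file; a sketch, assuming the matching plugin zip (here for 7.2.0) was downloaded beforehand to /tmp:

# Assumption: repository-s3-7.2.0.zip was downloaded in advance for offline install
bin/elasticsearch-plugin install file:///tmp/repository-s3-7.2.0.zip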

Restart each node of the cluster (data nodes first, master nodes last)

# Disable shard allocation
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

# Perform a synced flush
POST _flush/synced

# Stop one node
ps aux | grep elasticsearch
kill <pid>

# Start the stopped node
/etc/init.d/elasticsearch start

# Re-enable shard allocation
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}

# Perform a synced flush and wait for all shards to rejoin the cluster
POST _flush/synced

# Repeat the above steps for each node until every node in the cluster has been restarted;

# Note: the ES cluster is accessed through a load balancer, so this rolling restart does not affect online traffic.

Add the repository-s3 keys on each node of the cluster

# Switch to elastic user
su elastic

# Add the access_key (enter the key when prompted)
bin/elasticsearch-keystore add s3.client.default.access_key

# Add the secret_key (enter the key when prompted)
bin/elasticsearch-keystore add s3.client.default.secret_key

# Reload the secure settings (an API call against the cluster, not a shell command)
POST _nodes/reload_secure_settings

# List the keys stored in the keystore
bin/elasticsearch-keystore list

Create the snapshot repository and verify that every node can access it

# Create a snapshot repository
PUT _snapshot/<repository_name>
{
  "type": "s3",
  "settings": {
    "endpoint": "oss-cn-shenzhen.aliyuncs.com",     # OSS endpoint
    "bucket": "es",                                 # Bucket name
    "base_path": "esdata",                          # Path to the snapshot files
    "max_snapshot_bytes_per_sec": "200mb",          # Snapshot creation speed; default 40mb
    "max_restore_bytes_per_sec": "200mb"            # Snapshot restore speed; default unlimited
  }
}

# Verify the snapshot repository
POST /_snapshot/<repository_name>/_verify

# View all repositories
GET _snapshot/_all

# Delete a snapshot repository
DELETE _snapshot/<repository_name>

Create a snapshot and check the backup status

# Create a snapshot
PUT /_snapshot/<repository_name>/<snapshot_name>
{
  "indices": "index_*",             # Index names to back up; wildcards supported
  "ignore_unavailable": true,       # Ignore missing or closed indices and data streams
  "include_global_state": true      # Back up the global cluster state: true for the full backup, false for incrementals
}

# View snapshot status
GET _snapshot/<repository_name>/<snapshot_name>/_status
GET _snapshot/<repository_name>/<snapshot_name>

# List all snapshots in the repository
GET _snapshot/<repository_name>/_all

# Delete a snapshot
DELETE _snapshot/<repository_name>/<snapshot_name>

# Snapshot names must be unique within a repository; the first snapshot in a repository is a full backup, and subsequent snapshots are incremental.
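
As a sketch of how that naming constraint is usually handled (assumption: date-stamped names; wait_for_completion=true is a standard parameter that blocks until the snapshot finishes):

# Hypothetical example: a date-stamped name keeps each snapshot unique
PUT /_snapshot/<repository_name>/snapshot_2022.01.24?wait_for_completion=true
{
  "indices": "index_*",
  "ignore_unavailable": true,
  "include_global_state": false
}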

2. Target Cluster Snapshot Restore

Install the repository-s3 Plugin on Each Cluster Node

# Enter the ES installation directory
cd /elastic/elasticsearch

# Install the plugin
sudo bin/elasticsearch-plugin install repository-s3

# The plugin must be installed on each node in the cluster, and each node must be restarted after installation.

# Note: all nodes can be restarted at once here, because the target cluster is new and not yet serving traffic.

Add the repository-s3 keys on each node of the cluster

# Add the access_key (enter the key when prompted)
bin/elasticsearch-keystore add s3.client.default.access_key

# Add the secret_key (enter the key when prompted)
bin/elasticsearch-keystore add s3.client.default.secret_key

# Reload the secure settings (an API call against the cluster, not a shell command)
POST _nodes/reload_secure_settings

# List the keys stored in the keystore
bin/elasticsearch-keystore list

Create the snapshot repository and verify that every node can access it

# Create the snapshot repository [read-only]
PUT _snapshot/<repository_name>
{
  "type": "s3",
  "settings": {
    "endpoint": "oss-cn-shenzhen.aliyuncs.com",     # OSS endpoint
    "bucket": "es",                                 # Bucket name
    "base_path": "esdata",                          # Path to the snapshot files
    "max_snapshot_bytes_per_sec": "200mb",          # Snapshot creation speed; default 40mb
    "max_restore_bytes_per_sec": "200mb",           # Snapshot restore speed; default unlimited
    "readonly": true                                # Register the repository read-only on the restore side to avoid accidental writes
  }
}

# Verify the snapshot repository
POST /_snapshot/<repository_name>/_verify

Adjust the cluster's shard recovery speed and concurrency

# Modify the cluster settings
PUT _cluster/settings
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "200mb",                 # Per-second byte limit on recovery
    "cluster.routing.allocation.node_concurrent_recoveries": "3"   # Concurrent shard recoveries per node; do not set this too high, or recoveries can easily stall
  }
}

# View cluster configurations (including default configurations)
GET _cluster/settings?flat_settings&include_defaults

List all snapshots in the repository and restore them in order

# List all snapshots in the repository
GET _snapshot/<repository_name>/_all

# Restore a snapshot (the entire snapshot)
POST /_snapshot/<repository_name>/<snapshot_name>/_restore
{
  "include_global_state": true,           # Restore the global cluster state: true for the full backup, false for incrementals
  "index_settings": {
    "index.number_of_replicas": 0         # Disable replicas during restore to save time
  }
}

# Restore a snapshot (partial restore)
POST /_snapshot/<repository_name>/<snapshot_name>/_restore
{
  "indices": "index_1,index_2",           # A partial restore requires that the target indices do not already exist, or that they are renamed to new indices (the originals are unaffected)
  "index_settings": {
    "index.number_of_replicas": 0         # Disable replicas during restore to save time
  },
  "rename_pattern": "index_(.+)",
  "rename_replacement": "restored_index_$1",
  "include_aliases": false
}

# Close the indices (indices must be closed before restoring an incremental snapshot)
POST index_*/_close

# Open the indices (opened automatically after the restore, or manually)
POST index_*/_open

# View recovery status
GET /_cat/recovery?active_only
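
Once recovery shows no active shards, a quick consistency check against the source cluster could look like this (a sketch using the standard count and cat APIs; the index pattern is this post's example):

# Optional check (assumption): compare these numbers with the source cluster
GET index_*/_count
GET _cat/indices/index_*?v&h=index,docs.count,store.size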

Restore the index replica count after all snapshot restores are complete

PUT index_*/_settings
{
  "index.number_of_replicas": 1
}
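
To watch the new replicas being allocated, the cat shards API can be polled (a minimal sketch; the index pattern is this post's example):

# Optional check (assumption): replica shards should move from UNASSIGNED
# to STARTED as they are allocated
GET _cat/shards/index_*?v&h=index,shard,prirep,state,node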

Topics: ElasticSearch