From 0 to 1: a hands-on introduction to etcd

Posted by alco19357 on Mon, 29 Jun 2020 12:25:04 +0200

By kaliarch

Background: I recently had some questions about the role etcd plays in k8s applications. Studying etcd on its own makes some characteristics of k8s easier to understand.

1, Overview

1.1 introduction to etcd

etcd is an open source project started by the CoreOS team in June 2013. Its goal is to build a highly available distributed key-value database. Internally, etcd uses the Raft protocol as its consensus algorithm, and it is implemented in Go.

1.2 development history

1.3 features of etcd

  • Simple: installation and configuration are simple, and an HTTP API is provided for interaction, so it is also simple to use
  • Secure: supports SSL certificate verification
  • Fast: according to the official benchmark data, a single instance supports 2k+ read operations per second
  • Reliable: uses the Raft algorithm to provide availability and consistency for distributed system data

1.4 Concept Terms

  • Raft: the algorithm adopted by etcd to ensure strong consistency of distributed system.
  • Node: a Raft state machine instance.
  • Member: an etcd instance. It manages a Node and can serve client requests.
  • Cluster: etcd cluster composed of multiple members that can work together.
  • Peer: the name of another Member in the same etcd cluster.
  • Client: the client that sends HTTP requests to the etcd cluster.
  • WAL: write-ahead log, the log format etcd uses for persistent storage.
  • Snapshot: a snapshot of etcd's data state, taken so that the WAL files do not grow without bound.
  • Proxy: a mode of etcd that provides reverse proxy services for etcd clusters.
  • Leader: the node elected in Raft to process all data commits.
  • Follower: a node that did not win the election. Followers are the subordinate nodes in Raft and provide the algorithm's strong consistency guarantee.
  • Candidate: when a Follower has not received the Leader's heartbeat for a certain period of time, it becomes a Candidate and starts an election.
  • Term: the period from when a node becomes Leader until the next election.
  • Index: the sequence number of a data item. Raft locates data by Term and Index.

1.5 data reading and writing order

To guarantee strong consistency, all data in an etcd cluster flows in one direction, from the Leader (master node) to the Followers. All Follower data must match the Leader's, and any data that differs is overwritten.

Users can send reads and writes to any node in the etcd cluster:

  • Read: since data is kept consistent across all nodes in the cluster, it can be read from any node (with the v2 etcdctl used later in this article, the --consistent option of get sends the read to the Leader when the latest value must be guaranteed)
  • Write: the etcd cluster has a single Leader. A write sent to the Leader is applied directly, and the Leader then distributes it to all Followers. A write sent to a Follower is first forwarded to the Leader, which then distributes it to all Followers
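The write path above can be pictured with a toy in-memory model: a Follower hands the write to the Leader, and the Leader copies it to every Follower. This is only a sketch under simplifying assumptions (no network, no failures, no majority check); `Node` and `make_cluster` are hypothetical names, not etcd APIs.

```python
class Node:
    """Toy cluster member holding a copy of the key-value data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.leader = None     # filled in by make_cluster
        self.followers = []    # only used on the Leader

    def write(self, key, value):
        # A Follower does not apply the write itself; it forwards it to the Leader.
        if self.leader is not self:
            return self.leader.write(key, value)
        # The Leader applies the write and pushes it to every Follower.
        self.data[key] = value
        for follower in self.followers:
            follower.data[key] = value
        return self.name  # name of the node that handled the write


def make_cluster(names):
    """Build the nodes and make the first one the Leader (hypothetical helper)."""
    nodes = [Node(name) for name in names]
    leader = nodes[0]
    for node in nodes:
        node.leader = leader
    leader.followers = nodes[1:]
    return nodes


if __name__ == "__main__":
    nodes = make_cluster(["etcd-0-8", "etcd-0-14", "etcd-0-17"])
    handled_by = nodes[2].write("/msg", "hello k8s")  # write sent to a Follower
    print("handled by:", handled_by)
    print([node.data for node in nodes])
```

Whichever node receives the write, the data always travels from the Leader outwards, which is the one-directional flow described above.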

1.6 leader election

Consider a cluster of three nodes. A timer runs on each of the three nodes, and the duration of each timer is random; Raft uses these randomized timers to start the Leader election. The node whose timer expires first sends a request to the other two nodes asking to become Leader. The other nodes respond to the request with their votes, and the first node is elected Leader.

After becoming Leader, the node sends notifications (heartbeats) to the other nodes at regular intervals to assert that it is still the Leader. If the Followers stop receiving these notifications, for example because the Leader has gone down or lost its connection, the remaining nodes repeat the election process to choose a new Leader.
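The election round described above can be simulated in a few lines. This is a simplified sketch, not etcd's implementation: it assumes every node is reachable and ignores split votes and log comparison. The function name `elect_leader` and the 1000 to 2000 ms timer range are illustrative assumptions.

```python
import random


def elect_leader(nodes, rng, low_ms=1000, high_ms=2000):
    """Simulate one election round and return (leader, timeouts)."""
    # Every node starts a timer with a random duration.
    timeouts = {node: rng.uniform(low_ms, high_ms) for node in nodes}
    # The node whose timer fires first becomes the candidate.
    candidate = min(timeouts, key=timeouts.get)
    # It votes for itself; the other nodes have not voted yet in this round,
    # so each of them grants its vote when the request arrives.
    votes = 1 + sum(1 for node in nodes if node != candidate)
    majority = len(nodes) // 2 + 1
    leader = candidate if votes >= majority else None
    return leader, timeouts


if __name__ == "__main__":
    members = ["etcd-0-8", "etcd-0-14", "etcd-0-17"]
    leader, timeouts = elect_leader(members, random.Random())
    for node, timeout in sorted(timeouts.items(), key=lambda item: item[1]):
        print(f"{node}: timer {timeout:.0f} ms")
    print("leader:", leader)
```

Because the timers are random, a different node may win on each run, which is exactly why randomization makes it unlikely that two nodes start campaigning at the same moment.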

1.7 determining whether a write has succeeded

etcd considers a write successful once the Leader has processed the request and distributed it to a majority of nodes. How is the majority determined? If the total number of nodes is N, the majority is Quorum = N/2 + 1 (using integer division). As for how many nodes an etcd cluster should have, the chart on the left of the figure above shows the Quorum for each total number of nodes (Instances) in the cluster. Instances minus Quorum gives the number of nodes the cluster can tolerate losing (its fault tolerance).

Therefore, the recommended minimum cluster size is 3. A cluster of 1 or 2 nodes has a fault tolerance of 0: as soon as one node goes down, the whole cluster stops working properly.
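The quorum arithmetic above is easy to tabulate. The following sketch simply evaluates Quorum = N/2 + 1 with integer division; the function names `quorum` and `fault_tolerance` are made up for this example and are not part of etcd.

```python
def quorum(n: int) -> int:
    """Majority of an n-node cluster: N/2 + 1 with integer division."""
    return n // 2 + 1


def fault_tolerance(n: int) -> int:
    """Number of nodes that may fail while a majority is still reachable."""
    return n - quorum(n)


if __name__ == "__main__":
    print("Instances  Quorum  Fault tolerance")
    for n in range(1, 10):
        print(f"{n:>9}  {quorum(n):>6}  {fault_tolerance(n):>15}")
```

The table shows that 3 and 4 nodes both tolerate one failure, and 5 and 6 both tolerate two, so an even-sized cluster adds a node without adding fault tolerance.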

2, etcd architecture and analysis

2.1 architecture

2.2 architecture analysis

From the architecture diagram of etcd, we can see that etcd is mainly divided into four parts.

  • HTTP Server: handles API requests sent by users, as well as synchronization and heartbeat requests from other etcd nodes.
  • Store: handles the transactions behind the features etcd supports, including data indexing, node state changes, monitoring and feedback, and event processing and execution. It is the concrete implementation of most of the API functionality etcd offers to users.
  • Raft: the implementation of the Raft strong-consistency algorithm, which is the core of etcd.
  • WAL: the write-ahead log is how etcd stores data. Besides keeping the state of all data and the node index in memory, etcd persists data through the WAL, in which every change is logged before it is committed. Within the WAL, a Snapshot is a state snapshot taken to keep the amount of data from growing too large, and an Entry is the specific log content that is stored.

Typically, a user request arrives at the HTTP Server and is forwarded to the Store for the actual transaction processing. If the request modifies a node, it is handed to the Raft module for the state change and logging, then synchronized to the other etcd nodes for confirmation. Finally, the data is committed and synchronized once more.
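The "log first, then apply" idea behind the WAL can be illustrated with a small in-memory model. This is a sketch of the general write-ahead technique, not etcd's on-disk format; `TinyStore` and its methods are hypothetical names, and the log here is a plain Python list rather than a file.

```python
class TinyStore:
    """Key-value store that records every change in a log before applying it."""

    def __init__(self):
        self.wal = []     # append-only list of (index, key, value) entries
        self.state = {}   # in-memory key-value state

    def put(self, key, value):
        # 1. Record the change in the log first ...
        entry = (len(self.wal) + 1, key, value)
        self.wal.append(entry)
        # 2. ... and only then apply it to the in-memory state.
        self.state[key] = value
        return entry[0]

    @classmethod
    def recover(cls, wal):
        """Rebuild the state by replaying log entries in order."""
        store = cls()
        for _, key, value in wal:
            store.put(key, value)
        return store


if __name__ == "__main__":
    store = TinyStore()
    store.put("/msg", "hello")
    store.put("/msg", "hello k8s")
    restored = TinyStore.recover(store.wal)
    print(restored.state)
```

Because the log is written before the state changes, replaying it after a crash reproduces the same state; a snapshot would let the replay start from a saved state instead of from the first entry.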

3, Application scenarios

3.1 service registration and discovery

etcd can be used for service registration and discovery

  • Registration and discovery between front-end and back-end services

Middleware and back-end services register themselves in etcd. The front end and the middleware can then easily look up the relevant servers in etcd and bind to and call them according to the calling relationships.

  • Registration and discovery for multiple groups of back-end servers

Multiple stateless replicas of the same back-end application can register with etcd at the same time. The front end, through haproxy, obtains the group of back-end IPs and ports from etcd and forwards requests to them. This hides the back-end ports and the multiple groups of application instances, and it provides failover.
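The registration and lookup pattern above can be sketched with an in-memory stand-in for the etcd key space. Nothing here talks to a real etcd: the `Registry` class, the `/services/...` key layout, and the addresses are all assumptions chosen for illustration.

```python
class Registry:
    """In-memory stand-in for an etcd key space (not a real etcd client)."""

    def __init__(self):
        self.keys = {}

    def register(self, service, instance, address):
        # Each instance writes its address under /services/<service>/<instance>.
        self.keys[f"/services/{service}/{instance}"] = address

    def deregister(self, service, instance):
        self.keys.pop(f"/services/{service}/{instance}", None)

    def discover(self, service):
        # A consumer lists everything stored under the service's prefix.
        prefix = f"/services/{service}/"
        return sorted(v for k, v in self.keys.items() if k.startswith(prefix))


if __name__ == "__main__":
    registry = Registry()
    registry.register("web", "replica-1", "10.0.0.1:8080")
    registry.register("web", "replica-2", "10.0.0.2:8080")
    print(registry.discover("web"))
    registry.deregister("web", "replica-1")  # e.g. the replica went away
    print(registry.discover("web"))
```

A front end such as haproxy only needs the result of `discover`; which replicas exist and which ports they use stays hidden behind the registry.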

3.2 message publishing and subscription

Etcd can act as message middleware. Producers can register topics in etcd and send messages. Consumers subscribe to topics from etcd to obtain messages sent by producers to etcd.

3.3 load balancing

Multiple groups of providers of the same service in the back end each register themselves with etcd, and etcd monitors and checks the registered services. A request first obtains the real ip:port of an available provider from etcd and is then sent to one of these groups of services. In this way etcd acts as a load balancer.

3.4 distributed notification and coordination

  • When etcd watches a service and finds it missing, it notifies the service so that it can be checked
  • The controller sends a start-service request to etcd, and etcd notifies the service to act accordingly
  • When the service has finished its work, it updates its status in etcd, and etcd notifies the user accordingly

3.5 distributed lock

When more than one node competes for a resource, etcd acts as the master controller and ensures that exactly one node in the distributed cluster successfully obtains the lock.
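One common way to build such a lock is an atomic "create only if absent" operation, the same semantics as the mk command shown in section 5.1. The sketch below imitates that with an in-memory class; `KeySpace`, `compete`, and the `/locks/job` key are hypothetical, and a local mutex stands in for the atomicity that etcd would provide.

```python
import threading


class KeySpace:
    """In-memory stand-in for etcd; mk mimics create-only-if-absent."""

    def __init__(self):
        self._keys = {}
        self._mutex = threading.Lock()  # stands in for etcd's atomicity

    def mk(self, key, value):
        with self._mutex:
            if key in self._keys:
                return False        # key already exists: the lock is taken
            self._keys[key] = value
            return True

    def rm(self, key):
        with self._mutex:
            self._keys.pop(key, None)


def compete(space, contenders, key="/locks/job"):
    """Let every contender try to create the lock key; return the winners."""
    winners = []
    threads = [
        threading.Thread(target=lambda c=c: space.mk(key, c) and winners.append(c))
        for c in contenders
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return winners


if __name__ == "__main__":
    print(compete(KeySpace(), ["etcd-0-8", "etcd-0-14", "etcd-0-17"]))
```

However the threads are scheduled, exactly one contender creates the key, and releasing the lock is just removing the key again.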

3.6 distributed queue

etcd creates a queue for each node. The competitor that corresponds to a given queue can then be found in etcd.
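A queue in a key-value store is usually built from keys whose names sort in creation order, which is what the mk --in-order example later in this article produces. The sketch below imitates that behaviour in memory; `InOrderQueue` is a hypothetical name, and the 20-digit zero-padded key is modelled on the `/queue/00000000000000000021` key shown in that example.

```python
class InOrderQueue:
    """In-memory imitation of a queue built from in-order keys."""

    def __init__(self):
        self._index = 0
        self._items = {}

    def enqueue(self, value):
        # The store picks an increasing, zero-padded key for each new item.
        self._index += 1
        key = f"/queue/{self._index:020d}"
        self._items[key] = value
        return key

    def dequeue(self):
        # Zero padding makes string order equal creation order,
        # so the smallest key is the oldest item.
        key = min(self._items)
        return key, self._items.pop(key)


if __name__ == "__main__":
    queue = InOrderQueue()
    queue.enqueue("s1")
    queue.enqueue("s2")
    print(queue.dequeue())
    print(queue.dequeue())
```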

3.7 cluster monitoring and Leader election

etcd can elect a leader from among multiple nodes using the Raft algorithm.

4, Installation and deployment

4.1 single machine deployment

You can download and install etcd from a binary release or from source, but then you must write the configuration file yourself, and to run it as a service you must also write your own service startup file. Installing with yum is recommended.

hostnamectl set-hostname etcd-1
rpm -ivh epel-release-latest-7.noarch.rpm
# The version of etcd in the yum repository is 3.3.11. If you need the latest version of etcd, install it from a binary release
yum -y install etcd
systemctl enable etcd

You can inspect the effective configuration of the yum-installed etcd and, as needed, change the data storage directory, the listening URLs and ports, and the etcd node name.

  • By default, etcd stores its data in the default.etcd/ directory
  • It communicates with other nodes in the cluster on http://localhost:2380
  • It provides the HTTP API for client interaction on http://localhost:2379
  • The node name defaults to default
  • The heartbeat interval is 100ms. The function of this configuration will be explained later
  • The election timeout is 1000ms. The function of this configuration will be explained later
  • The snapshot count is 10000. The function of this configuration will be explained later
  • The cluster and each node generate a UUID
  • On startup, it runs Raft and elects a leader
[root@VM_0_8_centos tmp]# grep -Ev "^#|^$" /etc/etcd/etcd.conf
[root@VM_0_8_centos tmp]# systemctl status etcd

4.2 cluster deployment

It is best to deploy a cluster with an odd number of nodes, which gives the best fault tolerance.

4.2.1 host information

4.2.2 host configuration

In this example, three nodes are used to deploy the etcd cluster, and each node modifies hosts

cat >> /etc/hosts << EOF etcd-0-8 etcd-0-14 etcd-0-17

4.2.3 etcd installation

etcd is installed on all three nodes

rpm -ivh epel-release-latest-7.noarch.rpm
yum -y install etcd
systemctl enable etcd
mkdir -p /data/app/etcd/
chown etcd:etcd /data/app/etcd/

4.2.4 etcd configuration

  • etcd default configuration file (omitted)

etcd-0-8 configuration:

[root@etcd-server ~]# hostnamectl set-hostname etcd-0-8
[root@etcd-0-8 ~]# egrep "^#|^$" /etc/etcd/etcd.conf -v

etcd-0-14 configuration:

[root@etcd-server ~]# hostnamectl set-hostname etcd-0-14
[root@etcd-server ~]# mkdir -p /data/app/etcd/
[root@etcd-0-14 ~]# egrep "^#|^$" /etc/etcd/etcd.conf -v

etcd-0-17 configuration:

[root@etcd-server ~]# hostnamectl set-hostname etcd-0-17
[root@etcd-server ~]# mkdir -p /data/app/etcd/
[root@etcd-0-17 ~]# egrep "^#|^$" /etc/etcd/etcd.conf -v

Start the service after configuration

systemctl start etcd

4.2.5 view cluster status

  • View etcd status
[root@etcd-0-8 default.etcd]# systemctl status etcd
● etcd.service - Etcd Server   
Loaded: loaded (/usr/lib/systemd/system/etcd.service; enabled; vendor preset: disabled)   
Active: active (running) since Tue 2019-12-03 15:55:28 CST; 8s ago
Main PID: 24510 (etcd)   
CGroup: /system.slice/etcd.service           
                └─24510 /usr/bin/etcd --name=etcd-0-8 --data-dir=/data/app/etcd/ --listen-client-urls=
Dec 03 15:55:28 etcd-0-8 etcd[24510]: set the initial cluster version to 3.0
Dec 03 15:55:28 etcd-0-8 etcd[24510]: enabled capabilities for version 3.0
Dec 03 15:55:30 etcd-0-8 etcd[24510]: peer 56e0b6dad4c53d42 became active
Dec 03 15:55:30 etcd-0-8 etcd[24510]: established a TCP streaming connection with peer 56e0b6dad4c53d42 (stream Message reader)
Dec 03 15:55:30 etcd-0-8 etcd[24510]: established a TCP streaming connection with peer 56e0b6dad4c53d42 (stream Message writer)
Dec 03 15:55:30 etcd-0-8 etcd[24510]: established a TCP streaming connection with peer 56e0b6dad4c53d42 (stream MsgApp v2 reader)
Dec 03 15:55:30 etcd-0-8 etcd[24510]: established a TCP streaming connection with peer 56e0b6dad4c53d42 (stream MsgApp v2 writer)
Dec 03 15:55:32 etcd-0-8 etcd[24510]: updating the cluster version from 3.0 to 3.3
Dec 03 15:55:32 etcd-0-8 etcd[24510]: updated the cluster version from 3.0 to 3.3
Dec 03 15:55:32 etcd-0-8 etcd[24510]: enabled capabilities for version 3.3

Check the listening ports (if etcd is not listening on the local loopback address, the local etcdctl cannot connect to it)

[root@etcd-0-8 default.etcd]# netstat -lntup |grep etcd
tcp 0      0*     LISTEN 25167/etcd
tcp 0      0*     LISTEN 25167/etcd
tcp 0      0*    LISTEN 25167/etcd

View the cluster status (you can see etcd-0-17)

[root@etcd-0-8 default.etcd]# etcdctl member list
2d2e457c6a1a76cb: name=etcd-0-8 peerURLs= clientURLs=, isLeader=false
56e0b6dad4c53d42: name=etcd-0-14 peerURLs= clientURLs=, isLeader=true
d2d2e9fc758e6790: name=etcd-0-17 peerURLs= clientURLs=, isLeader=false

[root@etcd-0-8 ~]# etcdctl cluster-health
member 2d2e457c6a1a76cb is healthy: got healthy result from
member 56e0b6dad4c53d42 is healthy: got healthy result from
member d2d2e9fc758e6790 is healthy: got healthy result from
cluster is healthy

5, Basic usage

5.1 create

  • set

Sets the value of a key. For example:

$ etcdctl set /testdir/testkey "Hello world"
Hello world

#Supported options include:
--ttl '0'               Timeout for this key in seconds; if not set (default 0), the key never expires
--swap-with-value value Set the key only if its current value is value
--swap-with-index '0'   Set the key only if its current index is the specified index
  • mk

If the given key does not exist, a new key value is created. For example:

$ etcdctl mk /testdir/testkey "Hello world"
Hello world

#If the key already exists, running the command again reports an error. For example:
$ etcdctl mk /testdir/testkey "Hello world"
Error: 105: Key already exists (/testdir/testkey) [8]

#The supported options are:
--ttl '0' Timeout in seconds; if not set (default 0), the key never expires
  • mkdir

If the given key directory does not exist, a new key directory is created. For example:

$ etcdctl mkdir testdir2

#The supported options are:
--ttl '0' Timeout in seconds; if not set (default 0), the directory never expires
  • setdir

Create a key directory. If the directory does not exist, it is created. If the directory exists, update the directory TTL.

$ etcdctl setdir testdir3

#The supported options are:
--ttl '0' Timeout in seconds; if not set (default 0), the directory never expires

5.2 deletion

  • rm

Delete a key value. For example:

$ etcdctl rm /testdir/testkey
PrevNode.Value: Hello

#When the key does not exist, an error is reported. For example:
$ etcdctl rm /testdir/testkey
Error: 100: Key not found (/testdir/testkey) [7]

#The supported options are:
--dir            Delete the key if it is an empty directory or a key-value pair
--recursive      Delete the directory and all of its subkeys
--with-value     Check that the existing value matches before deleting
--with-index '0' Check that the existing index matches before deleting
  • rmdir

Delete an empty directory or key value pair.

$ etcdctl setdir dir1
$ etcdctl rmdir dir1
#If the directory is not empty, an error will be reported:
$ etcdctl set /dir/testkey hihi
$ etcdctl rmdir /dir
Error: 108: Directory not empty (/dir) [17]

5.3 update

  • update

Update the value content when the key exists. For example:

$ etcdctl update /testdir/testkey "Hello"

#When the key does not exist, an error is reported. For example:
$ etcdctl update /testdir/testkey2 "Hello"
Error: 100: Key not found (/testdir/testkey2) [6]

#The supported options are:
--ttl '0' Timeout in seconds; if not set (default 0), the key never expires
  • updatedir

Update an existing directory.

$ etcdctl updatedir testdir2

#The supported options are:
--ttl '0' Timeout in seconds; if not set (default 0), the directory never expires

5.4 query

  • get

Gets the value of the specified key. For example:

$ etcdctl get /testdir/testkey
Hello world

#When the key does not exist, an error is reported. For example:
$ etcdctl get /testdir/testkey2
Error: 100: Key not found (/testdir/testkey2) [5]

#The supported options are:
--sort Sort the results
--consistent Send the request to the primary node to ensure the consistency of the obtained content.
  • ls

Lists the keys or subdirectories under the directory (the default is the root directory). The contents in the subdirectories are not displayed by default.

For example:

$ etcdctl ls
/testdir
/testdir2
/dir
$ etcdctl ls dir
/dir/testkey

#Supported options include:
--sort      Sort the output
--recursive If the directory has subdirectories, list their contents recursively
-p          Append / to entries that are directories, to distinguish them from keys

5.5 watch

  • watch

Watches a key for changes. As soon as the key is updated, the latest value is printed and the command exits.
For example, the user updates the value of testkey to Hello watch.

$ etcdctl get /testdir/testkey
Hello world
$ etcdctl set /testdir/testkey "Hello watch"
Hello watch
$ etcdctl watch testdir/testkey
Hello watch

Supported options include:

--forever          Keep watching until the user presses CTRL+C to exit
--after-index '0'  Watch starting after the specified index
--recursive        Return all key values and subkeys
  • exec-watch

Watches a key for changes and executes the given command whenever the value is updated.
For example, the user updates the value of testkey.

$ etcdctl exec-watch testdir/testkey -- sh -c 'ls'
config Documentation etcd etcdctl

Supported options include:

--after-index '0'  Watch starting after the specified index
--recursive        Return all key values and subkeys

5.6 backup

Backs up etcd data.

$ etcdctl backup --data-dir /var/lib/etcd --backup-dir /home/etcd_backup

Supported options include:

--data-dir    Path to the etcd data directory
--backup-dir  Path to write the backup to

5.7 member

Use the list, add, and remove subcommands to list, add, and remove etcd instances in the etcd cluster.

View the nodes that exist in the cluster

$ etcdctl member list
8e9e05c52164694d: name=dev-master-01 peerURLs=http://localhost:2380 clientURLs=http://localhost:2379 isLeader=true

Delete the existing nodes in the cluster

$ etcdctl member remove 8e9e05c52164694d
Removed member 8e9e05c52164694d from cluster

Add new nodes to the cluster

$ etcdctl member add etcd3
Added member named etcd3 with ID 8e9e05c52164694d to cluster


# Set a key value
[root@etcd-0-8 ~]# etcdctl set /msg "hello k8s"
hello k8s

# Get the value of key
[root@etcd-0-8 ~]# etcdctl get /msg
hello k8s

# Get the details of the key value
[root@etcd-0-8 ~]# etcdctl -o extended get /msg
Key: /msg
Created-Index: 12
Modified-Index: 12
TTL: 0
Index: 12

hello k8s

# Get nonexistent key and report error
[root@etcd-0-8 ~]# etcdctl get /xxzx
Error: 100: Key not found (/xxzx) [12]

# Set ttl of key, which will be automatically deleted after expiration
[root@etcd-0-8 ~]# etcdctl set /testkey "tmp key test" --ttl 5
tmp key test
[root@etcd-0-8 ~]# etcdctl get /testkey
Error: 100: Key not found (/testkey) [14]

# Compare-and-swap: replace the value only if the current value matches
[root@etcd-0-8 ~]# etcdctl get /msg
hello k8s
[root@etcd-0-8 ~]# etcdctl set --swap-with-value "hello k8s" /msg "goodbye"
[root@etcd-0-8 ~]# etcdctl get /msg

# mk creates a key only if it does not exist (set overwrites an existing key)
[root@etcd-0-8 ~]# etcdctl get /msg
[root@etcd-0-8 ~]# etcdctl mk /msg "mktest"
Error: 105: Key already exists (/msg) [18]
[root@etcd-0-8 ~]# etcdctl mk /msg1 "mktest"

# Create in-order (automatically numbered) keys
[root@etcd-0-8 ~]# etcdctl mk --in-order /queue s1s1
[root@etcd-0-8 ~]# etcdctl mk --in-order /queue s2s2
[root@etcd-0-8 ~]# etcdctl ls --sort
[root@etcd-0-8 ~]# etcdctl get /queue/00000000000000000021

# Update key value
[root@etcd-0-8 ~]# etcdctl update /msg1 "update test"
update test
[root@etcd-0-8 ~]# etcdctl get /msg1
update test

# Update ttl and value of key
[root@etcd-0-8 ~]# etcdctl update --ttl 5 /msg "aaa"

# Create directory
[root@etcd-0-8 ~]# etcdctl mkdir /testdir

# remove empty directories
[root@etcd-0-8 ~]# etcdctl mkdir /test1
[root@etcd-0-8 ~]# etcdctl rmdir /test1

# Delete non empty directory
[root@etcd-0-8 ~]# etcdctl get /testdir/test
dir: is a directory
[root@etcd-0-8 ~]#
[root@etcd-0-8 ~]# etcdctl rm --recursive /testdir

# List the contents of a directory
[root@etcd-0-8 ~]# etcdctl ls /
[root@etcd-0-8 ~]# etcdctl ls /tmp

# Recursively lists the contents of a directory
[root@etcd-0-8 ~]# etcdctl ls --recursive /

# Monitor the key and print out the change when the key changes
[root@etcd-0-8 ~]# etcdctl watch /msg1
[root@VM_0_17_centos ~]# etcdctl update /msg1 "xxx"

# Listen to a directory and print it when any node in the directory changes
[root@etcd-0-8 ~]# etcdctl watch --recursive /
[update] /msg1
[root@VM_0_17_centos ~]# etcdctl update /msg1 "xxx"

# Keep watching until Ctrl+C stops the watcher
[root@etcd-0-8 ~]# etcdctl watch --forever /

# Listen to the directory and execute a command when it changes
[root@etcd-0-8 ~]# etcdctl exec-watch --recursive / -- sh -c "echo change"

# backup
[root@etcd-0-14 ~]# etcdctl backup --data-dir /data/app/etcd --backup-dir /root/etcd_backup
2019-12-04 10:25:16.113237 I | ignoring EntryConfChange raft entry
2019-12-04 10:25:16.113268 I | ignoring EntryConfChange raft entry
2019-12-04 10:25:16.113272 I | ignoring EntryConfChange raft entry
2019-12-04 10:25:16.113293 I | ignoring member attribute update on /0/members/2d2e457c6a1a76cb/attributes
2019-12-04 10:25:16.113299 I | ignoring member attribute update on /0/members/d2d2e9fc758e6790/attributes
2019-12-04 10:25:16.113305 I | ignoring member attribute update on /0/members/56e0b6dad4c53d42/attributes
2019-12-04 10:25:16.113310 I | ignoring member attribute update on /0/members/56e0b6dad4c53d42/attributes
2019-12-04 10:25:16.113314 I | ignoring member attribute update on /0/members/2d2e457c6a1a76cb/attributes
2019-12-04 10:25:16.113319 I | ignoring member attribute update on /0/members/d2d2e9fc758e6790/attributes
2019-12-04 10:25:16.113384 I | ignoring member attribute update on /0/members/56e0b6dad4c53d42/attributes

# Use the v3 API
[root@etcd-0-14 ~]# export ETCDCTL_API=3
[root@etcd-0-14 ~]# etcdctl --endpoints=",," snapshot save mysnapshot.db
Snapshot saved at mysnapshot.db
[root@etcd-0-14 ~]# etcdctl snapshot status mysnapshot.db -w json

6, Summary

  • etcd keeps only 1000 historical events by default, so it is not suited to workloads with a large number of updates, where events can be lost. Typical etcd use cases are configuration management and service discovery, which read far more than they write.
  • Compared with ZooKeeper, etcd is much simpler to use. However, real service discovery requires combining etcd with other tools (such as registrator and confd) to register and update services automatically.
  • At present, etcd has no graphical tools.
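The first point in the summary, the fixed window of 1000 historical events, can be pictured with a bounded buffer. This is only a sketch of the idea, not etcd's code: `EventHistory` and its methods are hypothetical, and the error raised here merely stands in for whatever a real server reports when the requested events are gone.

```python
from collections import deque

HISTORY_SIZE = 1000  # the default window size mentioned above


class EventHistory:
    """Keeps only the most recent events, like a fixed-size history window."""

    def __init__(self, size=HISTORY_SIZE):
        self._events = deque(maxlen=size)  # oldest events fall off the front
        self.index = 0

    def record(self, key, value):
        self.index += 1
        self._events.append((self.index, key, value))

    def since(self, index):
        """Return events after index, or raise if they were already dropped."""
        oldest = self._events[0][0] if self._events else self.index + 1
        if index + 1 < oldest:
            raise LookupError(f"events up to index {oldest - 1} have been cleared")
        return [event for event in self._events if event[0] > index]


if __name__ == "__main__":
    history = EventHistory()
    for i in range(1500):
        history.record("/msg", f"value-{i}")
    print(len(history.since(500)))   # still inside the window
    try:
        history.since(100)           # a slow watcher that fell too far behind
    except LookupError as error:
        print("error:", error)
```

A watcher that falls more than one window behind a busy key can no longer catch up from history, which is why write-heavy workloads are a poor fit.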
