Introduction to ZooKeeper
ZooKeeper (wiki, home, github) is an open-source distributed coordination service for distributed applications. By exposing simple primitives, distributed applications can build higher-level services on top of it, such as synchronization, configuration management, and group membership. It is designed to be easy to program against, and its data model follows the familiar directory-tree structure of file systems [doc].
Consensus and Paxos
Before introducing ZooKeeper, it helps to understand Paxos and Chubby. In 2006, Google published two papers at OSDI, on Bigtable and Chubby, followed by the 2007 PODC paper "Paxos Made Live", which describes the Multi-Paxos protocol underlying Chubby's consensus implementation; it refines Lamport's original Paxos algorithm and improves its efficiency [ref]. Google uses Chubby as a lock service in GFS and Bigtable. Influenced by Chubby, Benjamin Reed and Flavio Junqueira of Yahoo Research developed ZooKeeper, known in industry as an open-source counterpart of Chubby (the internal implementations actually differ somewhat) [ref]; its underlying consensus protocol is ZAB. Lamport's Paxos algorithm is notoriously difficult to understand, and making consensus understandable became the research topic of Diego Ongaro during his PhD at Stanford. In 2014, Ongaro introduced the Raft algorithm in the paper "In Search of an Understandable Consensus Algorithm". Raft is an understandable alternative to Paxos and quickly became one of the popular protocols for solving consensus problems. The relationship between these Paxos-like protocols and Paxos-based systems is as follows [Ailijiang2016]:
Google's Chubby is not open source, while Yahoo's open-source ZooKeeper has become popular in industry along with cloud computing and big data technology. ZooKeeper's important milestones are as follows:
- ZooKeeper 1.0 was released on SourceForge in November 2007 [ref ]
- Migration from SourceForge to Apache began in June 2008 [ref]; ZooKeeper 3.0 was released in October 2008 and became a subproject of Hadoop [ref1 ref2]
In their book on ZooKeeper, Flavio Junqueira and Benjamin Reed explain the origin of the name as follows:
ZooKeeper was developed at Yahoo Research. After a while our group started recommending it to other groups, so we needed to give the project a name. By then the team was working alongside the Hadoop project and had taken part in many projects named after animals, including the well-known Apache Pig project (http://pig.apache.org). While we were discussing various names, one team member mentioned that we could not use yet another animal name, because our manager felt that if we kept going like this we would be living in a zoo. That struck a chord: a distributed system is like a zoo, chaotic and hard to manage, and ZooKeeper is what makes it manageable.
Architecture
The ZooKeeper service consists of several servers; one of them is elected leader through the ZAB atomic broadcast protocol, and the others act as followers. Clients can connect to any server over TCP. For a read request, any server can respond directly. Update operations (writing or modifying data) can only be coordinated by the leader: if the client is connected to a follower, the follower forwards the update request to the leader, which carries out the update. The leader serializes all update operations and uses the ZAB protocol to broadcast each update to all followers; ZAB guarantees that updates are applied in the same order on every server.
Read and write operations are illustrated below [Haloi2015]:
Any ZooKeeper server can respond to client reads, which improves throughput. Chubby differs from ZooKeeper in that all reads and writes go through its master; it improves the availability of the coordination system simply by quickly electing a new master from the replicas after a failure. Along with the throughput benefit, ZooKeeper has a potential problem: a client may read stale data, because the leader may have applied an update in memory that ZAB has not yet broadcast to the follower the client is connected to. To address this, ZooKeeper's API provides a sync operation, which an application can call before a read when needed. The follower that receives the sync command synchronizes its state with the leader so that the two are identical. Thus, if you invoke sync before a read, the client is guaranteed to see the latest state of the data.
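As a minimal sketch of this pattern (assuming an already connected client `zk`; the helper name `syncThenRead` is illustrative, not part of the ZooKeeper API), one can block on the asynchronous sync callback and then issue the read:

```java
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class SyncRead {
    // Hypothetical helper: bring the connected server's view of "path" up to
    // the leader's latest committed state, then read the data.
    public static byte[] syncThenRead(ZooKeeper zk, String path)
            throws KeeperException, InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        // sync() is asynchronous; the callback fires once the server has caught up
        zk.sync(path, (rc, p, ctx) -> latch.countDown(), null);
        latch.await();
        // The read now reflects all updates committed before the sync
        return zk.getData(path, false, null);
    }
}
```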
Data model
The namespace provided by ZooKeeper is very similar to that of a standard file system. The elements of a path are separated by slashes (/), and every node in the ZooKeeper namespace is identified by its path. In ZooKeeper terminology, a node is called a znode. By default, each znode can store at most 1 MB of data (this can be changed via a configuration parameter); as with Chubby, this discourages using the coordination service as a general-purpose data store for applications. Znodes can only be addressed by absolute paths; ZooKeeper does not recognize relative paths. A znode name may contain any Unicode character. The only exception is the name '/zookeeper': the znode '/zookeeper' is generated automatically by the ZooKeeper system and is used to manage quotas.
Using ZooKeeper
Installation and Configuration
ZooKeeper installation and startup:
```
$ brew info zookeeper
zookeeper: stable 3.4.10 (bottled), HEAD
Centralized server for distributed coordination of services
https://zookeeper.apache.org/
... (output omitted)
$ brew install zookeeper
$ zkServer start   # start-up
$ zkServer stop    # termination
$ zkServer help
ZooKeeper JMX enabled by default
Using config: /usr/local/etc/zookeeper/zoo.cfg
Usage: ./zkServer.sh {start|start-foreground|stop|restart|status|upgrade|print-cmd}
```
If you do not modify the configuration file, ZooKeeper starts in standalone (single-machine) mode by default. To use cluster mode, you need to edit /usr/local/etc/zookeeper/zoo.cfg (the default path). An example zoo.cfg [doc]:
```
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.211.11:2888:3888
server.2=192.168.211.12:2888:3888
server.3=192.168.211.13:2888:3888
```
clientPort: the port on which the ZooKeeper server listens for and accepts client connection requests.
server.X=YYY:A:B
- X: the server ID;
- YYY: the IP address (or hostname) of the server;
- A: the port used for communication between server nodes, i.e. between followers and the leader;
- B: the election port, over which the servers communicate when electing a new leader. When the current leader fails, the remaining servers communicate over this port to elect a new leader.
If you want to experiment with cluster mode on a single host, you can set YYY to localhost for every server and give each server a distinct port pair A:B (e.g., 2888:3888, 2889:3889, 2890:3890) to run a pseudo-cluster. The example zoo.cfg is as follows [doc]:
```
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
```
All commands supported by zkCli:
```
$ zkCli help
ZooKeeper -server host:port cmd args
        stat path [watch]
        set path data [version]
        ls path [watch]
        delquota [-n|-b] path
        ls2 path [watch]
        setAcl path acl
        setquota -n|-b val path
        history
        redo cmdno
        printwatches on|off
        delete path [version]
        sync path
        listquota path
        rmr path
        get path [watch]
        create [-s] [-e] path data acl
        addauth scheme auth
        quit
        getAcl path
        close
        connect host:port
```
Node types and their operations
ZooKeeper supports two types of znodes: persistent and ephemeral. A persistent node outlives the client session that created it and disappears only when a client explicitly deletes it. An ephemeral node, in contrast, is cleaned up automatically by the ZooKeeper system when the creating client's session ends or fails. In addition, either type can be made sequential, giving persistent sequential and ephemeral sequential nodes.
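As a minimal Java sketch (assuming a connected client `zk`; the paths and class name are illustrative only), the node type is chosen through the CreateMode argument of create:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class CreateModes {
    // Illustrative helper showing the four CreateMode variants
    public static void createExamples(ZooKeeper zk) throws Exception {
        // Persistent: survives the client session until explicitly deleted
        zk.create("/app", "cfg".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        // Ephemeral: removed automatically when this session ends or fails
        zk.create("/app/lock", "".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        // Persistent sequential: ZooKeeper appends a monotonically increasing suffix
        zk.create("/app/task-", "".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
        // Ephemeral sequential: both properties combined
        zk.create("/app/member-", "".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    }
}
```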
(1) Persistent znode
Create a node with create (persistent by default) and view it with get:
```
$ zkCli   # Start Client
[zk: localhost:2181(CONNECTED) 1] create /zoo 'hello zookeeper'
Created /zoo
[zk: localhost:2181(CONNECTED) 2] get /zoo
hello zookeeper
cZxid = 0x8d
ctime = Thu Nov 08 20:42:55 CST 2017
mZxid = 0x8d
mtime = Thu Nov 08 20:42:55 CST 2017
pZxid = 0x8d
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 15
numChildren = 0
```
Use create to create child nodes and ls to list them:
```
[zk: localhost:2181(CONNECTED) 3] create /zoo/duck ''
Created /zoo/duck
[zk: localhost:2181(CONNECTED) 4] create /zoo/goat ''
Created /zoo/goat
[zk: localhost:2181(CONNECTED) 5] create /zoo/cow ''
Created /zoo/cow
[zk: localhost:2181(CONNECTED) 6] ls /zoo
[cow, goat, duck]
```
Use delete to delete a node and rmr to delete recursively:
```
[zk: localhost:2181(CONNECTED) 7] delete /zoo/duck
[zk: localhost:2181(CONNECTED) 8] ls /zoo
[cow, goat]
[zk: localhost:2181(CONNECTED) 9] delete /zoo
Node not empty: /zoo
[zk: localhost:2181(CONNECTED) 10] rmr /zoo
[zk: localhost:2181(CONNECTED) 11] ls /zoo
Node does not exist: /zoo
```
(2) Ephemeral znode
Unlike persistent nodes, ephemeral nodes cannot have child nodes:
```
$ zkCli   # Start First Client
[zk: localhost:2181(CONNECTED) 0] create -e /node 'hello'
Created /node
[zk: localhost:2181(CONNECTED) 40] get /node
hello
cZxid = 0x97
ctime = Thu Nov 08 21:01:25 CST 2017
mZxid = 0x97
mtime = Thu Nov 08 21:01:25 CST 2017
pZxid = 0x97
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x161092a0ff30000
dataLength = 5
numChildren = 0
[zk: localhost:2181(CONNECTED) 1] create /node/child ''
Ephemerals cannot have children: /node/child
```
Ephemeral nodes are cleaned up automatically by the ZooKeeper system when the client session ends or fails. The following demonstrates this automatic cleanup by watching the ephemeral node from a second client:
```
$ zkCli   # Start 2nd Client
[zk: localhost:2181(CONNECTED) 0] create -e /node 'hello'
Node already exists: /node
[zk: localhost:2181(CONNECTED) 1] stat /node true
cZxid = 0x97
ctime = Thu Nov 08 21:01:25 CST 2017
mZxid = 0x97
mtime = Thu Nov 08 21:01:25 CST 2017
pZxid = 0x97
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x161092a0ff30000
dataLength = 5
numChildren = 0
```
If the first client quits or crashes, the second client receives a watch event:
```
[zk: localhost:2181(CONNECTED) 2]
WATCHER::

WatchedEvent state:SyncConnected type:NodeDeleted path:/node
```
(3) Sequential znode
When a sequential node is created, ZooKeeper automatically appends a sequence number to the znode name. The sequence number is maintained by the parent znode and increases monotonically. It is a 4-byte signed integer, formatted as a 10-digit number padded with leading zeros.
```
[zk: localhost:2181(CONNECTED) 1] create /test ''
Created /test
[zk: localhost:2181(CONNECTED) 2] create -s /test/seq ''
Created /test/seq0000000000
[zk: localhost:2181(CONNECTED) 3] create -s /test/seq ''
Created /test/seq0000000001
[zk: localhost:2181(CONNECTED) 4] create -s /test/seq ''
Created /test/seq0000000002
[zk: localhost:2181(CONNECTED) 5] ls /test
[seq0000000000, seq0000000001, seq0000000002]
```
Client API
The main znode operations provided by the ZooKeeper API are as follows:
API operation | Description | CLI command |
---|---|---|
create | Create a znode | create |
delete | Delete a znode | delete/rmr/delquota |
exists | Check whether a znode exists | stat |
getChildren | Read the children of a znode | ls/ls2 |
getData | Read the data of a znode | get/listquota |
setData | Set the data of a znode | set/setquota |
getACL | Read the ACL of a znode | getAcl |
setACL | Set the ACL of a znode | setAcl |
sync | Synchronize with the leader | sync |
The Java ZooKeeper class implements the API above.
ZooKeeper itself is implemented in Java, and the zkCli command-line tool is also a Java program; its implementation class is org.apache.zookeeper.ZooKeeperMain [src1 src2]. In ZooKeeper 3.5.x, the CLI commands map to the underlying API as follows:
CLI command | Java API (ZooKeeper class) |
---|---|
addauth scheme auth | public void addAuthInfo(String scheme, byte[] auth) |
close | public void close() |
create [-s] [-e] path data acl | public String create(final String path, byte data[], List<ACL> acl, CreateMode createMode) |
delete path [version] | public void delete(String path, int version) |
delquota [-n|-b] path | public void delete(String path, int version) |
get path [watch] | public byte[] getData(String path, boolean watch, Stat stat) |
getAcl path | public List<ACL> getACL(final String path, Stat stat) |
listquota path | public byte[] getData(String path, boolean watch, Stat stat) |
ls path [watch] | public List<String> getChildren(String path, Watcher watcher, Stat stat) |
ls2 path [watch] | - |
quit | public void close() |
rmr path | public void delete(final String path, int version) |
set path data [version] | public Stat setData(String path, byte[] data, int version) |
setAcl path acl | public Stat setACL(final String path, List<ACL> acl, int aclVersion) |
setquota -n|-b val path | public Stat setData(String path, byte[] data, int version) |
stat path [watch] | public Stat exists(String path, boolean watch) |
sync path | public void sync(String path, AsyncCallback.VoidCallback cb, Object ctx) |
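To make the mapping concrete, here is a small sketch (assuming an existing connected client `zk`; the path and class name are illustrative) showing a few of the calls from the table, namely exists, getACL, and setData:

```java
import java.util.List;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Stat;

public class ApiSampler {
    // Illustrative use of exists (CLI: stat), getACL (CLI: getAcl), setData (CLI: set)
    public static void inspect(ZooKeeper zk, String path) throws Exception {
        Stat stat = zk.exists(path, false);            // stat path
        if (stat != null) {
            System.out.println("dataVersion = " + stat.getVersion());
            List<ACL> acl = zk.getACL(path, stat);     // getAcl path
            System.out.println("acl = " + acl);
            // set path data [version]; the expected version guards against lost updates
            zk.setData(path, "new data".getBytes(), stat.getVersion());
        }
    }
}
```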
Watch
ZooKeeper provides an important mechanism for dealing with change: watches. By registering a watch, a client asks to be notified about changes to a specific znode and receives a single notification when a change occurs. When an application registers a watch to receive notifications, the first event matching the watch condition triggers the notification, and at most once. For example, if a client needs to know when the znode /z is deleted, it performs an exists operation on /z with the watch flag set, waits, and receives the notification through a callback.
ZooKeeper's read operations getData, getChildren, and exists all offer the option of setting a watch on the znode being read. To use the watch mechanism, we need to implement the Watcher interface, whose only method is process:
```java
void process(WatchedEvent event)
```
The WatchedEvent data structure includes the following information:
- ZooKeeper Session State: Disconnected, SyncConnected, AuthFailed, ConnectedReadOnly, SaslAuthenticated, Expired.
- EventType: NodeCreated, NodeDeleted, NodeDataChanged, NodeChildrenChanged, and None.
- If the event type is not None, the znode path is also included.
When a WatchedEvent is received, zkCli prints output like the following:
```
WatchedEvent state:SyncConnected type:NodeDeleted path:/node
```
There are two types of watches: data watches and child watches. Creating or deleting a znode, or setting its data, triggers a data watch; data watches are set by the exists and getData operations. Only getChildren can set a child watch, which is triggered when a child of the znode is created or deleted. For each event type, the watch is set by the following calls (see the Java sketch after this list):
- NodeCreated: set a watch through an exists call.
- NodeDeleted: set a watch through an exists or getData call.
- NodeDataChanged: set a watch through an exists or getData call.
- NodeChildrenChanged: set a watch through a getChildren call.
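As a minimal sketch (assuming a connected client `zk`; the helper name, path, and latch are illustrative), a one-shot watch for NodeDeleted can be registered via exists like this:

```java
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class DeleteWatch {
    // Illustrative: block until the znode at "path" is deleted
    public static void awaitDeletion(ZooKeeper zk, String path) throws Exception {
        CountDownLatch deleted = new CountDownLatch(1);
        Watcher watcher = (WatchedEvent event) -> {
            // The watch fires at most once; check that it is the event we want
            if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                deleted.countDown();
            }
        };
        // exists sets a data watch (covers NodeCreated, NodeDeleted, NodeDataChanged)
        if (zk.exists(path, watcher) == null) {
            return; // the node is already gone
        }
        deleted.await();
    }
}
```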
Java sample code
Using ZooKeeper from Java first requires adding the following Maven dependency:
```xml
<dependency>
    <groupId>org.apache.zookeeper</groupId>
    <artifactId>zookeeper</artifactId>
    <version>3.4.11</version>
</dependency>
```
The ZookeeperDemo example shows how to establish a session and how to create, read, modify, and delete a znode and set watches:
```java
import java.io.IOException;

import org.apache.commons.lang3.time.DateFormatUtils;
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

public class ZookeeperDemo {

    public static void main(String[] args) throws KeeperException, InterruptedException, IOException {
        // Create a server connection
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 100, new Watcher() {
            // Monitor all triggered events
            public void process(WatchedEvent event) {
                System.out.printf("WatchedEvent state:%s type:%s path:%s\n",
                        event.getState(), event.getType(), event.getPath());
            }
        });

        // Create node
        zk.create("/zoo", "hello ZooKeeper".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        // Read node data
        Stat stat = new Stat();
        System.out.println(new String(zk.getData("/zoo", false, stat)));
        printStat(stat);

        // Create child nodes
        zk.create("/zoo/duck", "hello duck".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/zoo/goat", "hello goat".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/zoo/cow", "hello cow".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        // Read the list of child nodes and set a watch
        System.out.println(zk.getChildren("/zoo", true));

        // Read child node data and set a watch
        System.out.println(new String(zk.getData("/zoo/duck", true, null)));
        // Modify child node data
        zk.setData("/zoo/duck", "hi duck".getBytes(), -1);
        // Read the modified child node data
        System.out.println(new String(zk.getData("/zoo/duck", true, null)));

        // Delete child nodes
        zk.delete("/zoo/duck", -1);
        zk.delete("/zoo/goat", -1);
        zk.delete("/zoo/cow", -1);
        // Delete the parent node
        zk.delete("/zoo", -1);

        // Close the connection
        zk.close();
    }

    private static void printStat(Stat stat) {
        System.out.println("cZxid = 0x" + Long.toHexString(stat.getCzxid()));
        System.out.println("ctime = " + DateFormatUtils.format(stat.getCtime(), "yyyy-MM-dd HH:mm:ss"));
        System.out.println("mZxid = 0x" + Long.toHexString(stat.getMzxid()));
        System.out.println("mtime = " + DateFormatUtils.format(stat.getMtime(), "yyyy-MM-dd HH:mm:ss"));
        System.out.println("pZxid = 0x" + Long.toHexString(stat.getPzxid()));
        System.out.println("cversion = " + stat.getCversion());
        System.out.println("dataVersion = " + stat.getVersion());
        System.out.println("aclVersion = " + stat.getAversion());
        System.out.println("ephemeralOwner = 0x" + Long.toHexString(stat.getEphemeralOwner()));
        System.out.println("dataLength = " + stat.getDataLength());
        System.out.println("numChildren = " + stat.getNumChildren());
    }
}
```
Output results:
```
WatchedEvent state:SyncConnected type:None path:null
hello ZooKeeper
cZxid = 0x1e1
ctime = 2017-11-20 12:18:36
mZxid = 0x1e1
mtime = 2017-11-20 12:18:36
pZxid = 0x1e1
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 15
numChildren = 0
[cow, goat, duck]
hello duck
WatchedEvent state:SyncConnected type:NodeDataChanged path:/zoo/duck
hi duck
WatchedEvent state:SyncConnected type:NodeDeleted path:/zoo/duck
WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/zoo
```
ZooInspector
ZooInspector is an official tool, shipped since ZooKeeper 3.3.0, for visualizing and editing the contents of a ZooKeeper instance [ZOOKEEPER-678]. The source code is located in the directory src/contrib/zooinspector (GitHub: link). You can run it by following the instructions in its README.txt, or use the executable jar attached to ZOOKEEPER-678 directly.
Reference material
- Official Document: ZooKeeper http://zookeeper.apache.org/d...
- Schilling Wave 2010-11: Zookeeper, Distributed Services Framework https://www.ibm.com/developer...
- ZooKeeper: Distributed Process Coordination, Benjamin Reed & Flavio Junqueira, 2013, Douban
- Apache ZooKeeper Essentials, Haloi 2015,Douban
- From Paxos to Zookeeper, Ni Chao (Alibaba), 2015, Douban
- Daily Record of Big Data: Architecture and Algorithms, Zhang Junlin 2014, Chapter 5 Distributed Coordination System, Douban
- Patrick Hunt, Mahadev Konar, Flavio Paiva Junqueira, Benjamin Reed: ZooKeeper: Wait-free Coordination for Internet-scale Systems. USENIX ATC 2010. dblp, msa, usenix