Distributed Topics 04, Distributed Coordination Service: ZooKeeper Practice and Principle Analysis

Posted by jordz on Wed, 19 Feb 2020 15:25:28 +0100

Preface

This series covers four aspects of distributed coordination services:

  • A first look at ZooKeeper
  • The core principles of ZooKeeper
  • ZooKeeper practice and principle analysis
  • Using ZooKeeper as a registry to hand-write an RPC framework

This section covers ZooKeeper practice and principle analysis.

Data storage

  • Transaction log

Stored under the dataDir path specified in zoo.cfg (or under dataLogDir, if that is set)

  • Snapshot log

Stored under the dataDir path

  • Runtime log

bin/zookeeper.out

Using ZooKeeper through the Java API

First, start the ZooKeeper cluster. We covered that in the previous section, so we will not repeat it here.

Next, add the ZooKeeper dependency to the project's pom.xml.

    <dependency>
      <groupId>org.apache.zookeeper</groupId>
      <artifactId>zookeeper</artifactId>
      <version>3.4.8</version>
    </dependency>

You can also add the jar to the classpath directly.

Then we establish the connection:

    import java.io.IOException;
    import org.apache.zookeeper.ZooKeeper;

    // The class name is illustrative only
    public class ConnectionDemo {
        public static void main(String[] args) {
            try {
                // Pass in the ip:port list of the ZooKeeper cluster.
                // No watcher is registered (null), so we can only poll the state.
                ZooKeeper zookeeper = new ZooKeeper(
                        "192.168.200.111:2181,192.168.200.112:2181,192.168.200.113:2181", 4000, null);

                System.out.println(zookeeper.getState());
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                System.out.println(zookeeper.getState());
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }


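The output shows that the connection is not yet in the CONNECTED state right after the constructor returns. It only becomes CONNECTED after the thread has slept for a while, because the connection is established asynchronously.
Instead of sleeping for a fixed time, we can use CountDownLatch from java.util.concurrent (JUC) to block until the connection is established: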

 public static void main(String[] args) {
        try {
            final CountDownLatch countDownLatch=new CountDownLatch(1);
            ZooKeeper zooKeeper=
                    new ZooKeeper("192.168.200.111:2181," +
                            "192.168.200.112:2181,192.168.200.113:2181",
                            4000, new Watcher() {
                        @Override
                        public void process(WatchedEvent event) {
                            if(Event.KeeperState.SyncConnected==event.getState()){
                                //If a response event is received from the server, the connection is successful
                                countDownLatch.countDown();
                            }
                        }
                    });
            countDownLatch.await();
            System.out.println(zooKeeper.getState());//CONNECTED

            //Add node
            zooKeeper.create("/zk-persis-mic","0".getBytes(),ZooDefs.Ids.OPEN_ACL_UNSAFE,CreateMode.PERSISTENT);
            Thread.sleep(1000);
            Stat stat=new Stat();

            //Get the value of the current node
            byte[] bytes=zooKeeper.getData("/zk-persis-mic",null,stat);
            System.out.println(new String(bytes));

            //Modify node values
            zooKeeper.setData("/zk-persis-mic","1".getBytes(),stat.getVersion());

            //Get the value of the current node
            byte[] bytes1=zooKeeper.getData("/zk-persis-mic",null,stat);
            System.out.println(new String(bytes1));

            zooKeeper.delete("/zk-persis-mic",stat.getVersion());

            zooKeeper.close();

            System.in.read();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (KeeperException e) {
            e.printStackTrace();
        }
    }
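
The latch pattern can be tried without a cluster. The following is a minimal sketch using only the JDK: a background thread stands in for the ZooKeeper event callback, and the main thread waits on the latch with a timeout so that it does not block forever if the "connection" never completes. The class name ConnectLatchDemo, the method awaitConnected and the delays are assumptions made for illustration; nothing here calls the ZooKeeper API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ConnectLatchDemo {

    /** Returns true if the simulated "connected" callback fired within timeoutMs. */
    static boolean awaitConnected(long connectDelayMs, long timeoutMs) {
        CountDownLatch connected = new CountDownLatch(1);

        // Stand-in for the watcher callback that ZooKeeper would invoke on its own thread.
        Thread callbackThread = new Thread(() -> {
            try {
                Thread.sleep(connectDelayMs); // pretend the handshake takes this long
                connected.countDown();        // equivalent of seeing SyncConnected
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        callbackThread.setDaemon(true);
        callbackThread.start();

        try {
            // Bounded wait: false means the timeout elapsed before countDown() was called.
            return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fast connect: " + awaitConnected(50, 2000));
        System.out.println("slow connect: " + awaitConnected(2000, 50));
    }
}
```

With a real client, the same bounded await can replace the unbounded countDownLatch.await() if you prefer to fail fast when the cluster is unreachable.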


In the last section we used the ZooKeeper command-line client, much as one would with Redis. Here we added the ZooKeeper dependency in IDEA, called the ZooKeeper API, and implemented connection setup and the CRUD operations.

Tips:
Learning is about generalizing from one example, so try the different methods yourself. Today it is ZooKeeper; if some XXX.jar becomes popular tomorrow, using it will follow much the same steps.

Event mechanism

The Watcher mechanism is a very important feature of ZooKeeper. We can bind watch events to the nodes created on ZooKeeper, for example to monitor node data changes, node deletion, and changes to a node's children. On top of this event mechanism we can build distributed locks, cluster management, and other functions.

Watcher feature: when the watched data changes, ZooKeeper generates a watcher event and sends it to the client, but the client is notified only once. If the node changes again later, the client that set the Watcher does not receive another message (a watcher is one-shot). Continuous monitoring can be achieved by registering the watcher again each time it fires.
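
To make the one-shot behaviour concrete, here is a small in-memory sketch. It is a toy model and not how ZooKeeper is implemented: OneShotWatchDemo, its getData/setData methods and the path /node are all hypothetical names chosen for this example. One watcher registers once, the other registers itself again every time it fires.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy in-memory model of one-time watches (an assumption-laden sketch, not ZooKeeper code). */
class OneShotWatchDemo {
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();
    private final Map<String, String> data = new HashMap<>();

    /** Reads a value and, if a watcher is given, registers it on the path. */
    String getData(String path, Consumer<String> watcher) {
        if (watcher != null) {
            watches.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
        }
        return data.get(path);
    }

    /** Writes a value, then fires every watch on the path exactly once. */
    void setData(String path, String value) {
        data.put(path, value);
        // The watches are removed before they fire, which is what makes them one-shot.
        List<Consumer<String>> fired = watches.remove(path);
        if (fired != null) {
            fired.forEach(w -> w.accept(path));
        }
    }

    /** Runs the scenario and returns {notifications of plain watcher, notifications of looping watcher}. */
    static int[] run() {
        OneShotWatchDemo store = new OneShotWatchDemo();
        int[] counts = new int[2];

        // Watcher 0 registers once and never again.
        store.getData("/node", p -> counts[0]++);

        // Watcher 1 registers itself again each time it is notified.
        Consumer<String> looping = new Consumer<String>() {
            @Override
            public void accept(String p) {
                counts[1]++;
                store.getData(p, this);
            }
        };
        store.getData("/node", looping);

        store.setData("/node", "1");
        store.setData("/node", "2");
        store.setData("/node", "3");
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("registered once:    " + counts[0] + " notification(s)");
        System.out.println("registered in loop: " + counts[1] + " notification(s)");
    }
}
```

After three writes, the watcher that registered once has been notified a single time, while the one that keeps registering itself has seen all three changes.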

How to register a watch

Watches are bound through these three operations:

  • getData
  • exists
  • getChildren

How is an event triggered? Any transactional (write) operation triggers the corresponding watch event: create / delete / setData

public static void main(String[] args) throws IOException, InterruptedException, KeeperException {
        final CountDownLatch countDownLatch=new CountDownLatch(1);
        final ZooKeeper zooKeeper=
                new ZooKeeper("192.168.11.153:2181," +
                        "192.168.11.154:2181,192.168.11.155:2181",
                        4000, new Watcher() {
                    @Override
                    public void process(WatchedEvent event) {
                        System.out.println("Default event: "+event.getType());
                        if(Event.KeeperState.SyncConnected==event.getState()){
                            //If a response event is received from the server, the connection is successful
                            countDownLatch.countDown();
                        }
                    }
                });
        countDownLatch.await();

        // Create a persistent node
        zooKeeper.create("/zk-persis-mic","1".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE,CreateMode.PERSISTENT);


        // A watch can be bound through exists, getData or getChildren; here we use exists
        Stat stat=zooKeeper.exists("/zk-persis-mic", new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                System.out.println(event.getType()+"->"+event.getPath());
                try {
                    // Register again; passing true binds the session's default watcher
                    zooKeeper.exists(event.getPath(),true);
                } catch (KeeperException e) {
                    e.printStackTrace();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        });
        // Trigger the watch with a transactional (write) operation
        stat=zooKeeper.setData("/zk-persis-mic","2".getBytes(),stat.getVersion());

        Thread.sleep(1000);

        zooKeeper.delete("/zk-persis-mic",stat.getVersion());

        System.in.read();
    }

Watcher event types

public interface Watcher {
    void process(WatchedEvent var1);

    public interface Event {
        public static enum EventType {
            // Delivered when the client's connection state changes
            None(-1),
            // A node was created, e.g. /zk-persis-mic
            NodeCreated(1),
            // A node was deleted
            NodeDeleted(2),
            // A node's data changed
            NodeDataChanged(3),
            // A child node was created or deleted
            NodeChildrenChanged(4);
        }
    }
}

Which operation produces which event?

Operation                          /zk-persis-mic (watch event)          /zk-persis-mic/child (watch event)
create(/zk-persis-mic)             NodeCreated (exists, getData)         nothing
delete(/zk-persis-mic)             NodeDeleted (exists, getData)         nothing
setData(/zk-persis-mic)            NodeDataChanged (exists, getData)     nothing
create(/zk-persis-mic/children)    NodeChildrenChanged (getChildren)     nothing
delete(/zk-persis-mic/children)    NodeChildrenChanged (getChildren)     nothing
setData(/zk-persis-mic/children)   nothing

Implementation principle of transaction


In depth analysis of the implementation principle of Watcher mechanism

ZooKeeper's Watcher mechanism can be generally divided into three processes:

  • Client registration Watcher
  • Server processing Watcher
  • Client callback Watcher

There are three ways for the client to register the watcher

  1. getData
  2. exists
  3. getChildren

Take the following code as an example to analyze the principle of the whole trigger mechanism

 final ZooKeeper zooKeeper=
                new ZooKeeper("192.168.200.111:2181,192.168.200.112:2181,192.168.200.113:2181",4000, new Watcher() {
                    @Override
                    public void process(WatchedEvent event){
                        System.out.println("Default event: "+event.getType());
                    }
                });

 zooKeeper.create("/mic","0".getBytes(),ZooDefs.Ids.OPEN_ACL_UNSAFE,CreateMode.PERSISTENT); // Create the node

 zooKeeper.exists("/mic",true); // Register the watch

 zooKeeper.setData("/mic","1".getBytes(),-1); // Modify the node's value to trigger the watch

Initialization process of the ZooKeeper API
When creating a ZooKeeper client instance, we pass a default watcher to the constructor via new Watcher(). This Watcher serves as the default watcher for the entire ZooKeeper session and is kept in the defaultWatcher field of the client's ZKWatchManager. The code is as follows:

    public ZooKeeper(String connectString, int sessionTimeout, Watcher watcher,
            long sessionId, byte[] sessionPasswd, boolean canBeReadOnly,
            HostProvider aHostProvider) throws IOException {
        LOG.info("Initiating client connection, connectString=" + connectString
                + " sessionTimeout=" + sessionTimeout
                + " watcher=" + watcher
                + " sessionId=" + Long.toHexString(sessionId)
                + " sessionPasswd="
                + (sessionPasswd == null ? "<null>" : "<hidden>"));

        this.clientConfig = new ZKClientConfig();
        watchManager = defaultWatchManager();
        // Store the watcher in ZKWatchManager here
        watchManager.defaultWatcher = watcher;
        ConnectStringParser connectStringParser = new ConnectStringParser(
                connectString);
        hostProvider = aHostProvider;

        // Initialize ClientCnxn and call its start() method
        cnxn = new ClientCnxn(connectStringParser.getChrootPath(),
                hostProvider, sessionTimeout, this, watchManager,
                getClientCnxnSocket(), sessionId, sessionPasswd, canBeReadOnly);
        cnxn.seenRwServerBefore = true; // since user has provided sessionId
        cnxn.start();
    }

ClientCnxn is the main class for communication and event notification between the ZooKeeper client and the ZooKeeper server. It contains two inner thread classes:

  1. SendThread: responsible for data communication between client and server, including the transmission of event information

  2. EventThread: invokes the callbacks of the Watchers registered on the client when notifications arrive
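
The division of labour between a network thread and a callback thread can be sketched with a blocking queue. The example below is an illustration under assumptions, not the ClientCnxn code: EventThreadDemo, queueEvent, shutdown and the STOP sentinel are names invented here, and events are plain strings. The point is that whoever receives a notification only enqueues it, while a dedicated thread runs the watcher callbacks one at a time in arrival order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

class EventThreadDemo {
    // Sentinel object that tells the dispatcher to stop (hypothetical marker for this sketch).
    private static final Object STOP = new Object();

    private final BlockingQueue<Object> waitingEvents = new LinkedBlockingQueue<>();
    private final Thread dispatcher;

    EventThreadDemo(Consumer<String> watcher) {
        dispatcher = new Thread(() -> {
            try {
                while (true) {
                    Object event = waitingEvents.take(); // blocks until an event arrives
                    if (event == STOP) {
                        return;
                    }
                    watcher.accept((String) event);      // callbacks run serially, in order
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-event-thread");
        dispatcher.start();
    }

    /** Called from the network side: hand the event over and return immediately. */
    void queueEvent(String event) {
        waitingEvents.add(event);
    }

    /** Stops the dispatcher once every event queued so far has been delivered. */
    void shutdown() {
        waitingEvents.add(STOP);
        try {
            dispatcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs a small scenario and returns the events in the order the watcher saw them. */
    static List<String> demo() {
        List<String> seen = new ArrayList<>();
        EventThreadDemo eventThread = new EventThreadDemo(seen::add);
        eventThread.queueEvent("NodeDataChanged:/mic");
        eventThread.queueEvent("NodeDeleted:/mic");
        eventThread.shutdown(); // join() makes the dispatcher's writes to `seen` visible
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Keeping callbacks off the network thread means a slow watcher delays other callbacks but does not stall sending and receiving.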

ClientCnxn initialization

    public ClientCnxn(String chrootPath, HostProvider hostProvider, int sessionTimeout, ZooKeeper zooKeeper,
            ClientWatchManager watcher, ClientCnxnSocket clientCnxnSocket,
            long sessionId, byte[] sessionPasswd, boolean canBeReadOnly) {
        this.zooKeeper = zooKeeper;
        this.watcher = watcher;
        this.sessionId = sessionId;
        this.sessionPasswd = sessionPasswd;
        this.sessionTimeout = sessionTimeout;
        this.hostProvider = hostProvider;
        this.chrootPath = chrootPath;

        connectTimeout = sessionTimeout / hostProvider.size();
        readTimeout = sessionTimeout * 2 / 3;
        readOnly = canBeReadOnly;

        // Initialize sendThread
        sendThread = new SendThread(clientCnxnSocket);
        // Initialize eventThread
        eventThread = new EventThread();
        this.clientConfig=zooKeeper.getClientConfig();
    }

    // Start the two threads
    public void start() {
        sendThread.start();
        eventThread.start();
    }

The client registers a watch through exists

zooKeeper.exists("/mic", true); registers the watch through the exists method. Passing true selects the session's default watcher, and the call ends up in the following method:

   public Stat exists(final String path, Watcher watcher)
        throws KeeperException, InterruptedException
    {
        final String clientPath = path;
        PathUtils.validatePath(clientPath);

        // the watch contains the un-chroot path
        WatchRegistration wcb = null;
        if (watcher != null) {
         // Build ExistWatchRegistration
            wcb = new ExistsWatchRegistration(watcher, clientPath);
        }

        final String serverPath = prependChroot(clientPath);

        RequestHeader h = new RequestHeader();
        // Set the operation type to exists
        h.setType(ZooDefs.OpCode.exists);
        ExistsRequest request = new ExistsRequest();
        // Construct ExistsRequest
        request.setPath(serverPath);
        // Ask the server to register a watch
        request.setWatch(watcher != null);
        // Set the class that receives the server response
        SetDataResponse response = new SetDataResponse();
        // Wrap RequestHeader, ExistsRequest, SetDataResponse and WatchRegistration and add them to the send queue
        ReplyHeader r = cnxn.submitRequest(h, request, response, wcb);
        if (r.getErr() != 0) {
            if (r.getErr() == KeeperException.Code.NONODE.intValue()) {
                return null;
            }
            throw KeeperException.create(KeeperException.Code.get(r.getErr()),
                    clientPath);
        }
        // Return the result of exists (the Stat information)
        return response.getStat().getCzxid() == -1 ? null : response.getStat();
    }

cnxn.submitRequest

 public ReplyHeader submitRequest(RequestHeader h, Record request,
            Record response, WatchRegistration watchRegistration,
            WatchDeregistration watchDeregistration)
            throws InterruptedException {
        ReplyHeader r = new ReplyHeader();
        
        // Build a Packet transport object and add it to the outgoing queue
        Packet packet = queuePacket(h, r, request, response, null, null, null,null, watchRegistration, watchDeregistration);
        synchronized (packet) {
            while (!packet.finished) {
                // Block until the packet has been processed
                packet.wait();
            }
        }
        return r;
    }

Call queuePacket

   public Packet queuePacket(RequestHeader h, ReplyHeader r, Record request,
            Record response, AsyncCallback cb, String clientPath,
            String serverPath, Object ctx, WatchRegistration watchRegistration,
            WatchDeregistration watchDeregistration) {
        Packet packet = null;

        // Wrap the request objects in a Packet
        packet = new Packet(h, r, request, response, watchRegistration);
        packet.cb = cb;
        packet.ctx = ctx;
        packet.clientPath = clientPath;
        packet.serverPath = serverPath;
        packet.watchDeregistration = watchDeregistration;
    
        synchronized (state) {
            if (!state.isAlive() || closing) {
                conLossPacket(packet);
            } else {
                // If the client is asking to close the session then
                // mark as closing
                if (h.getType() == OpCode.closeSession) {
                    closing = true;
                }
                //Add to outgoing queue
                outgoingQueue.add(packet);
            }
        }
        // Multiplexing: wake up the Selector to tell it that a packet has been added
        sendThread.getClientCnxnSocket().packetAdded();
        return packet;
    }

In ZooKeeper, Packet is the smallest unit of the communication protocol. It is used for network transmission between client and server, and any object to be transmitted must be wrapped in a Packet. In ClientCnxn, the WatchRegistration is also encapsulated in the Packet. The calling thread uses queuePacket to put the Packet into the outgoing queue, where it waits for SendThread to send it. This is another asynchronous process; asynchronous communication is very common in distributed systems.
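
The interplay between submitRequest (which blocks) and the sending thread (which completes the packet) can be reproduced in miniature. The sketch below is a simplified model under assumptions: PacketQueueDemo, its nested Packet class and the fake "reply:" response are hypothetical, and no network is involved. It shows only the synchronization idea: enqueue, wait on the packet's monitor until it is marked finished, and let the sender thread call notifyAll.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class PacketQueueDemo {
    /** Hypothetical stand-in for a packet: a request, its response and a completion flag. */
    static final class Packet {
        final String request;
        String response;
        boolean finished;

        Packet(String request) {
            this.request = request;
        }
    }

    private final BlockingQueue<Packet> outgoingQueue = new LinkedBlockingQueue<>();

    /** Caller side: enqueue the packet, then block until the sender marks it finished. */
    String submitRequest(String request) throws InterruptedException {
        Packet packet = new Packet(request);
        outgoingQueue.add(packet);
        synchronized (packet) {
            while (!packet.finished) { // loop guards against spurious wakeups
                packet.wait();
            }
        }
        return packet.response;
    }

    /** Sender side: take packets off the queue, pretend to send them, wake the waiting caller. */
    Thread startSendThread() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Packet p = outgoingQueue.take();
                    String reply = "reply:" + p.request; // stand-in for the server's answer
                    synchronized (p) {
                        p.response = reply;
                        p.finished = true;
                        p.notifyAll();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly on shutdown
            }
        }, "demo-send-thread");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Runs one request end to end and returns the response. */
    static String demo() {
        PacketQueueDemo cnxn = new PacketQueueDemo();
        Thread sender = cnxn.startSendThread();
        try {
            return cnxn.submitRequest("exists /mic");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            sender.interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the caller checks the finished flag inside the synchronized block, it does not matter whether the sender completes the packet before or after the caller starts waiting.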

Sending process of SendThread

When the connection is initialized, ZooKeeper creates and starts two threads. Next we analyze the sending process of SendThread. Because it is a thread, its run method is executed once it is started:

        @Override
        public void run() {
            clientCnxnSocket.introduce(this, sessionId, outgoingQueue);
            clientCnxnSocket.updateNow();
            clientCnxnSocket.updateLastSendAndHeard();
            int to;
            long lastPingRwServer = Time.currentElapsedTime();
            final int MAX_SEND_PING_INTERVAL = 10000; //10 seconds
            while (state.isAlive()) {
                try {
                    if (!clientCnxnSocket.isConnected()) {
                        // don't re-establish connection if we are closing
                        if (closing) {
                            break;
                        }
                        // Initiate the connection
                        startConnect();
                        clientCnxnSocket.updateLastSendAndHeard();
                    }
                    // When connected, handle SASL authentication
                    if (state.isConnected()) {
                        // determine whether we need to send an AuthFailed event.
                        if (zooKeeperSaslClient != null) {
                            boolean sendAuthEvent = false;
                            if (zooKeeperSaslClient.getSaslState() == ZooKeeperSaslClient.SaslState.INITIAL) {
                                try {
                                    zooKeeperSaslClient.initialize(ClientCnxn.this);
                                } catch (SaslException e) {
                                   LOG.error("SASL authentication with Zookeeper Quorum member failed: " + e);
                                    state = States.AUTH_FAILED;
                                    sendAuthEvent = true;
                                }
                            }
                            KeeperState authState = zooKeeperSaslClient.getKeeperState();
                            if (authState != null) {
                                if (authState == KeeperState.AuthFailed) {
                                    // An authentication error occurred during authentication with the Zookeeper Server.
                                    state = States.AUTH_FAILED;
                                    sendAuthEvent = true;
                                } else {
                                    if (authState == KeeperState.SaslAuthenticated) {
                                        sendAuthEvent = true;
                                    }
                                }
                            }

                            if (sendAuthEvent == true) {
                                eventThread.queueEvent(new WatchedEvent(
                                      Watcher.Event.EventType.None,
                                      authState,null));
                            }
                        }
                        to = readTimeout - clientCnxnSocket.getIdleRecv();
                    } else {
                        to = connectTimeout - clientCnxnSocket.getIdleRecv();
                    }
                    // `to` is the time left before the timeout
                    if (to <= 0) {
                        // The session has timed out
                        String warnInfo;
                        warnInfo = "Client session timed out, have not heard from server in "
                            + clientCnxnSocket.getIdleRecv()
                            + "ms"
                            + " for sessionid 0x"
                            + Long.toHexString(sessionId);
                        LOG.warn(warnInfo);
                        throw new SessionTimeoutException(warnInfo);
                    }
                    if (state.isConnected()) {
                    //Calculate the next ping request time
                        int timeToNextPing = readTimeout / 2 - clientCnxnSocket.getIdleSend() - 
                        		((clientCnxnSocket.getIdleSend() > 1000) ? 1000 : 0);
                        //send a ping request either time is due or no packet sent out within MAX_SEND_PING_INTERVAL
                        if (timeToNextPing <= 0 || clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL) {
                        //Send ping request
                            sendPing();
                            clientCnxnSocket.updateLastSend();
                        } else {
                            if (timeToNextPing < to) {
                                to = timeToNextPing;
                            }
                        }
                    }

                    // If we are in read-only mode, seek for read/write server
                    if (state == States.CONNECTEDREADONLY) {
                        long now = Time.currentElapsedTime();
                        int idlePingRwServer = (int) (now - lastPingRwServer);
                        if (idlePingRwServer >= pingRwTimeout) {
                            lastPingRwServer = now;
                            idlePingRwServer = 0;
                            pingRwTimeout =
                                Math.min(2*pingRwTimeout, maxPingRwTimeout);
                            pingRwServer();
                        }
                        to = Math.min(to, pingRwTimeout - idlePingRwServer);
                    }
                    // Hand over to clientCnxnSocket for transmission. pendingQueue holds the Packets that have been
                    // sent and are waiting for a response. clientCnxnSocket defaults to ClientCnxnSocketNIO
                    // (it is chosen when the ZooKeeper object is instantiated)
                    clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this);
                } catch (Throwable e) {
                    if (closing) {
                        if (LOG.isDebugEnabled()) {
                            // closing so this is expected
                            LOG.debug("An exception was thrown while closing send thread for session 0x"
                                    + Long.toHexString(getSessionId())
                                    + " : " + e.getMessage());
                        }
                        break;
                    } else {
                        // this is ugly, you have a better way speak up
                        if (e instanceof SessionExpiredException) {
                            LOG.info(e.getMessage() + ", closing socket connection");
                        } else if (e instanceof SessionTimeoutException) {
                            LOG.info(e.getMessage() + RETRY_CONN_MSG);
                        } else if (e instanceof EndOfStreamException) {
                            LOG.info(e.getMessage() + RETRY_CONN_MSG);
                        } else if (e instanceof RWServerFoundException) {
                            LOG.info(e.getMessage());
                        } else {
                            LOG.warn(
                                    "Session 0x"
                                            + Long.toHexString(getSessionId())
                                            + " for server "
                                            + clientCnxnSocket.getRemoteSocketAddress()
                                            + ", unexpected error"
                                            + RETRY_CONN_MSG, e);
                        }
                        // At this point, there might still be new packets appended to outgoingQueue.
                        // they will be handled in next connection or cleared up if closed.
                        cleanup();
                        if (state.isAlive()) {
                            eventThread.queueEvent(new WatchedEvent(
                                    Event.EventType.None,
                                    Event.KeeperState.Disconnected,
                                    null));
                        }
                        clientCnxnSocket.updateNow();
                        clientCnxnSocket.updateLastSendAndHeard();
                    }
                }
            }
            synchronized (state) {
                // When it comes to this point, it guarantees that later queued
                // packet to outgoingQueue will be notified of death.
                cleanup();
            }
            clientCnxnSocket.close();
            if (state.isAlive()) {
                eventThread.queueEvent(new WatchedEvent(Event.EventType.None,
                        Event.KeeperState.Disconnected, null));
            }
            ZooTrace.logTraceMessage(LOG, ZooTrace.getTextTraceLevel(),
                    "SendThread exited loop for session: 0x"
                           + Long.toHexString(getSessionId()));
        }
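
The ping scheduling inside the loop is easier to follow with concrete numbers. The sketch below copies only the arithmetic from the run method and the ClientCnxn constructor shown earlier (readTimeout = sessionTimeout * 2 / 3, and the timeToNextPing expression); the class PingTimingDemo and its method names are hypothetical, and it assumes the session timeout is used as passed in, without any adjustment by the server.

```java
class PingTimingDemo {
    static final int MAX_SEND_PING_INTERVAL = 10000; // 10 seconds, as in SendThread.run

    /** Read timeout derived from the session timeout, as in the ClientCnxn constructor. */
    static int readTimeout(int sessionTimeoutMs) {
        return sessionTimeoutMs * 2 / 3;
    }

    /** Milliseconds until the next ping is due; zero or negative means "ping now". */
    static int timeToNextPing(int readTimeoutMs, int idleSendMs) {
        return readTimeoutMs / 2 - idleSendMs - ((idleSendMs > 1000) ? 1000 : 0);
    }

    /** True when the loop would send a ping on this iteration. */
    static boolean shouldPing(int readTimeoutMs, int idleSendMs) {
        return timeToNextPing(readTimeoutMs, idleSendMs) <= 0
                || idleSendMs > MAX_SEND_PING_INTERVAL;
    }

    public static void main(String[] args) {
        int readTimeout = readTimeout(4000); // the 4000 ms session timeout used in this article
        System.out.println("readTimeout = " + readTimeout);
        for (int idle : new int[] {0, 1000, 1200}) {
            System.out.println("idleSend=" + idle
                    + " timeToNextPing=" + timeToNextPing(readTimeout, idle)
                    + " ping=" + shouldPing(readTimeout, idle));
        }
    }
}
```

With a 4000 ms session timeout the read timeout is 2666 ms, so a ping becomes due once nothing has been sent for a little over a second. The MAX_SEND_PING_INTERVAL check only matters for long session timeouts, where half the read timeout would exceed ten seconds.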

Network interaction between client and server

The sending process contains this line:
clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this);

Let's look at the doTransport method (shown here as implemented in ClientCnxnSocketNetty):

   @Override
    void doTransport(int waitTimeOut,
                     List<Packet> pendingQueue,
                     ClientCnxn cnxn)
            throws IOException, InterruptedException {
        try {
            if (!firstConnect.await(waitTimeOut, TimeUnit.MILLISECONDS)) {
                return;
            }
            Packet head = null;
            if (needSasl.get()) {
                if (!waitSasl.tryAcquire(waitTimeOut, TimeUnit.MILLISECONDS)) {
                    return;
                }
            } else {
                if ((head = outgoingQueue.poll(waitTimeOut, TimeUnit.MILLISECONDS)) == null) {
                    return;
                }
            }
            // check if being waken up on closing.
            if (!sendThread.getZkState().isAlive()) {
                // adding back the patck to notify of failure in conLossPacket().
                addBack(head);
                return;
            }
            // Error path: the channel has been closed, so put the current packet back
            if (disconnected.get()) {
                addBack(head);
                throw new EndOfStreamException("channel for sessionid 0x"
                        + Long.toHexString(sessionId)
                        + " is lost");
            }
            // If there is a packet to send, call doWrite; pendingQueue holds the packets that have been sent and are waiting for a response
            if (head != null) {
                doWrite(pendingQueue, head, cnxn);
            }
        } finally {
            updateNow();
        }
    }

doWrite method

    private void doWrite(List<Packet> pendingQueue, Packet p, ClientCnxn cnxn) {
        updateNow();
        while (true) {
            if (p != WakeupPacket.getInstance()) {
                // For requests that have a header and are neither ping nor auth operations
                if ((p.requestHeader != null) &&
                        (p.requestHeader.getType() != ZooDefs.OpCode.ping) &&
                        (p.requestHeader.getType() != ZooDefs.OpCode.auth)) {
                    // Assign the xid, the request sequence number used to match a response to its request
                    p.requestHeader.setXid(cnxn.getXid());
                    // Add the current packet to pendingQueue
                    synchronized (pendingQueue) {
                        pendingQueue.add(p);
                    }
                }
                //Send packets out
                sendPkt(p);
            }
            if (outgoingQueue.isEmpty()) {
                break;
            }
            p = outgoingQueue.remove();
        }
    }

sendPkt:

   private void sendPkt(Packet p) {

        // Serialize the request data
        p.createBB();
        // Update the last-send timestamp
        updateLastSend();
        // Increment the count of packets sent
        sentCount++;
        // Write the byte buffer to the server through the channel
        channel.write(ChannelBuffers.wrappedBuffer(p.bb));
    }

createBB:

       public void createBB() {
            try {
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
                boa.writeInt(-1, "len"); // We'll fill this in later
                // Serialize the request header (requestHeader)
                if (requestHeader != null) {
                    requestHeader.serialize(boa, "header");
                }
                if (request instanceof ConnectRequest) {
                    request.serialize(boa, "connect");
                    // append "am-I-allowed-to-be-readonly" flag
                    boa.writeBool(readOnly, "readOnly");
                } else if (request != null) {
                    // Serialize the request body (request)
                    request.serialize(boa, "request");
                }
                baos.close();
                this.bb = ByteBuffer.wrap(baos.toByteArray());
                this.bb.putInt(this.bb.capacity() - 4);
                this.bb.rewind();
            } catch (IOException e) {
                LOG.warn("Ignoring unexpected exception", e);
            }
        }

From the createBB method we can see that, in the actual low-level network serialization, ZooKeeper only serializes two attributes, requestHeader and request. Only these two are written to the byte array for network transmission; the watchRegistration information is not sent over the network.
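
createBB also uses a small framing trick: it writes a placeholder length of -1, serializes the payload, and then overwrites the first four bytes with the real payload length. The sketch below reproduces that trick with JDK classes only. It is a simplified layout chosen for illustration, not ZooKeeper's exact wire format: FrameDemo, createFrame and the field order (xid, type, path) are assumptions, and DataOutputStream stands in for BinaryOutputArchive.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class FrameDemo {

    /** Builds [4-byte length][xid][type][path length][path bytes] with a patched length prefix. */
    static ByteBuffer createFrame(int xid, int type, String path) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos);
            out.writeInt(-1);       // length placeholder, filled in below
            out.writeInt(xid);      // "header": request sequence number
            out.writeInt(type);     // "header": operation type
            byte[] pathBytes = path.getBytes(StandardCharsets.UTF_8);
            out.writeInt(pathBytes.length); // "request": path length
            out.write(pathBytes);           // "request": path bytes
            out.close();

            ByteBuffer bb = ByteBuffer.wrap(baos.toByteArray());
            bb.putInt(bb.capacity() - 4); // overwrite the placeholder with the payload length
            bb.rewind();                  // back to position 0, ready to be written to a channel
            return bb;
        } catch (IOException e) {
            // An in-memory stream does not fail; rethrow unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer frame = createFrame(1, 3, "/mic");
        System.out.println("total bytes:   " + frame.capacity());
        System.out.println("length prefix: " + frame.getInt(0));
    }
}
```

Note that nothing about a watcher object appears in the frame. In the real client, the only watch-related information that reaches the server is the boolean set through request.setWatch; the Watcher instance itself stays on the client.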

Tips:
After the user calls exists to register a watch, several things happen:
1. The request data is wrapped in a Packet and added to the outgoing queue.

2. SendThread performs the send, transmitting the data in the outgoing queue to the server.

3. The send goes through clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this); ZooKeeper provides two concrete implementations of ClientCnxnSocket: ClientCnxnSocketNetty and ClientCnxnSocketNIO.

Which implementation is used is decided during initialization, when the ZooKeeper object is instantiated. The code is as follows:

cnxn = new ClientCnxn(connectStringParser.getChrootPath(), hostProvider, sessionTimeout, this, watchMana getClientCnxnSocket(), canBeReadOnly);

private ClientCnxnSocket getClientCnxnSocket() throws IOException {
    String clientCnxnSocketName = getClientConfig().getProperty(
            ZKClientConfig.ZOOKEEPER_CLIENT_CNXN_SOCKET);
    if (clientCnxnSocketName == null) {
        clientCnxnSocketName = ClientCnxnSocketNIO.class.getName();
    }
    try {
        Constructor<?> clientCxnConstructor = Class.forName(clientCnxnSocketName).getDeclaredConstructor(ZKClient
        ClientCnxnSocket clientCxnSocket = (ClientCnxnSocket) clientCxnConstr
        return clientCxnSocket;
    } catch (Exception e) {
        IOException ioe = new IOException("Couldn't instantiate "
                + clientCnxnSocketName);
        ioe.initCause(e);
        throw ioe;
    }
}

4. Following step 3, sendPkt is executed in ClientCnxnSocketNetty to send the request packet to the server.
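
The way the transport class is chosen (read a configured class name, fall back to a default, instantiate it reflectively) is a general pattern that can be sketched without ZooKeeper. In the example below everything is hypothetical: SocketFactoryDemo, the Transport interface and its two implementations stand in for ClientCnxnSocket and its subclasses, and the configured name is passed as a parameter instead of being read from a client configuration.

```java
import java.lang.reflect.Constructor;

class SocketFactoryDemo {
    /** Hypothetical transport abstraction standing in for ClientCnxnSocket. */
    public interface Transport {
        String name();
    }

    public static class NioTransport implements Transport {
        public NioTransport() {}
        @Override
        public String name() { return "nio"; }
    }

    public static class NettyTransport implements Transport {
        public NettyTransport() {}
        @Override
        public String name() { return "netty"; }
    }

    /** Instantiates the configured class, or the NIO default when nothing is configured. */
    static Transport create(String configuredClassName) {
        String className = configuredClassName;
        if (className == null) {
            className = NioTransport.class.getName(); // default choice
        }
        try {
            Constructor<?> ctor = Class.forName(className).getDeclaredConstructor();
            return (Transport) ctor.newInstance();
        } catch (Exception e) {
            // Same idea as the original: wrap the cause and report which class failed to load.
            throw new IllegalStateException("Couldn't instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(create(null).name());
        System.out.println(create(NettyTransport.class.getName()).name());
    }
}
```

Selecting the implementation by class name keeps the calling code unaware of which transport is in use, at the cost of moving configuration errors from compile time to startup.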

How the server processes a received request

On the server side, the NettyServerCnxn class processes the requests sent by the client:

   public void receiveMessage(ChannelBuffer message) {
        try {
            while(message.readable() && !throttled) {
            // bb is not null: the length prefix has already been read and the body is being filled
                if (bb != null) {
                    if (LOG.isTraceEnabled()) {
                        LOG.trace("message readable " + message.readableBytes()
                                + " bb len " + bb.remaining() + " " + bb);
                        ByteBuffer dat = bb.duplicate();
                        dat.flip();
                        LOG.trace(Long.toHexString(sessionId)
                                + " bb 0x"
                                + ChannelBuffers.hexDump(
                                        ChannelBuffers.copiedBuffer(dat)));
                    }
                    // If bb has more room left than the message has readable bytes, lower bb's limit so that only that many bytes are read
                    if (bb.remaining() > message.readableBytes()) {
                        int newLimit = bb.position() + message.readableBytes();
                        bb.limit(newLimit);
                    }
                    // Write message to bb
                    message.readBytes(bb);
                    bb.limit(bb.capacity());

                    if (LOG.isTraceEnabled()) {
                        LOG.trace("after readBytes message readable "
                                + message.readableBytes()
                                + " bb len " + bb.remaining() + " " + bb);
                        ByteBuffer dat = bb.duplicate();
                        dat.flip();
                        LOG.trace("after readbytes "
                                + Long.toHexString(sessionId)
                                + " bb 0x"
                                + ChannelBuffers.hexDump(
                                        ChannelBuffers.copiedBuffer(dat)));
                    }
                    // bb is full: the whole message body has been read
                    if (bb.remaining() == 0) {
                        // Update the statistics for received packets
                        packetReceived();
                        bb.flip();

                        ZooKeeperServer zks = this.zkServer;
                        if (zks == null || !zks.isRunning()) {
                            throw new IOException("ZK down");
                        }
                        if (initialized) {
                        //Process the packets from the client
                            zks.processPacket(this, bb);

                            if (zks.shouldThrottle(outstandingCount.incrementAndGet())) {
                                disableRecvNoWait();
                            }
                        } else {
                            LOG.debug("got conn req request from "
                                    + getRemoteSocketAddress());
                            zks.processConnectRequest(this, bb);
                            initialized = true;
                        }
                        bb = null;
                    }
                } else {
                    if (LOG.isTraceEnabled()) {
                        LOG.trace("message readable "
                                + message.readableBytes()
                                + " bblenrem " + bbLen.remaining());
                        ByteBuffer dat = bbLen.duplicate();
                        dat.flip();
                        LOG.trace(Long.toHexString(sessionId)
                                + " bbLen 0x"
                                + ChannelBuffers.hexDump(
                                        ChannelBuffers.copiedBuffer(dat)));
                    }

                    if (message.readableBytes() < bbLen.remaining()) {
                        bbLen.limit(bbLen.position() + message.readableBytes());
                    }
                    message.readBytes(bbLen);
                    bbLen.limit(bbLen.capacity());
                    if (bbLen.remaining() == 0) {
                        bbLen.flip();

                        if (LOG.isTraceEnabled()) {
                            LOG.trace(Long.toHexString(sessionId)
                                    + " bbLen 0x"
                                    + ChannelBuffers.hexDump(
                                            ChannelBuffers.copiedBuffer(bbLen)));
                        }
                        int len = bbLen.getInt();
                        if (LOG.isTraceEnabled()) {
                            LOG.trace(Long.toHexString(sessionId)
                                    + " bbLen len is " + len);
                        }

                        bbLen.clear();
                        if (!initialized) {
                            if (checkFourLetterWord(channel, message, len)) {
                                return;
                            }
                        }
                        if (len < 0 || len > BinaryInputArchive.maxBuffer) {
                            throw new IOException("Len error " + len);
                        }
                        bb = ByteBuffer.allocate(len);
                    }
                }
            }
        } catch(IOException e) {
            LOG.warn("Closing connection to " + getRemoteSocketAddress(), e);
            close();
        }
    }
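The method above is essentially a length-prefixed frame reader: bbLen collects the 4-byte length, then bb collects exactly that many body bytes, and both may arrive split across several network reads. A minimal sketch of the same technique, assuming a hypothetical `FrameDecoder` class that works on plain byte arrays instead of Netty buffers:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder mirroring the bbLen / bb pair; not ZooKeeper's actual class.
class FrameDecoder {
    private final ByteBuffer lenBuf = ByteBuffer.allocate(4); // plays the role of bbLen
    private ByteBuffer body;                                   // plays the role of bb

    // Feeds one network chunk and returns every frame completed by it.
    List<byte[]> feed(byte[] chunk) {
        List<byte[]> frames = new ArrayList<>();
        ByteBuffer in = ByteBuffer.wrap(chunk);
        while (in.hasRemaining()) {
            if (body == null) {
                copy(in, lenBuf);
                if (lenBuf.hasRemaining()) {
                    break; // length prefix still incomplete, wait for more data
                }
                lenBuf.flip();
                int len = lenBuf.getInt();
                lenBuf.clear();
                if (len < 0) {
                    throw new IllegalStateException("Len error " + len);
                }
                body = ByteBuffer.allocate(len);
            }
            copy(in, body);
            if (!body.hasRemaining()) {
                frames.add(body.array()); // a full frame is ready for processing
                body = null;              // next bytes start a new length prefix
            }
        }
        return frames;
    }

    // Copies as many bytes as both buffers allow.
    private static void copy(ByteBuffer from, ByteBuffer to) {
        int n = Math.min(from.remaining(), to.remaining());
        for (int i = 0; i < n; i++) {
            to.put(from.get());
        }
    }
}
```

Feeding the decoder a frame that is split over three chunks yields nothing for the first two calls and the complete body on the third, which is the situation the limit adjustments in receiveMessage are there to handle.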

ZooKeeperServer: zks.processPacket(this, bb);

This method handles the packets sent by the client

    public void processPacket(ServerCnxn cnxn, ByteBuffer incomingBuffer) throws IOException {
        // We have the request, now process and setup for next
        InputStream bais = new ByteBufferInputStream(incomingBuffer);
        BinaryInputArchive bia = BinaryInputArchive.getArchive(bais);
        RequestHeader h = new RequestHeader();
        //Deserialize the request header sent by the client
        h.deserialize(bia, "header");
        incomingBuffer = incomingBuffer.slice();
        //Judge the current operation type
        if (h.getType() == OpCode.auth) {
            LOG.info("got auth packet " + cnxn.getRemoteSocketAddress());
            AuthPacket authPacket = new AuthPacket();
            ByteBufferInputStream.byteBuffer2Record(incomingBuffer, authPacket);
            String scheme = authPacket.getScheme();
            ServerAuthenticationProvider ap = ProviderRegistry.getServerProvider(scheme);
            Code authReturn = KeeperException.Code.AUTHFAILED;
            if(ap != null) {
                try {
                    authReturn = ap.handleAuthentication(new ServerAuthenticationProvider.ServerObjs(this, cnxn), authPacket.getAuth());
                } catch(RuntimeException e) {
                    LOG.warn("Caught runtime exception from AuthenticationProvider: " + scheme + " due to " + e);
                    authReturn = KeeperException.Code.AUTHFAILED;
                }
            }
            if (authReturn == KeeperException.Code.OK) {
                if (LOG.isDebugEnabled()) {
                    LOG.debug("Authentication succeeded for scheme: " + scheme);
                }
                LOG.info("auth success " + cnxn.getRemoteSocketAddress());
                ReplyHeader rh = new ReplyHeader(h.getXid(), 0,
                        KeeperException.Code.OK.intValue());
                cnxn.sendResponse(rh, null, null);
                //Otherwise authentication failed: log the reason, reply with AUTHFAILED and close the connection
            } else {
                if (ap == null) {
                    LOG.warn("No authentication provider for scheme: "
                            + scheme + " has "
                            + ProviderRegistry.listProviders());
                } else {
                    LOG.warn("Authentication failed for scheme: " + scheme);
                }
              
                ReplyHeader rh = new ReplyHeader(h.getXid(), 0,
                        KeeperException.Code.AUTHFAILED.intValue());
                cnxn.sendResponse(rh, null, null);
           
                cnxn.sendBuffer(ServerCnxnFactory.closeConn);
                cnxn.disableRecv();
            }
            return;
        } else {
            //Not an auth packet: check whether it is a sasl operation
            if (h.getType() == OpCode.sasl) {
                Record rsp = processSasl(incomingBuffer,cnxn);
                ReplyHeader rh = new ReplyHeader(h.getXid(), 0, KeeperException.Code.OK.intValue());
                cnxn.sendResponse(rh,rsp, "response"); 
                return;
            }
            else {
                //An ordinary request such as exists ends up in this block: encapsulate it in a Request object
                Request si = new Request(cnxn, cnxn.getSessionId(), h.getXid(),
                  h.getType(), incomingBuffer, cnxn.getAuthInfo());
                si.setOwner(ServerCnxn.me);

                setLocalSessionFlag(si);
                submitRequest(si); //Submit request
            }
        }
        cnxn.incrOutstandingRequests(h);
    }

submitRequest

 public void submitRequest(Request si) {
 //firstProcessor is the head of the request processor chain
        if (firstProcessor == null) {
            synchronized (this) {
                try {
                    // Since all requests are passed to the request
                    // processor it should wait for setting up the request
                    // processor chain. The state will be updated to RUNNING
                    // after the setup.
                    while (state == State.INITIAL) {
                        wait(1000);
                    }
                } catch (InterruptedException e) {
                    LOG.warn("Unexpected interruption", e);
                }
                if (firstProcessor == null || state != State.RUNNING) {
                    throw new RuntimeException("Not started");
                }
            }
        }
        try {
            touch(si.cnxn);
            boolean validpacket = Request.isValid(si.type);
            if (validpacket) {
                firstProcessor.processRequest(si);
                if (si.cnxn != null) {
                    incInProcess();
                }
            } else {
                LOG.warn("Received packet at server of unknown type " + si.type);
                new UnimplementedRequestProcessor().processRequest(si);
            }
        } catch (MissingSessionException e) {
            if (LOG.isDebugEnabled()) {
                LOG.debug("Dropping request: " + e.getMessage());
            }
        } catch (RequestProcessorException e) {
            LOG.error("Unable to process request:" + e.getMessage(), e);
        }
    }

Composition of the firstProcessor request chain

1. firstProcessor is initialized in the setupRequestProcessors method of ZooKeeperServer. The code is as follows

protected void setupRequestProcessors() {
    RequestProcessor finalProcessor = new FinalReques
    RequestProcessor syncProcessor = new SyncReque
    ((SyncRequestProcessor)syncProcessor).start();
    firstProcessor = new PrepRequestProcessor(this, syn
    ((PrepRequestProcessor)firstProcessor).start();
}

From the above we can see that firstProcessor is an instance of PrepRequestProcessor, and that another Processor is passed into its constructor to form a call chain.

RequestProcessor syncProcessor = new SyncRequestProcessor(this, finalProcessor);

The constructor of syncProcessor takes yet another Processor, which is the FinalRequestProcessor
2. So the whole call chain is PrepRequestProcessor -> SyncRequestProcessor -> FinalRequestProcessor
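The way each processor receives its successor through the constructor can be sketched as follows. This is only an illustration of the chain pattern, assuming a hypothetical `SketchProcessor` interface and string requests; the real processors are threads with queues in between, which this sketch leaves out.

```java
import java.util.List;

// Hypothetical single-method processor interface, for illustration only.
interface SketchProcessor {
    void process(String request);
}

class ChainDemo {
    // Creates a stage that records its name and then hands the request to the next stage.
    static SketchProcessor stage(String name, SketchProcessor next, List<String> trace) {
        return request -> {
            trace.add(name + ":" + request);
            if (next != null) {
                next.process(request);
            }
        };
    }

    // Builds the chain back to front, the same order used when wiring the real processors.
    static SketchProcessor build(List<String> trace) {
        SketchProcessor finalStage = stage("final", null, trace);
        SketchProcessor syncStage = stage("sync", finalStage, trace);
        return stage("prep", syncStage, trace);
    }
}
```

Calling `ChainDemo.build(trace).process("exists")` visits the prep, sync and final stages in that order.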

PrepRequestProcessor.processRequest(si);

Now that we know how the call chain is built, let's continue.

firstProcessor.processRequest(si); calls the processRequest method of PrepRequestProcessor

public void processRequest(Request request) {
    submittedRequests.add(request);
}

Oddly, processRequest only adds the request to submittedRequests. Based on what we have seen before, this suggests another asynchronous operation. submittedRequests is a blocking queue

LinkedBlockingQueue submittedRequests = new LinkedBlockingQueue();

The PrepRequestProcessor class inherits from Thread, so we can go straight to the run method of this class, as follows

public void run() {
    try {
        while (true) {
            Request request = submittedRequests.take(); //ok, get the request from the queue for processing
            long traceMask = ZooTrace.CLIENT_REQUEST_TRACE_MASK;
            if (request.type == OpCode.ping) {
                traceMask = ZooTrace.CLIENT_PING_TRACE_MASK;
            }
            if (LOG.isTraceEnabled()) {
                ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
            }
            if (Request.requestOfDeath == request) {
                break;
            }
            pRequest(request); //Call pRequest for preprocessing
        }
    } catch (RequestProcessorException e) {
        if (e.getCause() instanceof XidRolloverException) {
            LOG.info(e.getCause().getMessage());
        }
        handleException(this.getName(), e);
    } catch (Exception e) {
        handleException(this.getName(), e);
    }
    LOG.info("PrepRequestProcessor exited loop!");
}
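The pattern used here (processRequest only enqueues, a dedicated thread drains the queue, and a sentinel request stops the loop) can be sketched in isolation. The class below is a hypothetical simplification with string requests; `REQUEST_OF_DEATH` and `demo` are names chosen for this sketch only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical queue-backed processor; it mimics the shape of the run loop above.
class QueueingProcessor extends Thread {
    // Sentinel compared by identity, like Request.requestOfDeath.
    static final String REQUEST_OF_DEATH = new String("requestOfDeath");

    private final LinkedBlockingQueue<String> submitted = new LinkedBlockingQueue<>();
    final List<String> handled = new CopyOnWriteArrayList<>();

    // Called by the producer: returns immediately, the work happens on this thread.
    void processRequest(String request) {
        submitted.add(request);
    }

    void shutdown() {
        submitted.add(REQUEST_OF_DEATH);
    }

    @Override
    public void run() {
        try {
            while (true) {
                String request = submitted.take(); // blocks until a request arrives
                if (request == REQUEST_OF_DEATH) {
                    break;
                }
                handled.add("prepped:" + request); // stands in for pRequest(request)
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Small driver: submits two requests, stops the thread and returns what was handled.
    static List<String> demo() {
        QueueingProcessor p = new QueueingProcessor();
        p.start();
        p.processRequest("a");
        p.processRequest("b");
        p.shutdown();
        try {
            p.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(p.handled);
    }
}
```

Because the queue is FIFO and a single thread drains it, requests are handled in submission order, which is the property the real processor chain relies on.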

pRequest

The preprocessing code is too long to paste here. Most of it consists of checks and handling that depend on the current OP type. On the last line of the method we find the following code

nextProcessor.processRequest(request); Clearly, nextProcessor here corresponds to SyncRequestProcessor

SyncRequestProcessor.processRequest

public void processRequest(Request request) {
    // request.addRQRec(">sync");
    queuedRequests.add(request);
}

This method follows the same pattern: it works asynchronously and only adds the request to queuedRequests. So we go on to look at the run method of this class

public void run() {
    try {
        int logCount = 0;

        // we do this in an attempt to ensure that not all of the servers
        // in the ensemble take a snapshot at the same time
        int randRoll = r.nextInt(snapCount/2);
        while (true) {
            Request si = null;
            //Get request from blocking queue
            if (toFlush.isEmpty()) {
                si = queuedRequests.take();
            } else {
                si = queuedRequests.poll();
                if (si == null) {
                    flush(toFlush);
                    continue;
                }
            }
            if (si == requestOfDeath) {
                break;
            }
            if (si != null) {
                // track the number of records written to the log
                //The following code, roughly speaking, triggers the snapshot operation and starts a thread to take the snapshot
                if (zks.getZKDatabase().append(si)) {
                    logCount++;
                    if (logCount > (snapCount / 2 + randRoll)) {
                        randRoll = r.nextInt(snapCount/2);
                        // roll the log
                        zks.getZKDatabase().rollLog();
                        // take a snapshot
                        if (snapInProcess != null && snapInProcess.isAlive()) {
                            LOG.warn("Too busy to snap, skipping");
                        } else {
                            snapInProcess = new ZooKeeperThread("Snapshot Thread") {
                                public void run() {
                                    try {
                                        zks.takeSnapshot();
                                    } catch(Exception e) {
                                        LOG.warn("Unexpected exception", e);
                                    }
                                }
                            };
                            snapInProcess.start();
                        }
                        logCount = 0;
                    }
                } else if (toFlush.isEmpty()) {
                    // optimization for read heavy workloads
                    // iff this is a read, and there are no pending
                    // flushes (writes), then just pass this to the next
                    // processor
                    if (nextProcessor != null) {
                        nextProcessor.processRequest(si); //Continue to call the next processor to process the request
                        if (nextProcessor instanceof Flushable) {
                            ((Flushable)nextProcessor).flush();
                        }
                    }
                    continue;
                }
                toFlush.add(si);
                if (toFlush.size() > 1000) {
                    flush(toFlush);
                }
            }
        }
    } catch (Throwable t) {
        handleException(this.getName(), t);
    } finally{
        running = false;
    }
    LOG.info("SyncRequestProcessor exited!");
}
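The batching rule in this loop is easy to miss among the snapshot code: requests collect in toFlush and are flushed either when the queue runs dry or when the batch grows past a threshold (1000 above). The following single-threaded sketch isolates just that rule. `BatchingSketch`, `drain` and `maxBatch` are hypothetical names, and the sketch returns where the real processor would block in take().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical, single-threaded illustration of the flush rule only.
class BatchingSketch {
    // Returns the size of every batch that would be flushed while draining the queue.
    static List<Integer> drain(Queue<String> queued, int maxBatch) {
        List<Integer> flushSizes = new ArrayList<>();
        List<String> toFlush = new ArrayList<>();
        while (true) {
            String si = queued.poll();
            if (si == null) {
                // Queue is empty: flush whatever is pending, then stop
                // (the real loop would go back to a blocking take()).
                if (!toFlush.isEmpty()) {
                    flushSizes.add(toFlush.size());
                    toFlush.clear();
                }
                return flushSizes;
            }
            toFlush.add(si);
            if (toFlush.size() > maxBatch) {
                // Batch is large enough: flush without waiting for the queue to empty.
                flushSizes.add(toFlush.size());
                toFlush.clear();
            }
        }
    }
}
```

With five queued requests and `maxBatch` set to 2, the first flush happens once a third request is pending and the remaining two are flushed when the queue is empty.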

FinalRequestProcessor.processRequest

The FinalRequestProcessor.processRequest method updates the session information or the znode data in memory according to the operation in the Request object.

The method has more than 300 lines, so we will not paste all of it. Going straight to the key part, we find the following code according to the client's OP type

case OpCode.exists: {
    lastOp = "EXIS";
    // TODO we need to figure out the security requirement for this!
    ExistsRequest existsRequest = new ExistsRequest();
    //Deserialize the ByteBuffer into an ExistsRequest. This is the request object that the client passed in when it initiated the request
    ByteBufferInputStream.byteBuffer2Record(request.request,
            existsRequest);
    String path = existsRequest.getPath(); //Get the requested path
    if (path.indexOf('\0') != -1) {
        throw new KeeperException.BadArgumentsException();
    }
    //Finally, the key code: check whether getWatch of the request is set. If so, pass cnxn (the ServerCnxn)
    //For exists requests, data change events need to be watched, so a watcher is added
    Stat stat = zks.getZKDatabase().statNode(path, existsRequest.getWatch() ? cnxn : null);
    rsp = new ExistsResponse(stat); //Look up the path in the server-side in-memory database and wrap the result in an ExistsResponse
    break;
}

What does statNode do?

public Stat statNode(String path, ServerCnxn serverCnxn) throws KeeperException.NoNodeException {
    return dataTree.statNode(path, serverCnxn);
}

Following the call down, in the method below the ServerCnxn is passed as a Watcher. This works because ServerCnxn implements the Watcher interface

public Stat statNode(String path, Watcher watcher)
        throws KeeperException.NoNodeException {
    Stat stat = new Stat();
    DataNode n = nodes.get(path); //Get the node by its path
    if (watcher != null) { //If the watcher is not null, bind the current watcher to the path
        dataWatches.addWatch(path, watcher);
    }
    if (n == null) {
        throw new KeeperException.NoNodeException();
    }
    synchronized (n) {
        n.copyStat(stat);
        return stat;
    }
}

WatchManager.addWatch(path, watcher);

synchronized void addWatch(String path, Watcher watcher) {
    HashSet<Watcher> list = watchTable.get(path); //Check whether watchTable already has watchers for the current path
    if (list == null) { //If not, create the set
        // don't waste memory if there are few watches on a node
        // rehash when the 4th entry is added, doubling size thereafter
        // seems like a good compromise
        list = new HashSet<Watcher>(4); // Newly created watcher set
        watchTable.put(path, list);
    }
    list.add(watcher); //Add the watcher to the set for this path

    HashSet<String> paths = watch2Paths.get(watcher);
    if (paths == null) {
        // cnxns typically have many watches, so use default cap here
        paths = new HashSet<String>();
        watch2Paths.put(watcher, paths); // Set up the mapping from watcher to node paths
    }
    paths.add(path); // Add the path to the paths set
}

The general process is as follows

① Use the incoming path (node path) to get the corresponding watcher set from watchTable, then go to ②

② Check whether the watcher set from ① is null. If it is, go to ③; otherwise go to ④

③ Create a new watcher set, add the path and the set to watchTable, then go to ④

④ Add the incoming watcher to the watcher set. This completes adding the path and watcher to watchTable. Go to ⑤

⑤ Use the incoming watcher to get the corresponding path set from watch2Paths, then go to ⑥

⑥ Check whether the path set is null. If it is, go to ⑦; otherwise go to ⑧

⑦ Create a new path set, add the watcher and the set to watch2Paths, then go to ⑧

⑧ Add the incoming path (node path) to the path set. This completes adding the path and watcher to watch2Paths
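The eight steps above amount to maintaining two maps that mirror each other. A compact sketch of the same bookkeeping, assuming a hypothetical generic `SketchWatchManager` (the accessor methods are added only so the result can be inspected):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical simplification of the two-map bookkeeping described above.
class SketchWatchManager<W> {
    private final Map<String, Set<W>> watchTable = new HashMap<>();  // path -> watchers
    private final Map<W, Set<String>> watch2Paths = new HashMap<>(); // watcher -> paths

    synchronized void addWatch(String path, W watcher) {
        // Steps 1 to 4: create the watcher set on first use, then add the watcher.
        watchTable.computeIfAbsent(path, k -> new HashSet<>()).add(watcher);
        // Steps 5 to 8: create the path set on first use, then add the path.
        watch2Paths.computeIfAbsent(watcher, k -> new HashSet<>()).add(path);
    }

    synchronized Set<W> watchersOf(String path) {
        return new HashSet<>(watchTable.getOrDefault(path, new HashSet<>()));
    }

    synchronized Set<String> pathsOf(W watcher) {
        return new HashSet<>(watch2Paths.getOrDefault(watcher, new HashSet<>()));
    }
}
```

Keeping the reverse map makes it cheap to find every path a connection watches, for example when that connection closes.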

The client receives the response processed by the server

ClientCnxnSocketNetty.messageReceived
After the server finishes processing, it sends the response back through NettyServerCnxn.sendResponse. The client receives the server's reply in ClientCnxnSocketNetty.messageReceived

public void messageReceived(ChannelHandlerContext ctx,
                            MessageEvent e) throws Exception {
    updateNow();
    ChannelBuffer buf = (ChannelBuffer) e.getMessage();
    while (buf.readable()) {
        if (incomingBuffer.remaining() > buf.readableBytes()) {
            int newLimit = incomingBuffer.position()
                    + buf.readableBytes();
            incomingBuffer.limit(newLimit);
        }
        buf.readBytes(incomingBuffer);
        incomingBuffer.limit(incomingBuffer.capacity());

        if (!incomingBuffer.hasRemaining()) {
            incomingBuffer.flip();
            if (incomingBuffer == lenBuffer) {
                recvCount++;
                readLength();
            } else if (!initialized) {
                readConnectResult();
                lenBuffer.clear();
                incomingBuffer = lenBuffer;
                initialized = true;
                updateLastHeard();
            } else {
                sendThread.readResponse(incomingBuffer); //When a message is received, the SendThread.readResponse method is triggered
                lenBuffer.clear();
                incomingBuffer = lenBuffer;
                updateLastHeard();
            }
        }
    }
    wakeupCnxn();
}

SendThread.readResponse
The main flow of this method is as follows

First, read the header. If its xid is -2, it is a ping response, and the method returns

If xid is -4, it is the response to an AuthPacket, and the method returns

If xid is -1, it is a notification. In that case, continue reading to construct an event, hand it over through EventThread.queueEvent, and return

In other cases:

Take a Packet from pendingQueue and, after verification, update the Packet information

void readResponse(ByteBuffer incomingBuffer) throws IOException {
    ByteBufferInputStream bbis = new ByteBufferInputStream(
            incomingBuffer);
    BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
    ReplyHeader replyHdr = new ReplyHeader();
    replyHdr.deserialize(bbia, "header"); //Deserialize header

    if (replyHdr.getXid() == -2) {
        // -2 is the xid for pings
        if (LOG.isDebugEnabled()) {
            LOG.debug("Got ping response for sessionid: 0x"
                    + Long.toHexString(sessionId)
                    + " after "
                    + ((System.nanoTime() - lastPingSentNs) / 1000000)
                    + "ms");
        }
        return;
    }
    if (replyHdr.getXid() == -4) {
        // -4 is the xid for AuthPacket
        if (replyHdr.getErr() == KeeperException.Code.AUTHFAILED.intValue()) {
            state = States.AUTH_FAILED;
            eventThread.queueEvent(new WatchedEvent(Watcher.Event.EventType.None,
                    Watcher.Event.KeeperState.AuthFailed, null));
        }
        if (LOG.isDebugEnabled()) {
            LOG.debug("Got auth sessionid:0x"
                    + Long.toHexString(sessionId));
        }
        return;
    }
    if (replyHdr.getXid() == -1) { //The current message is a notification (an event pushed by the server)
        // -1 means notification
        if (LOG.isDebugEnabled()) {
            LOG.debug("Got notification sessionid:0x"
                    + Long.toHexString(sessionId));
        }
        WatcherEvent event = new WatcherEvent();
        event.deserialize(bbia, "response"); //Deserialize response information

        // convert from a server path to a client path
        if (chrootPath != null) {
            String serverPath = event.getPath();
            if (serverPath.compareTo(chrootPath) == 0)
                event.setPath("/");
            else if (serverPath.length() > chrootPath.length())
                event.setPath(serverPath.substring(chrootPath.length()));
            else {
                LOG.warn("Got server path " + event.getPath()
                        + " which is too short for chroot path "
                        + chrootPath);
            }
        }

        WatchedEvent we = new WatchedEvent(event);
        if (LOG.isDebugEnabled()) {
            LOG.debug("Got " + we + " for sessionid 0x"
                    + Long.toHexString(sessionId));
        }

        eventThread.queueEvent(we);
        return;
    }

    // If SASL authentication is currently in progress, construct and
    // send a response packet immediately, rather than queuing a
    // response as with other packets.
    if (tunnelAuthInProgress()) {
        GetSASLRequest request = new GetSASLRequest();
        request.deserialize(bbia, "token");
        zooKeeperSaslClient.respondToServer(request.getToken(),
                ClientCnxn.this);
        return;
    }

    Packet packet;
    synchronized (pendingQueue) {
        if (pendingQueue.size() == 0) {
            throw new IOException("Nothing in the queue, but got "
                    + replyHdr.getXid());
        }
        packet = pendingQueue.remove(); //This packet has now received its response, so it is removed from pendingQueue
    }
    /*
     * Since requests are processed in order, we better get a response
     * to the first request!
     */
    try { //Verify the packet. After verification succeeds, update the packet with the information from the server
        if (packet.requestHeader.getXid() != replyHdr.getXid()) {
            packet.replyHeader.setErr(
                    KeeperException.Code.CONNECTIONLOSS.intValue());
            throw new IOException("Xid out of order. Got Xid "
                    + replyHdr.getXid() + " with err " +
                    + replyHdr.getErr() +
                    " expected Xid "
                    + packet.requestHeader.getXid()
                    + " for a packet with details: "
                    + packet);
        }

        packet.replyHeader.setXid(replyHdr.getXid());
        packet.replyHeader.setErr(replyHdr.getErr());
        packet.replyHeader.setZxid(replyHdr.getZxid());
        if (replyHdr.getZxid() > 0) {
            lastZxid = replyHdr.getZxid();
        }
        if (packet.response != null && replyHdr.getErr() == 0) {
            packet.response.deserialize(bbia, "response"); //Deserialize the server's response into the packet.response property. This is why the last line of the exists method can read the result of this request from packet.response
        }

        if (LOG.isDebugEnabled()) {
            LOG.debug("Reading reply sessionid:0x"
                    + Long.toHexString(sessionId) + ", packet:: " + packet);
        }
    } finally {
        finishPacket(packet); // Finally, the finishPacket method is called to complete the processing
    }
}
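Stripped of logging and deserialization, readResponse is a dispatch on the xid in the reply header, followed by an in-order match against the head of pendingQueue. A minimal sketch of just that decision, assuming a hypothetical `ReplyDispatcher` that tracks pending xids as integers and returns a label instead of doing the real work:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical reduction of readResponse to its xid dispatch logic.
class ReplyDispatcher {
    final Deque<Integer> pendingXids = new ArrayDeque<>(); // stands in for pendingQueue

    String dispatch(int xid) {
        if (xid == -2) {
            return "ping";          // ping response, nothing else to do
        }
        if (xid == -4) {
            return "auth";          // response to an AuthPacket
        }
        if (xid == -1) {
            return "notification";  // watch event pushed by the server
        }
        // Ordinary reply: it must answer the oldest outstanding request.
        Integer expected = pendingXids.poll();
        if (expected == null) {
            throw new IllegalStateException("Nothing in the queue, but got " + xid);
        }
        if (expected != xid) {
            throw new IllegalStateException(
                    "Xid out of order. Got " + xid + " expected " + expected);
        }
        return "reply";
    }
}
```

The negative xids never touch the pending queue, which is why a watch notification can arrive at any time without disturbing the request and response ordering.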

finishPacket method
Its main job is to take the Watcher out of the Packet and register it in ZKWatchManager

private void finishPacket(Packet p) {
    int err = p.replyHeader.getErr();
    if (p.watchRegistration != null) {
        p.watchRegistration.register(err); //Register the watcher in ZKWatchManager. Look familiar? This object was initialized when the request was assembled.
        //The WatchRegistration subclass stores the Watcher instance in existWatches of ZKWatchManager.
    }
    //Add all removed watch events to the event queue, so that the client receives events of the "data/child watch removed" type
    if (p.watchDeregistration != null) {
        Map<EventType, Set<Watcher>> materializedWatchers = null;
        try {
            materializedWatchers = p.watchDeregistration.unregister(err);
            for (Entry<EventType, Set<Watcher>> entry : materializedWatchers.entrySet()) {
                Set<Watcher> watchers = entry.getValue();
                if (watchers.size() > 0) {
                    queueEvent(p.watchDeregistration.getClientPath(), err,
                            watchers, entry.getKey());
                    // ignore connectionloss when removing from local
                    // session
                    p.replyHeader.setErr(Code.OK.intValue());
                }
            }
        } catch (KeeperException.NoWatcherException nwe) {
            p.replyHeader.setErr(nwe.code().intValue());
        } catch (KeeperException ke) {
            p.replyHeader.setErr(ke.code().intValue());
        }
    }
    //cb is the AsyncCallback. If it is null, the call is synchronous and needs no asynchronous callback, so just call notifyAll.
    if (p.cb == null) {
        synchronized (p) {
            p.finished = true;
            p.notifyAll();
        }
    } else {
        p.finished = true;
        eventThread.queuePacket(p);
    }
}
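The last branch of finishPacket decides how the caller learns about the result: a synchronous caller is blocked in wait() on the packet and is woken with notifyAll, while an asynchronous caller gets its callback run on the event thread. The sketch below shows both paths under simplifying assumptions; `SketchCall` and its members are hypothetical names, and an `Executor` stands in for the event thread.

```java
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical model of a pending call that completes either synchronously or via callback.
class SketchCall {
    private final Consumer<String> cb; // null means the caller is synchronous
    private boolean finished;
    private String result;

    SketchCall(Consumer<String> cb) {
        this.cb = cb;
    }

    // Called by the receiving side once the response has been read.
    void finish(String value, Executor eventThread) {
        synchronized (this) {
            result = value;
            finished = true;
            if (cb == null) {
                notifyAll(); // wake the thread blocked in await()
            }
        }
        if (cb != null) {
            eventThread.execute(() -> cb.accept(value)); // hand over to the event thread
        }
    }

    // Used by a synchronous caller: blocks until finish() has run.
    synchronized String await() throws InterruptedException {
        while (!finished) {
            wait();
        }
        return result;
    }

    // Small driver for the synchronous path.
    static String demoSync() {
        SketchCall call = new SketchCall(null);
        new Thread(() -> call.finish("stat", Runnable::run)).start();
        try {
            return call.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

The `while (!finished)` loop around wait() guards against the response arriving before the caller starts waiting, as well as against spurious wakeups.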

watchRegistration

public void register(int rc) {
    if (shouldAddWatch(rc)) {
        Map<String, Set<Watcher>> watches = getWatches(rc); //Get existWatches from ZKWatchManager through the subclass implementation
        synchronized(watches) {
            Set<Watcher> watchers = watches.get(clientPath);
            if (watchers == null) {
                watchers = new HashSet<Watcher>();
                watches.put(clientPath, watchers);
            }
            watchers.add(watcher); //Store the Watcher object in existWatches of ZKWatchManager
        }
    }
}

The following code shows the maps in which the client stores its watchers, corresponding to the three kinds of registered watch events

static class ZKWatchManager implements ClientWatchManager {
    private final Map<String, Set<Watcher>> dataWatches =
            new HashMap<String, Set<Watcher>>();
    private final Map<String, Set<Watcher>> existWatches =
            new HashMap<String, Set<Watcher>>();
    private final Map<String, Set<Watcher>> childWatches =
            new HashMap<String, Set<Watcher>>();


In general, when a Watcher is registered with the ZooKeeper server, either through the ZooKeeper constructor or through one of the three interfaces getData, exists and getChildren, the request is first sent to the server. After the server replies that the request succeeded, the client stores the mapping between the path and the Watcher for later use.

EventThread.queuePacket()
For asynchronous calls, the finishPacket method ends by calling eventThread.queuePacket, which adds the current packet to the queue of events waiting to be delivered

public void queuePacket(Packet packet) {
    if (wasKilled) {
        synchronized (waitingEvents) {
            if (isRunning) waitingEvents.add(packet);
            else processEvent(packet);
        }
    } else {
        waitingEvents.add(packet);
    }
}

Event triggering

The long walkthrough above was only meant to make the event registration process clear. The final triggering has to be done through a transactional operation

In our initial example, the following code triggers the event

zookeeper.setData("/mic", "1".getBytes(), -1); //Modify the value of a node to trigger the watch

The interaction between client and server is the same as before and is not repeated here. The only difference is that this time the event is triggered

Server event response DataTree.setData()

public Stat setData(String path, byte data[], int version, long zxid,
        long time) throws KeeperException.NoNodeException {
    Stat s = new Stat();
    DataNode n = nodes.get(path);
    if (n == null) {
        throw new KeeperException.NoNodeException();
    }
    byte lastdata[] = null;
    synchronized (n) {
        lastdata = n.data;
        n.data = data;
        n.stat.setMtime(time);
        n.stat.setMzxid(zxid);
        n.stat.setVersion(version);
        n.copyStat(s);
    }
    // now update if the path is in a quota subtree.
    String lastPrefix = getMaxPrefixWithQuota(path);
    if(lastPrefix != null) {
        this.updateBytes(lastPrefix, (data == null ? 0 : data.length)
                - (lastdata == null ? 0 : lastdata.length));
    }
    dataWatches.triggerWatch(path, EventType.NodeDataChanged); // Trigger the NodeDataChanged event of the corresponding node
    return s;
}

WatchManager.triggerWatch

Set<Watcher> triggerWatch(String path, EventType type, Set<Watcher> supress) {
    WatchedEvent e = new WatchedEvent(type, KeeperState.SyncConnected, path); // Create a WatchedEvent from the event type, connection state and node path
    HashSet<Watcher> watchers;
    synchronized (this) {
        watchers = watchTable.remove(path); // Remove the path from watchTable and return its watcher set
        if (watchers == null || watchers.isEmpty()) {
            if (LOG.isTraceEnabled()) {
                ZooTrace.logTraceMessage(LOG,
                        ZooTrace.EVENT_DELIVERY_TRACE_MASK,
                        "No watchers for " + path);
            }
            return null;
        }
        for (Watcher w : watchers) { // Iterate over the watcher set
            HashSet<String> paths = watch2Paths.get(w); // Look up the path set for this watcher in watch2Paths
            if (paths != null) {
                paths.remove(path); //Remove path
            }
        }
    }
    for (Watcher w : watchers) { // Iterate over the watcher set
        if (supress != null && supress.contains(w)) {
            continue;
        }
        w.process(e); //This is the key point again. What does w.process do?
    }
    return watchers;
}
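Because triggerWatch removes the path from watchTable before notifying anyone, a watch fires at most once; the client has to register again to hear about the next change. A small sketch of that one-shot behaviour, assuming a hypothetical `OneShotWatches` class that identifies watchers by a string id:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of one-shot watch triggering; watcher ids are plain strings here.
class OneShotWatches {
    private final Map<String, Set<String>> watchTable = new HashMap<>();  // path -> watcher ids
    private final Map<String, Set<String>> watch2Paths = new HashMap<>(); // watcher id -> paths

    synchronized void addWatch(String path, String watcherId) {
        watchTable.computeIfAbsent(path, k -> new HashSet<>()).add(watcherId);
        watch2Paths.computeIfAbsent(watcherId, k -> new HashSet<>()).add(path);
    }

    // Returns the watchers to notify and forgets them, so a second trigger finds none.
    synchronized Set<String> triggerWatch(String path) {
        Set<String> watchers = watchTable.remove(path);
        if (watchers == null || watchers.isEmpty()) {
            return Collections.emptySet();
        }
        for (String w : watchers) {
            Set<String> paths = watch2Paths.get(w);
            if (paths != null) {
                paths.remove(path); // keep the reverse map consistent
            }
        }
        return watchers;
    }
}
```

Triggering the same path twice returns the registered watcher the first time and an empty set the second time.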

w.process(e);
Remember which watcher was bound when we registered the event on the server? It was the ServerCnxn, so w.process(e) actually calls the process method of ServerCnxn. ServerCnxn is an abstract class with two implementations: NIOServerCnxn and NettyServerCnxn. Let's take a look at the process method of NettyServerCnxn

public void process(WatchedEvent event) {
    ReplyHeader h = new ReplyHeader(-1, -1L, 0);
    if (LOG.isTraceEnabled()) {
        ZooTrace.logTraceMessage(LOG, ZooTrace.EVENT_DELIVERY_TRACE_MASK,
                "Deliver event " + event + " to 0x"
                + Long.toHexString(this.sessionId)
                + " through "
                + this);
    }

    // Convert WatchedEvent to a type that can be sent over the wire
    WatcherEvent e = event.getWrapper();

    try {
        sendResponse(h, e, "notification"); //Here the event is sent. The event object is a WatcherEvent
    } catch (IOException e1) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Problem sending to " + getRemoteSocketAddress(), e1);
        }
        close();
    }
}

The client then receives this response, which triggers the SendThread.readResponse method

Client processing event response

SendThread.readResponse
This code has already been pasted above, so here we only explain the part that belongs to the current flow. As mentioned earlier, the xid of a notification message is -1, so we go straight to the -1 branch

    void readResponse(ByteBuffer incomingBuffer) throws IOException {
        ByteBufferInputStream bbis = new ByteBufferInputStream(incomingBuffer);
        BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
        ReplyHeader replyHdr = new ReplyHeader();

        replyHdr.deserialize(bbia, "header");
        if (replyHdr.getXid() == -2) {
            // -2 is the xid for pings
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got ping response for sessionid: 0x"
                        + Long.toHexString(sessionId)
                        + " after "
                        + ((System.nanoTime() - lastPingSentNs) / 1000000)
                        + "ms");
            }
            return;
        }
        if (replyHdr.getXid() == -4) {
            // -4 is the xid for AuthPacket
            if (replyHdr.getErr() == KeeperException.Code.AUTHFAILED.intValue()) {
                state = States.AUTH_FAILED;
                eventThread.queueEvent(new WatchedEvent(Watcher.Event.EventType.None,
                        Watcher.Event.KeeperState.AuthFailed, null));
            }
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got auth sessionid:0x"
                        + Long.toHexString(sessionId));
            }
            return;
        }
        if (replyHdr.getXid() == -1) {
            // -1 means notification
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got notification sessionid:0x"
                        + Long.toHexString(sessionId));
            }
            WatcherEvent event = new WatcherEvent();
            // deserialize the WatcherEvent sent by the server
            event.deserialize(bbia, "response");

            // convert from a server path to a client path
            if (chrootPath != null) {
                String serverPath = event.getPath();
                if (serverPath.compareTo(chrootPath) == 0)
                    event.setPath("/");
                else if (serverPath.length() > chrootPath.length())
                    event.setPath(serverPath.substring(chrootPath.length()));
                else {
                    LOG.warn("Got server path " + event.getPath()
                            + " which is too short for chroot path "
                            + chrootPath);
                }
            }

            WatchedEvent we = new WatchedEvent(event); // assemble the WatchedEvent object
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got " + we + " for sessionid 0x"
                        + Long.toHexString(sessionId));
            }

            eventThread.queueEvent(we); // hand the event over to the EventThread
            return;
        }

        // If SASL authentication is currently in progress, construct and
        // send a response packet immediately, rather than queuing a
        // response as with other packets.
        if (tunnelAuthInProgress()) {
            GetSASLRequest request = new GetSASLRequest();
            request.deserialize(bbia, "token");
            zooKeeperSaslClient.respondToServer(request.getToken(),
                    ClientCnxn.this);
            return;
        }

        Packet packet;
        synchronized (pendingQueue) {
            if (pendingQueue.size() == 0) {
                throw new IOException("Nothing in the queue, but got "
                        + replyHdr.getXid());
            }
            packet = pendingQueue.remove();
        }
        /*
         * Since requests are processed in order, we better get a response
         * to the first request!
         */
        try {
            if (packet.requestHeader.getXid() != replyHdr.getXid()) {
                packet.replyHeader.setErr(
                        KeeperException.Code.CONNECTIONLOSS.intValue());
                throw new IOException("Xid out of order. Got Xid "
                        + replyHdr.getXid() + " with err "
                        + replyHdr.getErr()
                        + " expected Xid "
                        + packet.requestHeader.getXid()
                        + " for a packet with details: "
                        + packet);
            }

            packet.replyHeader.setXid(replyHdr.getXid());
            packet.replyHeader.setErr(replyHdr.getErr());
            packet.replyHeader.setZxid(replyHdr.getZxid());
            if (replyHdr.getZxid() > 0) {
                lastZxid = replyHdr.getZxid();
            }
            if (packet.response != null && replyHdr.getErr() == 0) {
                packet.response.deserialize(bbia, "response");
            }

            if (LOG.isDebugEnabled()) {
                LOG.debug("Reading reply sessionid:0x"
                        + Long.toHexString(sessionId) + ", packet:: " + packet);
            }
        } finally {
            finishPacket(packet);
        }
    }
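The control flow of readResponse boils down to two rules: the reserved xids -2, -4 and -1 are handled immediately, and every other reply must match the oldest request still waiting in the pending queue. The following is a minimal sketch of just that rule. The class `ReplyDispatchSketch`, its method names and the string results are hypothetical, and it throws `IllegalStateException` where the real method throws `IOException`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of classifying replies by xid.
class ReplyDispatchSketch {
    static final int XID_NOTIFICATION = -1;
    static final int XID_PING = -2;
    static final int XID_AUTH = -4;

    // xids of requests that were sent and still await a reply, oldest first.
    private final Deque<Integer> pendingQueue = new ArrayDeque<>();

    // Records that a request with this xid has been sent.
    void sent(int xid) {
        pendingQueue.addLast(xid);
    }

    // Classifies one reply; ordinary replies must arrive in request order.
    String dispatch(int replyXid) {
        switch (replyXid) {
            case XID_PING:
                return "ping";
            case XID_AUTH:
                return "auth";
            case XID_NOTIFICATION:
                return "notification"; // never touches the pending queue
            default:
                break;
        }
        Integer expected = pendingQueue.pollFirst();
        if (expected == null) {
            throw new IllegalStateException("Nothing in the queue, but got " + replyXid);
        }
        if (expected != replyXid) {
            throw new IllegalStateException("Xid out of order. Got " + replyXid
                    + " expected " + expected);
        }
        return "reply";
    }
}
```

A notification can therefore arrive between two ordinary replies without disturbing the ordering check, because it is returned before the pending queue is consulted.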

eventThread.queueEvent

After SendThread receives a notification event from the server, it hands the event to the EventThread by calling its queueEvent method. Based on the notification, queueEvent takes all matching Watchers out of ZKWatchManager. Taking a Watcher out also removes it from the manager, so the registration is no longer valid once it has fired.

    private void queueEvent(WatchedEvent event, Set<Watcher> materializedWatchers) {
        // skip session events that do not change the current session state
        if (event.getType() == EventType.None && sessionState == event.getState()) {
            return;
        }
        sessionState = event.getState();
        final Set<Watcher> watchers;
        if (materializedWatchers == null) {
            // materialize the watchers based on the event
            watchers = watcher.materialize(event.getState(),
                    event.getType(), event.getPath());
        } else {
            watchers = new HashSet<Watcher>();
            watchers.addAll(materializedWatchers);
        }
        // wrap watchers and event in a WatcherSetEventPair and add it to the waitingEvents queue
        WatcherSetEventPair pair = new WatcherSetEventPair(watchers, event);
        // queue the pair (watch set & event) for later processing
        waitingEvents.add(pair);
    }

materialize method

The matching watchers are obtained by calling remove on dataWatches, existWatches or childWatches, which shows that a client-side watch is also one-shot: once it fires, the registration is gone.
The method returns the set of Watchers that should be notified, selected by keeperState, eventType and path.

    public Set<Watcher> materialize(Watcher.Event.KeeperState state,
                                    Watcher.Event.EventType type,
                                    String clientPath)
    {
        Set<Watcher> result = new HashSet<Watcher>();

        switch (type) {
        case None:
            result.add(defaultWatcher);
            boolean clear = disableAutoWatchReset
                    && state != Watcher.Event.KeeperState.SyncConnected;
            synchronized (dataWatches) {
                for (Set<Watcher> ws : dataWatches.values()) {
                    result.addAll(ws);
                }
                if (clear) {
                    dataWatches.clear();
                }
            }

            synchronized (existWatches) {
                for (Set<Watcher> ws : existWatches.values()) {
                    result.addAll(ws);
                }
                if (clear) {
                    existWatches.clear();
                }
            }

            synchronized (childWatches) {
                for (Set<Watcher> ws : childWatches.values()) {
                    result.addAll(ws);
                }
                if (clear) {
                    childWatches.clear();
                }
            }

            return result;
        case NodeDataChanged:
        case NodeCreated:
            synchronized (dataWatches) {
                addTo(dataWatches.remove(clientPath), result);
            }
            synchronized (existWatches) {
                addTo(existWatches.remove(clientPath), result);
            }
            break;
        case NodeChildrenChanged:
            synchronized (childWatches) {
                addTo(childWatches.remove(clientPath), result);
            }
            break;
        case NodeDeleted:
            synchronized (dataWatches) {
                addTo(dataWatches.remove(clientPath), result);
            }
            // XXX This shouldn't be needed, but just in case
            synchronized (existWatches) {
                Set<Watcher> list = existWatches.remove(clientPath);
                if (list != null) {
                    addTo(existWatches.remove(clientPath), result);
                    LOG.warn("We are triggering an exists watch for delete! Shouldn't happen!");
                }
            }
            synchronized (childWatches) {
                addTo(childWatches.remove(clientPath), result);
            }
            break;
        default:
            String msg = "Unhandled watch event type " + type
                    + " with state " + state + " on path " + clientPath;
            LOG.error(msg);
            throw new RuntimeException(msg);
        }

        return result;
    }
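The switch in materialize maps each event type to the watch maps it drains. As a rough illustration, the sketch below keeps three maps keyed by path and removes the entries it returns, so a second event on the same path finds nothing. Everything here is hypothetical: the class `ClientWatchRegistrySketch`, the use of plain strings as watcher names, and the omission of the `None` (session event) branch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of one-shot client-side watch materialization.
class ClientWatchRegistrySketch {
    enum EventType { NodeCreated, NodeDataChanged, NodeChildrenChanged, NodeDeleted }

    // path -> names of the watchers registered on that path
    private final Map<String, Set<String>> dataWatches = new HashMap<>();
    private final Map<String, Set<String>> existWatches = new HashMap<>();
    private final Map<String, Set<String>> childWatches = new HashMap<>();

    synchronized void watchData(String path, String watcher) {
        dataWatches.computeIfAbsent(path, k -> new HashSet<>()).add(watcher);
    }

    synchronized void watchExists(String path, String watcher) {
        existWatches.computeIfAbsent(path, k -> new HashSet<>()).add(watcher);
    }

    synchronized void watchChildren(String path, String watcher) {
        childWatches.computeIfAbsent(path, k -> new HashSet<>()).add(watcher);
    }

    // Returns the watchers to notify and removes them from the registry.
    synchronized Set<String> materialize(EventType type, String path) {
        Set<String> result = new HashSet<>();
        switch (type) {
            case NodeCreated:
            case NodeDataChanged:
                addTo(dataWatches.remove(path), result);
                addTo(existWatches.remove(path), result);
                break;
            case NodeChildrenChanged:
                addTo(childWatches.remove(path), result);
                break;
            case NodeDeleted:
                addTo(dataWatches.remove(path), result);
                addTo(existWatches.remove(path), result);
                addTo(childWatches.remove(path), result);
                break;
        }
        return result;
    }

    private static void addTo(Set<String> from, Set<String> to) {
        if (from != null) {
            to.addAll(from);
        }
    }
}
```

For example, after `watchData("/a", "w1")` the first `materialize(NodeDataChanged, "/a")` returns the set containing `w1`, and a second call with the same arguments returns an empty set.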

waitingEvents.add
This last step brings us close to the truth.

waitingEvents is a blocking queue inside the EventThread thread, which is the thread that was instantiated in the first step of our walkthrough. As the name suggests, waitingEvents holds the events, together with their watchers, that are waiting to be processed. The run() method of EventThread keeps taking items from the queue and hands them to the processEvent method:

    public void run() {
        try {
            isRunning = true;
            while (true) { // loop forever
                Object event = waitingEvents.take(); // take the next event from the pending queue
                if (event == eventOfDeath) {
                    wasKilled = true;
                } else {
                    processEvent(event); // handle the event
                }
                if (wasKilled)
                    synchronized (waitingEvents) {
                        if (waitingEvents.isEmpty()) {
                            isRunning = false;
                            break;
                        }
                    }
            }
        } catch (InterruptedException e) {
            LOG.error("Event thread exiting due to interruption", e);
        }

        LOG.info("EventThread shut down for session: 0x{}",
                Long.toHexString(getSessionId()));
    }
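The loop above is an instance of the "poison pill" pattern: a special marker object (eventOfDeath) tells the consumer to stop once the queue has drained. A minimal, self-contained sketch of that pattern follows. The class `EventLoopSketch` and its methods are hypothetical, and events are plain objects rather than WatcherSetEventPair instances.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a queue consumer that stops on a poison pill.
class EventLoopSketch {
    private static final Object EVENT_OF_DEATH = new Object();

    private final LinkedBlockingQueue<Object> waitingEvents = new LinkedBlockingQueue<>();
    private final List<Object> processed = new ArrayList<>();

    void queueEvent(Object event) {
        waitingEvents.add(event);
    }

    void queueEventOfDeath() {
        waitingEvents.add(EVENT_OF_DEATH);
    }

    // Consumer loop: blocks on take() and exits once the pill was seen
    // and no further events are waiting.
    private void run() {
        boolean wasKilled = false;
        try {
            while (true) {
                Object event = waitingEvents.take();
                if (event == EVENT_OF_DEATH) {
                    wasKilled = true;
                } else {
                    processed.add(event);
                }
                if (wasKilled && waitingEvents.isEmpty()) {
                    break;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs the loop on its own thread and returns the events in processing order.
    static List<Object> demo() {
        EventLoopSketch loop = new EventLoopSketch();
        Thread thread = new Thread(loop::run, "event-thread-sketch");
        thread.start();
        loop.queueEvent("NodeCreated /a");
        loop.queueEvent("NodeDeleted /a");
        loop.queueEventOfDeath();
        try {
            thread.join(); // join makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return loop.processed;
    }
}
```

Because the queue is FIFO and there is a single consumer, events are processed in the order they were queued, which is the property the real EventThread relies on to deliver watch events in order.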

ProcessEvent
Because this method is long, I only paste the core part, which is the code that handles watcher event triggering.

    private void processEvent(Object event) {
        try {
            if (event instanceof WatcherSetEventPair) { // check the event type
                // each watcher will process the event
                WatcherSetEventPair pair = (WatcherSetEventPair) event; // get the WatcherSetEventPair
                for (Watcher watcher : pair.watchers) { // loop over every watcher matched for this event
                    try {
                        watcher.process(pair.event); // invoke the client's callback
                    } catch (Throwable t) {
                        LOG.error("Error while calling watcher ", t);
                    }
                }
            }
            // ... (rest of the method omitted)
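One detail worth noticing in processEvent is that each watcher.process call sits inside its own try/catch, so a callback that throws does not prevent the remaining watchers from being notified. A small sketch of that isolation, using `java.util.function.Consumer` as a stand-in for Watcher (the class `WatcherFanOutSketch` and the `deliver` method are hypothetical):

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of delivering one event to several callbacks.
class WatcherFanOutSketch {
    // Calls every watcher with the event and returns how many of them failed.
    static int deliver(String event, List<Consumer<String>> watchers) {
        int failures = 0;
        for (Consumer<String> watcher : watchers) {
            try {
                watcher.accept(event);
            } catch (Throwable t) {
                // one failing callback must not stop the others
                failures++;
                System.err.println("Error while calling watcher: " + t);
            }
        }
        return failures;
    }
}
```

If the first callback in the list throws a RuntimeException, `deliver` still calls the second one and reports a failure count of 1.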

Epilogue

No time


Topics: Zookeeper Session network snapshot