ZooKeeper Series: Leader Call Chain

Posted by praxedis on Mon, 09 Sep 2019 08:13:55 +0200

The Leader receives client requests through the LeaderZooKeeperServer that it starts.

First, we look at how its processing chain is defined. From the source code, the processors in the LeaderZooKeeperServer chain are arranged as follows:

 

  • LeaderRequestProcessor: Start of the Leader call chain
  • PrepRequestProcessor: Prepares write operations and builds the transaction content of the Request
  • ProposalRequestProcessor: Wraps write requests in PROPOSAL packets and broadcasts them to all Followers
  • CommitProcessor: Decides whether a write operation can be committed and actually executed
  • ToBeAppliedRequestProcessor: Passes Requests that have been committed on to FinalRequestProcessor and tracks them in the toBeApplied queue until they have been applied
  • FinalRequestProcessor: Where ZooKeeper commands are actually executed
  • SyncRequestProcessor: Persists write operations to disk
  • AckRequestProcessor: Once the proposal has been persisted to disk, sends an ACK to the local machine (the Leader itself)

SyncRequestProcessor and AckRequestProcessor are created inside ProposalRequestProcessor. When the Leader handles a proposal, it first persists the proposal locally through SyncRequestProcessor, and AckRequestProcessor then invokes the Leader's ACK logic directly, without sending a QuorumPacket over the network.
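
As a rough illustration of how such a chain can be wired, here is a minimal, self-contained sketch. The Processor interface and the Stage and LeaderChainSketch classes are hypothetical stand-ins, not ZooKeeper classes. The sketch runs every stage synchronously on one thread, whereas the real processors queue requests and CommitProcessor holds a write until it is committed. The SyncRequestProcessor/AckRequestProcessor side branch is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a request processor interface.
interface Processor {
    void process(String request);
}

// A stage that records its own name and then forwards to the next stage, if any.
class Stage implements Processor {
    private final String name;
    private final Processor next;      // null marks the end of the chain
    private final List<String> trace;

    Stage(String name, Processor next, List<String> trace) {
        this.name = name;
        this.next = next;
        this.trace = trace;
    }

    @Override
    public void process(String request) {
        trace.add(name);
        if (next != null) {
            next.process(request);
        }
    }
}

class LeaderChainSketch {
    // Builds the chain back to front, so each stage already knows its successor.
    static Processor build(List<String> trace) {
        Processor fin = new Stage("FinalRequestProcessor", null, trace);
        Processor toBeApplied = new Stage("ToBeAppliedRequestProcessor", fin, trace);
        Processor commit = new Stage("CommitProcessor", toBeApplied, trace);
        Processor proposal = new Stage("ProposalRequestProcessor", commit, trace);
        Processor prep = new Stage("PrepRequestProcessor", proposal, trace);
        return new Stage("LeaderRequestProcessor", prep, trace);
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        build(trace).process("create /demo");
        trace.forEach(System.out::println);
    }
}
```

Building from the last stage backwards means every constructor receives an already-built successor, so no stage ever needs a setter for its next processor.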

1. LeaderRequestProcessor

LeaderRequestProcessor is the first processor in the Leader's chain. Because the request is already on the Leader, it does not need to be forwarded to another machine and is simply passed down the chain. Apart from checking whether the session needs to be upgraded, the processor just calls the processRequest method of nextProcessor.

The main processing code is as follows:

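public void processRequest(Request request) throws RequestProcessorException {
    // If the session has to be upgraded first, checkUpgradeSession returns an extra request
    Request upgradeRequest = null;
    try {
        upgradeRequest = lzks.checkUpgradeSession(request);
    } catch (KeeperException ke) {
        // Record the failure on the request; it is still passed down the chain
        request.setException(ke);
    }
    if (upgradeRequest != null) {
        // The upgrade request goes down the chain before the original request
        nextProcessor.processRequest(upgradeRequest);
    }
    nextProcessor.processRequest(request);
}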

2. submitLearnerRequest

When the Leader receives a REQUEST packet from a Follower, it calls submitLearnerRequest to handle the write transaction.

The method is a single statement that hands the Request to PrepRequestProcessor's processRequest method. It is defined as follows:

public void submitLearnerRequest(Request request) {
    prepRequestProcessor.processRequest(request);
}

3. Write Request

Write requests are broadcast to the whole cluster to keep the data consistent, so they involve interaction among multiple servers in the cluster.

Here we take the create transaction as an example.

First, the client connects to the Leader, and the Leader sends a PROPOSAL message carrying the data of this create transaction to all Followers in the cluster.

Each Follower receives the proposal, first saves it to disk so that it cannot be lost, and then replies to the Leader with an ACK.

Once the Leader has collected ACKs from a quorum, it sends COMMIT to all Followers. At the same time, the Leader applies the proposal locally. Each Follower also applies the transaction locally after receiving COMMIT.

The Leader returns the execution result to the client.

Through this process, all machines in the cluster maintain the same complete database, which ensures data consistency.

One thing to note here is that, on a write, ZooKeeper goes through the sync step (SyncRequestProcessor) to persist the write to disk.

The specific process is as follows:

  • 1. First, the client opens a socket connection to the Leader and sends a create request to the LeaderZooKeeperServer.
  • 2. The first Processor in LeaderZooKeeperServer is triggered and its processRequest method runs.
  • 3. The call chain reaches ProposalRequestProcessor, which triggers the Leader's propose method.
  • 4. The propose method sends a PROPOSAL message to all Followers in the cluster.
  • 5. The Follower's processPacket method sees that the message is a PROPOSAL and calls FollowerZooKeeperServer's logRequest method to record it in the write-ahead log (WAL).
  • 6. On the Follower, logRequest hands the request to SyncRequestProcessor, whose nextProcessor, SendAckRequestProcessor, sends an ACK message to the Leader.
  • 7. The Leader receives the ACK message and calls its processAck method to handle it.
  • 8. processAck decides whether COMMIT can be sent to the Followers: once ACKs have been received from a quorum, the Leader sends a COMMIT message to the Followers. At the same time, the Leader's CommitProcessor releases the request, which runs on to FinalRequestProcessor, the last processor, where the request completes and the Znode modification is applied.
  • 9. The Follower receives the COMMIT message and executes FollowerZooKeeperServer's commit method, and the request eventually reaches FinalRequestProcessor.

This is the complete process by which the Leader handles a write Request.
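
The message exchange above can be sketched as a toy simulation. This is only a model under simplifying assumptions: the class WriteRoundSketch and its message strings are invented for illustration, ACKs arrive in server-id order, a plain majority counts as the quorum, and there is no networking or disk I/O.

```java
import java.util.ArrayList;
import java.util.List;

class WriteRoundSketch {
    // Simulates one write on a cluster of one leader plus `followers` followers.
    // Returns the messages in the order this toy model exchanges them.
    static List<String> run(int followers) {
        List<String> log = new ArrayList<>();
        int voters = followers + 1;          // the leader votes too
        int acks = 0;
        boolean committed = false;

        for (int sid = 1; sid <= followers; sid++) {
            log.add("PROPOSAL -> follower " + sid);
        }

        // sid 0 is the leader, which persists the proposal and acknowledges locally.
        for (int sid = 0; sid <= followers; sid++) {
            acks++;
            log.add(sid == 0 ? "ACK <- leader (local)" : "ACK <- follower " + sid);
            if (!committed && acks > voters / 2) {
                committed = true;            // quorum reached: commit exactly once
                for (int f = 1; f <= followers; f++) {
                    log.add("COMMIT -> follower " + f);
                }
                log.add("reply -> client");
            }
        }
        return log;
    }

    public static void main(String[] args) {
        run(2).forEach(System.out::println);
    }
}
```

With two followers, the commit happens as soon as the first follower's ACK joins the leader's own; the second follower's ACK arrives after the client has already been answered.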

When the Leader receives a REQUEST packet, it calls the submitLearnerRequest method; when it receives an ACK packet, it calls the processAck method. Part of the Leader's packet-receiving code (in LearnerHandler) is as follows.

while (true) {
    qp = new QuorumPacket();
    ia.readRecord(qp, "packet");
    ByteBuffer bb;
    long sessionId;
    int cxid;
    int type;
    switch (qp.getType()) {
        case Leader.ACK:
            if (this.learnerType == LearnerType.OBSERVER) {
                // (body omitted in this excerpt)
            }
            syncLimitCheck.updateAck(qp.getZxid());
            leader.processAck(this.sid, qp.getZxid(), sock.getLocalSocketAddress());
            break;
        case Leader.REQUEST:
            // Payload layout: sessionId (long), cxid (int), type (int), then the request body
            bb = ByteBuffer.wrap(qp.getData());
            sessionId = bb.getLong();
            cxid = bb.getInt();
            type = bb.getInt();
            bb = bb.slice();
            Request si;
            if (type == OpCode.sync) {
                si = new LearnerSyncRequest(this, sessionId, cxid, type, bb, qp.getAuthinfo());
            } else {
                si = new Request(null, sessionId, cxid, type, bb, qp.getAuthinfo());
            }
            si.setOwner(this);
            leader.zk.submitLearnerRequest(si);
            break;
        default:
            LOG.warn("unexpected quorum packet, type: {}", packetToString(qp));
            break;
    }
}
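
The REQUEST branch above reads a fixed header off the front of the payload and keeps the rest as the request body. The following sketch reproduces just that decoding step with plain java.nio; the RequestPayloadSketch class, its Decoded holder and the encode helper are hypothetical and exist only to make the example runnable.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class RequestPayloadSketch {
    // Holder for the decoded fields (hypothetical, for illustration only).
    static class Decoded {
        final long sessionId;
        final int cxid;
        final int type;
        final ByteBuffer body;

        Decoded(long sessionId, int cxid, int type, ByteBuffer body) {
            this.sessionId = sessionId;
            this.cxid = cxid;
            this.type = type;
            this.body = body;
        }
    }

    // Writes the header fields followed by the body, in the order the loop reads them.
    static byte[] encode(long sessionId, int cxid, int type, byte[] body) {
        ByteBuffer bb = ByteBuffer.allocate(Long.BYTES + 2 * Integer.BYTES + body.length);
        bb.putLong(sessionId).putInt(cxid).putInt(type).put(body);
        return bb.array();
    }

    // Same read sequence as the REQUEST case: long, int, int, then slice() for the rest.
    static Decoded decode(byte[] data) {
        ByteBuffer bb = ByteBuffer.wrap(data);
        long sessionId = bb.getLong();
        int cxid = bb.getInt();
        int type = bb.getInt();
        return new Decoded(sessionId, cxid, type, bb.slice());
    }

    public static void main(String[] args) {
        byte[] body = "/demo".getBytes(StandardCharsets.UTF_8);
        Decoded d = decode(encode(0x1234L, 7, 1, body)); // 1 is an arbitrary op code here
        System.out.println(Long.toHexString(d.sessionId) + " " + d.cxid + " " + d.type
                + " bodyBytes=" + d.body.remaining());
    }
}
```

Calling slice() after the three reads gives a buffer that starts at the first body byte, which is why the loop can pass bb straight into the Request constructor.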

propose

When LeaderZooKeeperServer receives a write transaction request from a client, it triggers the Leader's propose method, which sends a PROPOSAL message to the Followers. At the same time, the Leader keeps each pending proposal in the outstandingProposals map, keyed by zxid.

The main code of the propose method is as follows:

// Serialize the transaction header and body into the packet payload
ByteArrayOutputStream baos = new ByteArrayOutputStream();
BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
try {
    request.getHdr().serialize(boa, "hdr");
    if (request.getTxn() != null) {
        request.getTxn().serialize(boa, "txn");
    }
    baos.close();
} catch (IOException e) {
    LOG.warn("This really should be impossible", e);
}
QuorumPacket pp = new QuorumPacket(Leader.PROPOSAL, request.zxid,
        baos.toByteArray(), null);
Proposal p = new Proposal();
p.packet = pp;
p.request = request;
synchronized (this) {
    p.addQuorumVerifier(self.getQuorumVerifier());
    if (request.getHdr().getType() == OpCode.reconfig) {
        self.setLastSeenQuorumVerifier(request.qv, true);
    }
    // Remember the proposal until it is committed, then broadcast it
    lastProposed = p.packet.getZxid();
    outstandingProposals.put(lastProposed, p);
    sendPacket(pp);
}

inform

Similar to sending a PROPOSAL, except that inform sends the write transaction to Observers only and does not require them to reply with an ACK.

public void inform(Proposal proposal) {
    QuorumPacket qp = new QuorumPacket(Leader.INFORM,
            proposal.request.zxid, proposal.packet.getData(), null);
    sendObserverPacket(qp);
}
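
One detail visible in the code is that the INFORM packet carries the proposal data, while the COMMIT packet built in the commit method carries only the zxid. A plausible reading is that Followers already stored the transaction when the PROPOSAL arrived, whereas Observers see it for the first time in INFORM. The sketch below models that difference; FanOutSketch and its string messages are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FanOutSketch {
    private final List<String> followers;
    private final List<String> observers;
    final List<String> sent = new ArrayList<>();

    FanOutSketch(List<String> followers, List<String> observers) {
        this.followers = followers;
        this.observers = observers;
    }

    // Followers hold the transaction already, so COMMIT only names the zxid.
    void commit(long zxid) {
        for (String f : followers) {
            sent.add("COMMIT zxid=" + zxid + " -> " + f);
        }
    }

    // Observers were not sent the proposal, so INFORM includes the transaction data.
    void inform(long zxid, String txnData) {
        for (String o : observers) {
            sent.add("INFORM zxid=" + zxid + " data=" + txnData + " -> " + o);
        }
    }

    public static void main(String[] args) {
        FanOutSketch leader = new FanOutSketch(Arrays.asList("f1", "f2"), Arrays.asList("o1"));
        leader.commit(5L);
        leader.inform(5L, "create /demo");
        leader.sent.forEach(System.out::println);
    }
}
```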

processAck

After a Follower has persisted the write transaction from a PROPOSAL to disk, it replies to the Leader with an ACK message. The Leader receives the packet through LearnerHandler, which triggers the Leader's processAck method.

After sending a PROPOSAL message to the Followers, the Leader waits for their ACK messages and, on each one, checks whether the COMMIT condition is satisfied. By default the condition is met once a quorum (more than half of the voting members) has acknowledged; the Leader does not wait for all Followers. When the condition is met, the Leader sends a COMMIT message to all Followers and continues processing the request itself.

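synchronized public void processAck(long sid, long zxid, SocketAddress followerAddr) {
    if (!allowedToCommit) return;
    // A zxid whose low 32 bits are zero is skipped here
    if ((zxid & 0xffffffffL) == 0) {
        return;
    }
    // Already committed: nothing left to do for this ACK
    if (lastCommitted >= zxid) {
        return;
    }
    Proposal p = outstandingProposals.get(zxid);
    if (p == null) {
        // No outstanding proposal for this zxid; without this guard addAck would throw
        return;
    }
    p.addAck(sid);
    boolean hasCommitted = tryToCommit(p, zxid, followerAddr);
    // After a committed reconfig, try to commit the proposals queued behind it
    if (hasCommitted && p.request != null && p.request.getHdr().getType() == OpCode.reconfig) {
        long curZxid = zxid;
        while (allowedToCommit && hasCommitted && p != null) {
            curZxid++;
            p = outstandingProposals.get(curZxid);
            if (p != null) {
                hasCommitted = tryToCommit(p, curZxid, null);
            }
        }
    }
}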

processAck records which Followers have replied with a QuorumPacket of type Leader.ACK, and then calls the tryToCommit method, which decides whether the COMMIT condition is satisfied.
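
The `(zxid & 0xffffffffL) == 0` test in processAck is easier to read with the zxid layout in mind: a zxid is a 64-bit value with the epoch in the high 32 bits and a per-epoch counter in the low 32 bits, so the mask isolates the counter. The helper names below are chosen for this sketch and are not meant to mirror any particular ZooKeeper utility class.

```java
class ZxidSketch {
    // Packs an epoch and a counter into one 64-bit zxid.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    // High 32 bits: the leader epoch.
    static long epochOf(long zxid) {
        return zxid >> 32;
    }

    // Low 32 bits: the counter within the epoch (the value processAck tests against zero).
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long first = makeZxid(3, 0);
        long later = makeZxid(3, 5);
        System.out.println(Long.toHexString(first) + " counter=" + counterOf(first));
        System.out.println(Long.toHexString(later) + " counter=" + counterOf(later));
    }
}
```

Because the epoch occupies the upper bits, comparing two zxids as plain longs (as `lastCommitted >= zxid` does) orders them first by epoch and then by counter.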

Let's look at what the tryToCommit method does.

tryToCommit

tryToCommit makes sure that write operations are confirmed in order.

It attempts to commit Proposals in order. Only committed transactions actually take effect.

Transactions must be confirmed sequentially. All transactions waiting for confirmation are recorded in outstandingProposals. As long as the previous transaction (zxid - 1) has not been confirmed, later transactions may not be confirmed, which guarantees that transactions are applied in order.

Confirmed transactions are placed in the toBeApplied queue for further processing.

// The previous proposal is still pending, so this one has to wait
if (outstandingProposals.containsKey(zxid - 1)) {
    return false;
}
// Not enough ACKs yet
if (!p.hasAllQuorums()) {
    return false;
}
outstandingProposals.remove(zxid);
if (p.request != null) {
    toBeApplied.add(p);
}
if (p.request == null) {
    LOG.warn("Going to commmit null: " + p);
} else if (p.request.getHdr().getType() == OpCode.reconfig) {
    Long designatedLeader = getDesignatedLeader(p, zxid);
    QuorumVerifier newQV = p.qvAcksetPairs.get(p.qvAcksetPairs.size() - 1).getQuorumVerifier();
    self.processReconfig(newQV, designatedLeader, zk.getZxid(), true);
    if (designatedLeader != self.getId()) {
        allowedToCommit = false;
    }
    commitAndActivate(zxid, designatedLeader);
    informAndActivate(p, designatedLeader);
} else {
    // Ordinary write: COMMIT to Followers, INFORM to Observers
    commit(zxid);
    inform(p);
}
// Release the request in the Leader's own CommitProcessor
zk.commitProcessor.commit(p.request);
if (pendingSyncs.containsKey(zxid)) {
    for (LearnerSyncRequest r : pendingSyncs.remove(zxid)) {
        sendSync(r);
    }
}
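
To see the ordering rule in isolation, here is a stripped-down model of the processAck and tryToCommit pair for ordinary (non-reconfig) proposals. InOrderCommitSketch is a hypothetical class; it assumes a plain majority quorum and tracks ACKs as a set of server ids per zxid.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class InOrderCommitSketch {
    private final int voters;
    // zxid -> ids of the servers that have acknowledged it
    private final Map<Long, Set<Long>> outstanding = new HashMap<>();
    final List<Long> committed = new ArrayList<>();

    InOrderCommitSketch(int voters) {
        this.voters = voters;
    }

    void propose(long zxid) {
        outstanding.put(zxid, new HashSet<>());
    }

    // Records one ACK and returns true if it caused the proposal to commit.
    boolean ack(long sid, long zxid) {
        Set<Long> acks = outstanding.get(zxid);
        if (acks == null) {
            return false;                    // unknown or already committed
        }
        acks.add(sid);
        return tryToCommit(zxid, acks);
    }

    private boolean tryToCommit(long zxid, Set<Long> acks) {
        if (outstanding.containsKey(zxid - 1)) {
            return false;                    // predecessor still pending
        }
        if (acks.size() <= voters / 2) {
            return false;                    // no majority yet
        }
        outstanding.remove(zxid);
        committed.add(zxid);
        return true;
    }

    public static void main(String[] args) {
        InOrderCommitSketch leader = new InOrderCommitSketch(3);
        leader.propose(1);
        leader.propose(2);
        leader.ack(0, 1);
        leader.ack(0, 2);
        System.out.println(leader.ack(1, 2)); // quorum for 2, but 1 is still outstanding
        System.out.println(leader.ack(1, 1)); // commits 1
        System.out.println(leader.ack(2, 2)); // the next ACK for 2 now commits it
        System.out.println(leader.committed);
    }
}
```

As in the excerpt, committing zxid 1 does not by itself re-check zxid 2; the later proposal commits when its next ACK arrives.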

commit

commit notifies the Followers to commit a proposal and apply it.

Normally, commit is triggered and Leader.COMMIT packets are sent to the Followers only after the Leader has received ACKs from more than half of the voting members. That is the default algorithm; you can also implement your own decision logic, for example requiring more than two-thirds.

public void commit(long zxid) {
    synchronized (this) {
        lastCommitted = zxid;
    }
    // COMMIT carries only the zxid; the data field of the packet is null
    QuorumPacket qp = new QuorumPacket(Leader.COMMIT, zxid, null, null);
    sendPacket(qp);
}
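
The "more than half" rule and the two-thirds alternative mentioned above can be compared side by side. The QuorumRule interface below is a hypothetical, simplified stand-in for a pluggable quorum check; it is not ZooKeeper's own interface, and the two rules are written out only to show how the thresholds differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical pluggable rule: does this set of ACKs form a quorum among `voters` members?
interface QuorumRule {
    boolean containsQuorum(Set<Long> ackSet, int voters);
}

class QuorumRuleSketch {
    // Default-style rule: strictly more than half of the voters.
    static final QuorumRule MAJORITY = (ackSet, voters) -> ackSet.size() > voters / 2;

    // Stricter custom rule: strictly more than two-thirds of the voters.
    static final QuorumRule TWO_THIRDS = (ackSet, voters) -> ackSet.size() * 3 > voters * 2;

    public static void main(String[] args) {
        Set<Long> acks = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        System.out.println("majority of 5:   " + MAJORITY.containsQuorum(acks, 5));
        System.out.println("two-thirds of 5: " + TWO_THIRDS.containsQuorum(acks, 5));
    }
}
```

Using a set of server ids rather than a counter means a duplicate ACK from the same server cannot push a proposal over the threshold.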

4. Read Request

Read operations do not change data or state, so they do not need to be agreed on by the cluster, and the process is simpler than for write operations.

The specific read process:

1. The Request is sent to the Leader's first Processor, which here is LeaderRequestProcessor.

2. LeaderRequestProcessor passes the read operation on to PrepRequestProcessor.

3. PrepRequestProcessor performs no transaction-related processing for a read operation, and the processing chain moves on to ProposalRequestProcessor as usual.

4. ProposalRequestProcessor does not call the Leader's propose method for a read operation, because the request carries no transaction header; it sends the request directly to nextProcessor.

5. That nextProcessor is CommitProcessor.

6. CommitProcessor also passes the read request straight on to ToBeAppliedRequestProcessor.

7. ToBeAppliedRequestProcessor passes the read request straight on to the next processor, FinalRequestProcessor.

8. FinalRequestProcessor executes the read locally: it looks up the requested data in ZKDatabase (backed by DataTree) and returns the result to the client. No processTxn call or node creation takes place, because a read carries no transaction.
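
The point at which reads and writes part ways can be sketched as a single branch. This sketch assumes that only write requests carry a transaction header (the value returned by request.getHdr() in the excerpts above); ReadPathSketch, Req and the step strings are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ReadPathSketch {
    // Minimal request model: an operation name plus whether a transaction header is attached.
    static class Req {
        final String op;
        final boolean hasTxnHeader;

        Req(String op, boolean hasTxnHeader) {
            this.op = op;
            this.hasTxnHeader = hasTxnHeader;
        }
    }

    // Lists what the proposal stage does with a request in this model.
    static List<String> proposalStage(Req r) {
        List<String> steps = new ArrayList<>();
        steps.add("pass " + r.op + " to CommitProcessor");
        if (r.hasTxnHeader) {
            // Only writes need consensus and local persistence.
            steps.add("propose " + r.op + " to followers");
            steps.add("log " + r.op + " via SyncRequestProcessor");
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(proposalStage(new Req("getData", false)));
        System.out.println(proposalStage(new Req("create", true)));
    }
}
```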
