ZooKeeper source code analysis - How the Leader node processes transaction requests

Posted by deemurphy on Fri, 05 Nov 2021 04:22:45 +0100

Preface:

The previous article analyzed how the Leader node handles non-transaction requests. This article turns to the main event: how transaction requests are processed.

The Leader still uses the same chain of processors: PrepRequestProcessor -> ProposalRequestProcessor -> CommitProcessor -> ToBeAppliedRequestProcessor -> FinalRequestProcessor. Let's walk through them in order.
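
To make the shape of this chain concrete, here is a minimal sketch of five stages wired one after another. It is only an illustration: SketchProcessor, NamedStage and LeaderChainSketch are hypothetical names, the request is a plain String, and every stage forwards synchronously, whereas several of the real processors run in their own threads with queues.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for ZooKeeper's RequestProcessor interface.
interface SketchProcessor {
    void processRequest(String request);
}

// A stage that records its own name and then forwards to the next stage, if any.
class NamedStage implements SketchProcessor {
    private final String name;
    private final SketchProcessor next;
    private final List<String> trace;

    NamedStage(String name, SketchProcessor next, List<String> trace) {
        this.name = name;
        this.next = next;
        this.trace = trace;
    }

    @Override
    public void processRequest(String request) {
        trace.add(name);
        if (next != null) {
            next.processRequest(request);
        }
    }
}

class LeaderChainSketch {
    // The chain is built back to front, because each stage needs a reference to its successor.
    static List<String> run(String request) {
        List<String> trace = new ArrayList<>();
        SketchProcessor fin = new NamedStage("FinalRequestProcessor", null, trace);
        SketchProcessor toBeApplied = new NamedStage("ToBeAppliedRequestProcessor", fin, trace);
        SketchProcessor commit = new NamedStage("CommitProcessor", toBeApplied, trace);
        SketchProcessor proposal = new NamedStage("ProposalRequestProcessor", commit, trace);
        SketchProcessor prep = new NamedStage("PrepRequestProcessor", proposal, trace);
        prep.processRequest(request);
        return trace;
    }

    public static void main(String[] args) {
        // Prints the stage names in the order the request visited them.
        System.out.println(run("create /demo"));
    }
}
```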

There are many kinds of transaction requests. This article picks a typical one, the create request; the other types are handled similarly and are not covered separately.

1.PrepRequestProcessor

public class PrepRequestProcessor extends ZooKeeperCriticalThread implements RequestProcessor {
    protected void pRequest(Request request) throws RequestProcessorException {
        request.hdr = null;
        request.txn = null;
        
        try {
            switch (request.type) {
                case OpCode.create:
                CreateRequest createRequest = new CreateRequest();
                pRequest2Txn(request.type, zks.getNextZxid(), request, createRequest, true);
                break;
                ...    
            }
        }
        // Finally, it is handed over to the next processor
        request.zxid = zks.getZxid();
        nextProcessor.processRequest(request);
    }
    
    // The actual work happens here
    protected void pRequest2Txn(int type, long zxid, Request request, Record record, boolean deserialize)
        throws KeeperException, IOException, RequestProcessorException
    {
        request.hdr = new TxnHeader(request.sessionId, request.cxid, zxid,
                                    Time.currentWallTime(), type);

        switch (type) {
            case OpCode.create:                
                zks.sessionTracker.checkSession(request.sessionId, request.getOwner());
                CreateRequest createRequest = (CreateRequest)record;   
                if(deserialize)
                    // Deserialize the request body of the client into the CreateRequest object
                    ByteBufferInputStream.byteBuffer2Record(request.request, createRequest);
                // path check
                String path = createRequest.getPath();
                int lastSlash = path.lastIndexOf('/');
                if (lastSlash == -1 || path.indexOf('\0') != -1 || failCreate) {
                    LOG.info("Invalid path " + path + " with session 0x" +
                            Long.toHexString(request.sessionId));
                    throw new KeeperException.BadArgumentsException(path);
                }
                // ACL permission check
                List<ACL> listACL = removeDuplicates(createRequest.getAcl());
                if (!fixupACL(request.authInfo, listACL)) {
                    throw new KeeperException.InvalidACLException(path);
                }
                String parentPath = path.substring(0, lastSlash);
                ChangeRecord parentRecord = getRecordForPath(parentPath);

                checkACL(zks, parentRecord.acl, ZooDefs.Perms.CREATE,
                        request.authInfo);
                int parentCVersion = parentRecord.stat.getCversion();
                // For sequential nodes, append a zero-padded sequence number (the parent's cversion) to the path
                CreateMode createMode =
                    CreateMode.fromFlag(createRequest.getFlags());
                if (createMode.isSequential()) {
                    path = path + String.format(Locale.ENGLISH, "%010d", parentCVersion);
                }
                validatePath(path, request.sessionId);
                try {
                    if (getRecordForPath(path) != null) {
                        throw new KeeperException.NodeExistsException(path);
                    }
                } catch (KeeperException.NoNodeException e) {
                    // ignore this one
                }
                // Check whether the parent is an ephemeral node; ephemeral nodes cannot have children
                boolean ephemeralParent = parentRecord.stat.getEphemeralOwner() != 0;
                if (ephemeralParent) {
                    throw new KeeperException.NoChildrenForEphemeralsException(path);
                }
                int newCversion = parentRecord.stat.getCversion()+1;
                
                // Fill in the request's txn object, which the later processors will use
                request.txn = new CreateTxn(path, createRequest.getData(),
                        listACL,
                        createMode.isEphemeral(), newCversion);
                StatPersisted s = new StatPersisted();
                if (createMode.isEphemeral()) {
                    s.setEphemeralOwner(request.sessionId);
                }
                // Modify the stat information of the parent node
                parentRecord = parentRecord.duplicate(request.hdr.getZxid());
                parentRecord.childCount++;
                parentRecord.stat.setCversion(newCversion);
                addChangeRecord(parentRecord);
                addChangeRecord(new ChangeRecord(request.hdr.getZxid(), path, s,
                        0, listACL));
                break;
        }
        ...
        
}

The processing here is the same as in the stand-alone version analyzed earlier: it mainly validates the path and the ACL permissions, builds the transaction header and body, and then hands the request to the ProposalRequestProcessor.
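
The path handling in pRequest2Txn can be tried out in isolation. The sketch below copies only the three string operations visible in the excerpt (the basic path check, the parent path, and the sequential suffix). CreatePathSketch and its method names are hypothetical, and the real validatePath and ACL checks are left out.

```java
import java.util.Locale;

class CreatePathSketch {
    // Same basic check as in the excerpt: the path needs a '/' and must not contain '\0'.
    static boolean looksValid(String path) {
        return path.lastIndexOf('/') != -1 && path.indexOf('\0') == -1;
    }

    // Same substring call as in the excerpt; for a top-level node such as "/a" it yields "".
    static String parentOf(String path) {
        return path.substring(0, path.lastIndexOf('/'));
    }

    // Sequential nodes get the parent's cversion appended as a 10-digit, zero-padded number.
    static String sequentialPath(String path, int parentCVersion) {
        return path + String.format(Locale.ENGLISH, "%010d", parentCVersion);
    }

    public static void main(String[] args) {
        String requested = "/locks/lock-";
        System.out.println(looksValid(requested));
        System.out.println(parentOf(requested));
        System.out.println(sequentialPath(requested, 7));
    }
}
```

Because the suffix comes from the parent's cversion, and the excerpt increments that cversion for every create, two sequential creates under the same parent end up with different suffixes.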

2.ProposalRequestProcessor 

public class ProposalRequestProcessor implements RequestProcessor {
 
    public void processRequest(Request request) throws RequestProcessorException {
        // If this is a sync request forwarded from a learner
        if(request instanceof LearnerSyncRequest){
            zks.getLeader().processSync((LearnerSyncRequest)request);
        } else {
            // Both transaction and non-transaction requests are passed on to the next processor (CommitProcessor)
            nextProcessor.processRequest(request);
            // Transaction requests (header is not null) also need a proposal vote, unlike the non-transaction requests analyzed earlier
            if (request.hdr != null) {
                try {
                    // Initiate a proposal for a transaction request. See 2.1 for details
                    zks.getLeader().propose(request);
                } catch (XidRolloverException e) {
                    throw new RequestProcessorException(e.getMessage(), e);
                }
                // Write this transaction request to the transaction log. The sync processor was analyzed earlier, so it is not repeated here
                syncProcessor.processRequest(request);
            }
        }
    }
}

2.1 The leader initiates a proposal for a transaction request

public class Leader {
 
    public Proposal propose(Request request) throws XidRolloverException {
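        // Note this check: the low 32 bits of the zxid are a counter. Once it is exhausted, the leader shuts down to force a re-election and start a new epoch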
        if ((request.zxid & 0xffffffffL) == 0xffffffffL) {
            String msg =
                    "zxid lower 32 bits have rolled over, forcing re-election, and therefore new epoch start";
            shutdown(msg);
            throw new XidRolloverException(msg);
        }
        byte[] data = SerializeUtils.serializeRequest(request);
        proposalStats.setLastProposalSize(data.length);
        // Wrap the serialized request in a packet of type PROPOSAL
        QuorumPacket pp = new QuorumPacket(Leader.PROPOSAL, request.zxid, data, null);
        
        Proposal p = new Proposal();
        p.packet = pp;
        p.request = request;
        synchronized (this) {
            if (LOG.isDebugEnabled()) {
                LOG.debug("Proposing:: " + request);
            }

            lastProposed = p.packet.getZxid();
            outstandingProposals.put(lastProposed, p);
            // Finally, the proposal package is sent to followers
            sendPacket(pp);
        }
        return p;
    }
    
    // Send the packet to all followers
    void sendPacket(QuorumPacket qp) {
        synchronized (forwardingFollowers) {
            for (LearnerHandler f : forwardingFollowers) { 
                // Finally, it is handed over to each LearnerHandler
                f.queuePacket(qp);
            }
        }
    }
}

Summary: the ProposalRequestProcessor handles a request in two ways. On the one hand, it passes the request to the next processor (CommitProcessor). On the other hand, for transaction requests, it wraps the request in a proposal, sends it to all followers, and waits for them to return an ack after processing.
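
The rollover check at the top of propose() is easier to read once the zxid layout is spelled out: the high 32 bits hold the epoch and the low 32 bits hold a per-epoch counter. The helper below is a small sketch of that layout; ZxidSketch and its method names are hypothetical and are not the classes ZooKeeper itself uses.

```java
class ZxidSketch {
    // Combine an epoch (high 32 bits) and a counter (low 32 bits) into one 64-bit zxid.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Same condition as in propose(): the counter has reached its maximum value.
    static boolean counterExhausted(long zxid) {
        return (zxid & 0xffffffffL) == 0xffffffffL;
    }

    public static void main(String[] args) {
        long zxid = makeZxid(3, 5);
        System.out.println(Long.toHexString(zxid));
        System.out.println(counterExhausted(zxid));
        System.out.println(counterExhausted(makeZxid(3, 0xffffffffL)));
    }
}
```

When the counter is exhausted there is no room for another transaction in the current epoch, which is why the leader forces a re-election instead of letting the counter wrap into the epoch bits.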

2.2 The leader sends proposals to followers

The Leader wraps the request in a proposal and hands it to each LearnerHandler for delivery. The sending side is straightforward, so let's look at the logic that receives the response (ack) from a follower.

public class LearnerHandler extends ZooKeeperThread {
    @Override
    public void run() {
     	...
        while (true) {
            qp = new QuorumPacket();
            ia.readRecord(qp, "packet");
         	ByteBuffer bb;
            long sessionId;
            int cxid;
            int type;

            // Response received
            switch (qp.getType()) {
                // ACK type, indicating that the follower has completed recording the transaction log of the request    
                case Leader.ACK:
                    if (this.learnerType == LearnerType.OBSERVER) {
                        if (LOG.isDebugEnabled()) {
                            LOG.debug("Received ACK from Observer  " + this.sid);
                        }
                    }
                    syncLimitCheck.updateAck(qp.getZxid());
                    // The leader checks whether enough followers have returned an ack
                    leader.processAck(this.sid, qp.getZxid(), sock.getLocalSocketAddress());
                    break;   
                    ...
            }
        }
    }
}

2.3 The leader collects the followers' votes on this proposal

public class Leader {
	synchronized public void processAck(long sid, long zxid, SocketAddress followerAddr) {
        ...
        Proposal p = outstandingProposals.get(zxid);
        if (p == null) {
            LOG.warn("Trying to commit future proposal: zxid 0x{} from {}",
                    Long.toHexString(zxid), followerAddr);
            return;
        }
        
        // Add the sid of the follower that sent this ack to the proposal's ackSet
        p.ackSet.add(sid);
       
        // Have enough followers returned an ack?
        if (self.getQuorumVerifier().containsQuorum(p.ackSet)){             
            if (zxid != lastCommitted+1) {
                LOG.warn("Commiting zxid 0x{} from {} not first!",
                        Long.toHexString(zxid), followerAddr);
                LOG.warn("First is 0x{}", Long.toHexString(lastCommitted + 1));
            }
            outstandingProposals.remove(zxid);
            // A quorum of followers has acked this proposal, so it can be committed
            // Add it to toBeApplied first
            if (p.request != null) {
                toBeApplied.add(p);
            }

            if (p.request == null) {
                LOG.warn("Going to commmit null request for proposal: {}", p);
            }
            // The leader sends a commit packet to all followers, telling them to commit this proposal
            commit(zxid);
            inform(p);
            
            // Add this request to the CommitProcessor.committedRequests collection
            zk.commitProcessor.commit(p.request);
            if(pendingSyncs.containsKey(zxid)){
                for(LearnerSyncRequest r: pendingSyncs.remove(zxid)) {
                    sendSync(r);
                }
            }
        }
    }
}

Summary: the proposal voting process consists of the following steps:

1) The leader generates a proposal for the transaction request and sends it to all followers

2) Each follower receives the proposal, processes it, and returns an ack to the leader

3) The leader collects the acks. Once a quorum of followers has acked, the proposal is considered accepted and can be committed

4) The leader sends a commit to all followers, which then commit the proposal
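
Step 3 is the heart of the vote, so here is a minimal sketch of the counting, assuming a simple majority rule over a fixed number of voting servers. AckQuorumSketch is a hypothetical class that tracks a single proposal; the real Leader delegates the decision to its QuorumVerifier and tracks many outstanding proposals by zxid.

```java
import java.util.HashSet;
import java.util.Set;

class AckQuorumSketch {
    private final int votingMembers;
    private final Set<Long> ackSet = new HashSet<>();
    private boolean committed = false;

    AckQuorumSketch(int votingMembers) {
        this.votingMembers = votingMembers;
    }

    // sid is the server id of whoever sent the ack.
    // Returns true exactly once: on the ack that first completes a strict majority.
    boolean processAck(long sid) {
        if (committed) {
            return false; // late acks for an already committed proposal change nothing
        }
        ackSet.add(sid); // a Set ignores duplicate acks from the same server
        if (ackSet.size() > votingMembers / 2) {
            committed = true;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AckQuorumSketch proposal = new AckQuorumSketch(5);
        for (long sid = 1; sid <= 5; sid++) {
            System.out.println("ack from " + sid + " triggers commit: " + proposal.processAck(sid));
        }
    }
}
```

Using a set rather than a counter is what makes a repeated ack from the same server harmless, which matches the ackSet used in processAck above.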

3.CommitProcessor 

Since the ProposalRequestProcessor has already done most of the work, what is left for CommitProcessor to do?

So far, the leader has only written the transaction request to the transaction log; the change has not been applied to the in-memory ZKDatabase. Who applies it? FinalRequestProcessor, at the end of the chain. When is it applied? That is decided by CommitProcessor, whose main job is to hold a transaction request back until its proposal has been committed.

public class CommitProcessor extends ZooKeeperCriticalThread implements RequestProcessor {
 
    // Requests handed over by the previous processor, waiting to be processed
    LinkedList<Request> queuedRequests = new LinkedList<Request>();
    // Requests whose proposals have been committed (acked by a quorum)
    LinkedList<Request> committedRequests = new LinkedList<Request>();
    
    public void run() {
        try {
            Request nextPending = null;            
            while (!finished) {
                int len = toProcess.size();
                for (int i = 0; i < len; i++) {
                    // 5. The proposal for this request has been committed, so pass it to the next processor
                    nextProcessor.processRequest(toProcess.get(i));
                }
                toProcess.clear();
                synchronized (this) {
                    // 2. No commit has arrived yet (not enough follower acks so far), so wait
                    if ((queuedRequests.size() == 0 || nextPending != null)
                            && committedRequests.size() == 0) {
                        wait();
                        continue;
                    }
                    // 3. committedRequests is not empty: a quorum has acked and the leader has marked this request as committed
                    if ((queuedRequests.size() == 0 || nextPending != null)
                            && committedRequests.size() > 0) {
                        Request r = committedRequests.remove();
                        if (nextPending != null
                                && nextPending.sessionId == r.sessionId
                                && nextPending.cxid == r.cxid) {
                            nextPending.hdr = r.hdr;
                            nextPending.txn = r.txn;
                            nextPending.zxid = r.zxid;
                            // 4. The commit matches the pending request, so the leader can pass it to the next processor
                            toProcess.add(nextPending);
                            nextPending = null;
                        } else {
                            // this request came from someone else so just
                            // send the commit packet
                            toProcess.add(r);
                        }
                    }
                }

                // We haven't matched the pending requests, so go back to
                // waiting
                if (nextPending != null) {
                    continue;
                }

                // 1. When a transaction request arrives, nextPending is set to it; it is matched against a commit in a later iteration
                synchronized (this) {
                    // Process the next requests in the queuedRequests
                    while (nextPending == null && queuedRequests.size() > 0) {
                        Request request = queuedRequests.remove();
                        switch (request.type) {
                        case OpCode.create:
                        case OpCode.delete:
                        case OpCode.setData:
                        case OpCode.multi:
                        case OpCode.setACL:
                        case OpCode.createSession:
                        case OpCode.closeSession:
                            nextPending = request;
                            break;
                        case OpCode.sync:
                            if (matchSyncs) {
                                nextPending = request;
                            } else {
                                toProcess.add(request);
                            }
                            break;
                        default:
                            toProcess.add(request);
                        }
                    }
                }
            }
        } catch (InterruptedException e) {
            LOG.warn("Interrupted exception while waiting", e);
        } catch (Throwable e) {
            LOG.error("Unexpected exception causing CommitProcessor to exit", e);
        }
        LOG.info("CommitProcessor exited loop!");
    }
}

Readers can follow the code in the order of the numbered comments in the method, which makes the whole flow easier to follow.
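
The matching logic is easier to see without the threading. The following single-threaded sketch keeps only the three collections and the nextPending slot, and replaces wait() with an explicit drain step. CommitMatchSketch, Req and the isWrite flag are hypothetical simplifications; the real class decides by OpCode and runs in its own thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class CommitMatchSketch {
    // Hypothetical minimal request: only the fields the matching logic needs.
    static final class Req {
        final long sessionId;
        final int cxid;
        final boolean isWrite;

        Req(long sessionId, int cxid, boolean isWrite) {
            this.sessionId = sessionId;
            this.cxid = cxid;
            this.isWrite = isWrite;
        }
    }

    private final Deque<Req> queuedRequests = new ArrayDeque<>();
    private final Deque<Req> committedRequests = new ArrayDeque<>();
    private final List<Req> forwarded = new ArrayList<>(); // what the next processor would receive
    private Req nextPending;

    void processRequest(Req r) { queuedRequests.add(r); drain(); }

    void commit(Req r) { committedRequests.add(r); drain(); }

    List<Req> forwarded() { return forwarded; }

    private void drain() {
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            // Consume a commit when blocked on a pending write, or when nothing is queued.
            if ((queuedRequests.isEmpty() || nextPending != null) && !committedRequests.isEmpty()) {
                Req c = committedRequests.remove();
                if (nextPending != null && nextPending.sessionId == c.sessionId
                        && nextPending.cxid == c.cxid) {
                    forwarded.add(nextPending); // the commit belongs to our pending write
                    nextPending = null;
                } else {
                    forwarded.add(c); // a commit for a request that originated elsewhere
                }
                progressed = true;
            }
            // While not blocked, reads pass straight through; the first write becomes pending.
            while (nextPending == null && !queuedRequests.isEmpty()) {
                Req r = queuedRequests.remove();
                if (r.isWrite) {
                    nextPending = r;
                } else {
                    forwarded.add(r);
                }
                progressed = true;
            }
        }
    }
}
```

A short usage example: queue a write and then a read from the same session, and nothing is forwarded until commit() is called for the write. After that, both are forwarded in their original order, which is how the processor keeps a session's requests ordered around a pending transaction.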

4.ToBeAppliedRequestProcessor 

static class ToBeAppliedRequestProcessor implements RequestProcessor {
    private RequestProcessor next;
    private ConcurrentLinkedQueue<Proposal> toBeApplied;

    public void processRequest(Request request) throws RequestProcessorException {
        // request.addRQRec(">tobe");
        next.processRequest(request);
        Proposal p = toBeApplied.peek();
        if (p != null && p.request != null
            && p.request.zxid == request.zxid) {
            toBeApplied.remove();
        }
    }
}

The code is very simple. ToBeAppliedRequestProcessor does almost nothing itself: it hands the request straight to the last processor and then removes the matching proposal from the head of the toBeApplied queue.
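
Putting this together with processAck from section 2.3, the toBeApplied queue holds proposals that a quorum has acked but that have not yet been applied by the final processor. The sketch below models just that bookkeeping with zxids instead of Proposal objects; ToBeAppliedSketch and its method names are hypothetical.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

class ToBeAppliedSketch {
    private final ConcurrentLinkedQueue<Long> toBeApplied = new ConcurrentLinkedQueue<>();

    // Called when a quorum has acked the proposal (processAck adds it to the queue).
    void onQuorumAck(long zxid) {
        toBeApplied.add(zxid);
    }

    // Called after the final processor has applied the request with this zxid.
    // Only the head of the queue is compared, as in the excerpt above.
    void onApplied(long zxid) {
        Long head = toBeApplied.peek();
        if (head != null && head.longValue() == zxid) {
            toBeApplied.remove();
        }
    }

    int pending() {
        return toBeApplied.size();
    }

    public static void main(String[] args) {
        ToBeAppliedSketch q = new ToBeAppliedSketch();
        q.onQuorumAck(1L);
        q.onQuorumAck(2L);
        q.onApplied(1L);
        System.out.println(q.pending());
    }
}
```

Comparing only against the head relies on requests being applied in the same order in which their proposals were committed; an out-of-order zxid is simply left in the queue.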

5.FinalRequestProcessor

public class FinalRequestProcessor implements RequestProcessor {
 
    public void processRequest(Request request) {
     	ProcessTxnResult rc = null;
        synchronized (zks.outstandingChanges) {
            ...
            if (request.hdr != null) {
               TxnHeader hdr = request.hdr;
               Record txn = request.txn;

                // Create this node and add it to ZKDatabase
               rc = zks.processTxn(hdr, txn);
            }
            // After the above is completed, put the transaction request into the committedProposal queue
            if (Request.isQuorum(request.type)) {
                zks.getZKDatabase().addCommittedProposal(request);
            }
        }
        switch (request.type) {
            // For the create request, return the CreateResponse response
            case OpCode.create: {
                lastOp = "CREA";
                rsp = new CreateResponse(rc.path);
                err = Code.get(rc.err);
                break;
            }    
        }
    }
}

So the node is finally created by the FinalRequestProcessor. This is similar to the stand-alone version analyzed earlier, so it is not analyzed further here.

Summary:

The way the leader node processes a transaction request is fairly complex. The core of it, and the difference from non-transaction requests, is putting the transaction request to a vote as a proposal and collecting the vote responses (acks).

Let's use a diagram from the book "From Paxos to ZooKeeper: Distributed Consistency Principles and Practice" to summarize the whole process:

 
