ZKFC principle and source code analysis

Posted by bob2006 on Fri, 08 Oct 2021 11:02:51 +0200

Principle

Summary

NameNode active/standby switchover is mainly implemented by three components: ZKFailoverController (ZKFC), HealthMonitor (HM) and ActiveStandbyElector (ASE).
HealthMonitor is responsible for monitoring the health of the NameNode (NN). It starts a thread that sends RPC requests, determines the NN status from the responses, and notifies ZKFC through a callback function as soon as the status changes.
ActiveStandbyElector is mainly responsible for coordinating with ZooKeeper (ZK) and watching the election nodes on the ZK cluster.

Hadoop HA uses two paths on the ZK cluster for the switchover:
First, the /hadoop-ha/${dfs.nameservices}/ActiveStandbyElectorLock path is used to create a temporary (ephemeral) node, that is, the lock.
Second, the /hadoop-ha/${dfs.nameservices}/ActiveBreadCrumb path is used to create a permanent (persistent) node that stores the address information of the Active NameNode (ANN).
Under normal circumstances, when the ANN gives up the lock gracefully, the permanent node is deleted together with the temporary node.
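
To make the two paths concrete, here is a minimal sketch that builds them from a nameservice id. ZnodePaths and its methods are hypothetical helpers written for this article, not Hadoop APIs. The parent znode /hadoop-ha is the default value of ha.zookeeper.parent-znode, and "mycluster" is only an example nameservice.

```java
// Hypothetical helper, not part of Hadoop: builds the two znode paths used for HA.
final class ZnodePaths {
    // Default parent znode; it can be changed through ha.zookeeper.parent-znode.
    static final String DEFAULT_PARENT = "/hadoop-ha";

    private ZnodePaths() {}

    // Temporary lock node: whoever manages to create it becomes the Active NameNode.
    static String lockPath(String parent, String nameservice) {
        return parent + "/" + nameservice + "/ActiveStandbyElectorLock";
    }

    // Permanent breadcrumb node: remembers the address of the last Active NameNode.
    static String breadCrumbPath(String parent, String nameservice) {
        return parent + "/" + nameservice + "/ActiveBreadCrumb";
    }

    public static void main(String[] args) {
        // "mycluster" is an assumed nameservice id used only for illustration.
        System.out.println(lockPath(DEFAULT_PARENT, "mycluster"));
        System.out.println(breadCrumbPath(DEFAULT_PARENT, "mycluster"));
    }
}
```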

When the HM on the ANN side detects that the NN state is abnormal, it notifies zkfc through the callback function, and zkfc calls the ASE method to quit the election, that is, to delete the zk node. Alternatively, if the entire zkfc service is unavailable and zkfc does not send heartbeats to the zk cluster for a long time, the zk session expires and the zk cluster deletes the temporary node itself.
Through ASE, the zkfc of each Standby NameNode (SNN) sees that the Active lock node on the zk cluster has been deleted. If its NN is healthy, it tries to create the temporary node under the ActiveStandbyElectorLock path, that is, to grab the lock. The SNN that creates the node successfully then checks whether a node exists under the breadcrumb path. If one exists, the previous ANN did not exit cleanly and must be fenced: the SNN first asks it to step down by calling its transitionToStandby method, and if that call fails, it logs in over ssh to kill the process or runs a shell script to fence the ANN.
If fencing fails, the SNN gives up the lock, quits the election and waits for a period of time (so that other SNNs can grab the lock). If fencing succeeds, it calls the becomeActive method, which in turn calls transitionToActive to turn the NN into the ANN.
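
The handover described above can be played through with a small in-memory model. This is only a sketch under simplifying assumptions: FakeZk stands in for the two znodes (it is not a ZooKeeper client), FailoverSketch and its Result enum are hypothetical names, and the fencer predicate simply reports whether fencing the old active worked.

```java
import java.util.function.Predicate;

// In-memory stand-in for the two znodes; not a real ZooKeeper client.
final class FakeZk {
    String lockOwner;   // temporary lock node, null when absent
    String breadCrumb;  // permanent breadcrumb node, null when absent

    // Simulates session loss of the lock holder: only the temporary node vanishes.
    void expireSession(String owner) {
        if (owner.equals(lockOwner)) {
            lockOwner = null;
        }
    }
}

final class FailoverSketch {
    enum Result { ACTIVE, STANDBY, FENCING_FAILED }

    // One election attempt by a healthy candidate.
    // fencer returns true when the old active named in the breadcrumb was fenced.
    static Result joinElection(FakeZk zk, String me, Predicate<String> fencer) {
        if (zk.lockOwner != null && !zk.lockOwner.equals(me)) {
            return Result.STANDBY;            // someone else holds the lock
        }
        zk.lockOwner = me;                    // "create the temporary node"
        String old = zk.breadCrumb;
        if (old != null && !old.equals(me) && !fencer.test(old)) {
            zk.lockOwner = null;              // give up the lock so others can try
            return Result.FENCING_FAILED;
        }
        zk.breadCrumb = me;                   // record ourselves as the active
        return Result.ACTIVE;
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        System.out.println(joinElection(zk, "nn1", old -> true)); // ACTIVE
        System.out.println(joinElection(zk, "nn2", old -> true)); // STANDBY
        zk.expireSession("nn1");                                  // zkfc of nn1 dies
        System.out.println(joinElection(zk, "nn2", old -> true)); // ACTIVE
    }
}
```

Note that the breadcrumb survives the session expiry, which is exactly what tells nn2 that nn1 may still be running and has to be fenced first.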

Workflow

1. After the initialization of HealthMonitor is completed, the internal thread will be started to regularly call the methods of the HAServiceProtocol RPC interface of the corresponding NameNode to detect the health status of the NameNode.

2. If the HealthMonitor detects that the health status of the NameNode has changed, it will call back the corresponding method registered by ZKFailoverController for processing.

3. If ZKFailoverController judges that active / standby switching is required, it will first use ActiveStandbyElector to conduct automatic active / standby election.

4. ActiveStandbyElector interacts with ZooKeeper to complete the automatic active/standby election.

5. After the election, ActiveStandbyElector calls back the corresponding method of ZKFailoverController to notify it that the current NameNode should become the active or the standby NameNode.

6. ZKFailoverController calls the method of the HAServiceProtocol RPC interface of the corresponding NameNode to switch the NameNode to the Active or the Standby state.

HealthMonitor

HealthMonitor is used to monitor the health status (HealthMonitor.State) and service status (HAServiceStatus) of the local NN through RPC. When the status information changes, it sends information to ZKFC through callback.

//Five states of HealthMonitor
/**
 * The health monitor is still starting up.
 */
INITIALIZING,

/**
 * The service is not responding to health check RPCs.
 */
SERVICE_NOT_RESPONDING,

/**
 * The service is connected and healthy.
 */
SERVICE_HEALTHY,

/**
 * The service is running but unhealthy.
 */
SERVICE_UNHEALTHY,

/**
 * The health monitor itself failed unrecoverably and can
 * no longer provide accurate information.
 */
HEALTH_MONITOR_FAILED;

//HAServiceStatus has four statuses
INITIALIZING("initializing"),
ACTIVE("active"),
STANDBY("standby"),
STOPPING("stopping");
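
The "notify only on change" behaviour can be sketched as follows. MonitorSketch is a hypothetical, heavily simplified model: it reuses the state names listed above, but the checkOnce method and its two boolean inputs are assumptions that replace the real RPC calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Simplified model of a health monitor; state names mirror HealthMonitor.State.
final class MonitorSketch {
    enum State {
        INITIALIZING, SERVICE_NOT_RESPONDING, SERVICE_HEALTHY,
        SERVICE_UNHEALTHY, HEALTH_MONITOR_FAILED
    }

    private State state = State.INITIALIZING;
    private final List<Consumer<State>> callbacks = new ArrayList<>();

    void addCallback(Consumer<State> cb) {
        callbacks.add(cb);
    }

    State getState() {
        return state;
    }

    // Notifies the callbacks only when the state actually changes.
    void enterState(State newState) {
        if (newState == state) {
            return;
        }
        state = newState;
        for (Consumer<State> cb : callbacks) {
            cb.accept(newState);
        }
    }

    // One probe round. responded=false models an RPC failure or timeout,
    // healthy=false models a health check that reports a problem.
    void checkOnce(boolean responded, boolean healthy) {
        if (!responded) {
            enterState(State.SERVICE_NOT_RESPONDING);
        } else if (!healthy) {
            enterState(State.SERVICE_UNHEALTHY);
        } else {
            enterState(State.SERVICE_HEALTHY);
        }
    }

    public static void main(String[] args) {
        MonitorSketch hm = new MonitorSketch();
        hm.addCallback(s -> System.out.println("entered " + s));
        hm.checkOnce(true, true);   // prints "entered SERVICE_HEALTHY"
        hm.checkOnce(true, true);   // no output, state unchanged
        hm.checkOnce(false, false); // prints "entered SERVICE_NOT_RESPONDING"
    }
}
```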

ActiveStandbyElector

ActiveStandbyElector mainly controls and watches the state of the nodes on ZK and interacts with ZKFC. When joinElection is called, ASE tries to create a node on ZK (to obtain the lock). If the node is created successfully, it calls becomeActive and the NN becomes the ANN. If creation fails, it calls becomeStandby and the NN becomes an SNN; ZKFC keeps monitoring the health of the NN, and ASE registers a watcher on the active lock.

/**
 * To participate in election, the app will call joinElection. The result will
 * be notified by a callback on either the becomeActive or becomeStandby app
 * interfaces.
 */
public synchronized void joinElection(byte[] data)

/**
 * Any service instance can drop out of the election by calling quitElection. 
 * <br/>
 */
public synchronized void quitElection(boolean needFence)

ZKFC

ZKFC initializes HealthMonitor and ActiveStandbyElector when it is created. ZKFC coordinates HealthMonitor and ActiveStandbyElector to complete HA switching according to the events sent.

Fencing

If the zkfc of the master node is killed, zk no longer receives heartbeats from the ANN side and notifies the zkfc of the SNN. After the SNN's zkfc successfully creates the znode on zk, it asks the previous ANN to call the transitionToStandby() method. If that does not work, it falls back to other methods (such as killing the process over ssh), and then calls transitionToActive() on its own NN to make it the master node.
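
The "graceful first, then force" order can be sketched as a simple fallback chain. FencingSketch is a hypothetical illustration, not the Hadoop implementation: each BooleanSupplier stands for one attempt and returns true on success. In a real cluster the fallback methods are the ones listed in dfs.ha.fencing.methods (for example sshfence or shell).

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical fallback chain: graceful transitionToStandby first, then the fencing methods in order.
final class FencingSketch {
    // Returns a label for the step that succeeded, or throws if nothing worked.
    static String fence(BooleanSupplier gracefulStandby, List<BooleanSupplier> fencingMethods) {
        if (gracefulStandby.getAsBoolean()) {
            return "graceful";
        }
        for (int i = 0; i < fencingMethods.size(); i++) {
            if (fencingMethods.get(i).getAsBoolean()) {
                return "method-" + i;
            }
        }
        // Without successful fencing the caller must not become active.
        throw new IllegalStateException("Unable to fence old active");
    }

    public static void main(String[] args) {
        // The graceful call fails, the first method fails, the second one succeeds.
        String how = fence(() -> false,
                Arrays.<BooleanSupplier>asList(() -> false, () -> true));
        System.out.println(how); // method-1
    }
}
```

Throwing when every attempt fails matches the rule above: a node that cannot fence the old ANN gives up instead of risking two active NameNodes.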

Source code

DFSZKFailoverController is an ordinary Java program with a main method.
The main method constructs a DFSZKFailoverController and calls its run method.
The run method calls doRun, which contains several important methods:

private int doRun(String[] args)
    throws Exception {
    try {
        //Initialize zk
        initZK();
        //Format zk (the real code only does this when started with -formatZK)
        formatZK(force, interactive);
        //Initialize rpc
        initRPC();
        //Initialize hm
        initHM();
        //Start rpc
        startRPC();
        
        mainLoop();
    } finally {
        rpcServer.stopAndJoin();

        elector.quitElection(true);
        healthMonitor.shutdown();
        healthMonitor.join();
    }
    return 0;
}

initZK() reads and parses the zk connection settings (quorum address, session timeout, ACLs, authentication info, etc.) and initializes ActiveStandbyElector

//Initialize ActiveStandbyElector and pass in ElectorCallbacks, which carries the becomeActive, becomeStandby and other callback methods
elector = new ActiveStandbyElector(zkQuorum,
                                   zkTimeout, getParentZnode(), zkAcls, zkAuths,
                                   new ElectorCallbacks(), maxRetryNum);


//Construct ActiveStandbyElector
public ActiveStandbyElector(String zookeeperHostPorts,
                            int zookeeperSessionTimeout, String parentZnodeName, List<ACL> acl,
                            List<ZKAuthInfo> authInfo, ActiveStandbyElectorCallback app,
                            int maxRetryNum, boolean failFast) throws IOException,
                            HadoopIllegalArgumentException, KeeperException {
    ...
    if (failFast) {
        createConnection();
    } else {
        reEstablishSession();
    }
}

//Create a connection to zk
private void createConnection() throws IOException, KeeperException {
    if (zkClient != null) {
        try {
            zkClient.close();
        } catch (InterruptedException e) {
            throw new IOException("Interrupted while closing ZK",
                                  e);
        }
        zkClient = null;
        watcher = null;
    }
    zkClient = connectToZooKeeper();
    if (LOG.isDebugEnabled()) {
        LOG.debug("Created new connection for " + this);
    }
}


//Connect to zk and initialize the watcher that listens to the nodes on zk
protected synchronized ZooKeeper connectToZooKeeper() throws IOException,
                                                             KeeperException {
    watcher = new WatcherWithClientRef();
    ZooKeeper zk = createZooKeeper();
    watcher.setZooKeeperRef(zk);
    watcher.waitForZKConnectionEvent(zkSessionTimeout);
    ...
}

formatZK() formats zk and creates the parent znode under which the NN status is written later
initRPC() initializes ZKFCRpcServer
initHM() creates and starts the HealthMonitor used for health checks

private void initHM() {
    //1. Create hm
    healthMonitor = new HealthMonitor(conf, localTarget);
    //2. Add the health state callback
    healthMonitor.addCallback(new HealthCallbacks());
    //3. Add the service state callback
    healthMonitor.addServiceStateCallback(new ServiceStateCallBacks());
    //4. Start the monitor thread
    healthMonitor.start();
}

//2. Callback function
class HealthCallbacks implements HealthMonitor.Callback {
    @Override
    public void enteredState(HealthMonitor.State newState) {
        //Set latest status
        setLastHealthState(newState);
        //2.1 check for election
        recheckElectability();
    }
}

//2.1 Check whether to take part in the election. The HealthMonitor callback calls recheckElectability to check the current state of the service.
//recheckElectability acts on the last detected health state: when HealthMonitor.State is SERVICE_HEALTHY, it triggers joinElection and tries to create the znode on zk;
//when the NN is unhealthy, it calls quitElection and the node on zk is deleted.
private void recheckElectability() {
    // Maintain lock ordering of elector -> ZKFC
    synchronized (elector) {
        synchronized (this) {
            boolean healthy = lastHealthState == State.SERVICE_HEALTHY;
            switch (lastHealthState) {
                case SERVICE_HEALTHY:
                    //2.1.1 election
                    elector.joinElection(targetToData(localTarget));
                    if (quitElectionOnBadState) {
                        quitElectionOnBadState = false;
                    }
                    break;
                case SERVICE_UNHEALTHY:
                    //2.1.2 withdrawal from election
                    elector.quitElection(true);
                    serviceState = HAServiceState.INITIALIZING;
                    break;
            }
        }
    }
}

//2.1.1 joinElection calls the joinElectionInternal method
private void joinElectionInternal() {
    ...
    createRetryCount = 0;
    wantToBeInElection = true;
    createLockNodeAsync();
}
//createLockNodeAsync, called from joinElectionInternal, uses the zk client to create a temporary znode
private void createLockNodeAsync() {
    zkClient.create(zkLockFilePath, appData, zkAcl, CreateMode.EPHEMERAL,
                    this, zkClient);
}

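//2.1.2 quitElection exits the election; reset() closes the zk session, so the temporary node on zk is deleted as well
public synchronized void quitElection(boolean needFence) {
    if (!needFence && state == State.ACTIVE) {
        // If the Active NameNode is gracefully going back to Standby, delete
        // the permanent breadcrumb znode so that nobody fences it
        tryDeleteOwnBreadCrumbNode();
    }
    reset();
    wantToBeInElection = false;
}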

//3. Callback function
class ServiceStateCallBacks implements HealthMonitor.ServiceStateCallback {
    @Override
    public void reportServiceStatus(HAServiceStatus status) {
        // Pass in the currently reported service state for verification
        verifyChangedServiceState(status.getState());
    }
}

//3.1 verify the currently reported service state
void verifyChangedServiceState(HAServiceState changedState) {
    synchronized (elector) {
        synchronized (this) {
            if (serviceState == HAServiceState.INITIALIZING) {
                if (quitElectionOnBadState) {
                    LOG.debug("rechecking for electability from bad state");
                    recheckElectability();
                }
                return;
            }
            if (changedState == serviceState) {
                serviceStateMismatchCount = 0;
                return;
            }
            if (serviceStateMismatchCount == 0) {
                // recheck one more time. As this might be due to parallel transition.
                serviceStateMismatchCount++;
                return;
            }
            // quit the election as the expected state and reported state
            // mismatches.
            LOG.error("Local service " + localTarget
                      + " has changed the serviceState to " + changedState
                      + ". Expected was " + serviceState
                      + ". Quitting election marking fencing necessary.");
            delayJoiningUntilNanotime = System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(1000);
            elector.quitElection(true);
            quitElectionOnBadState = true;
            serviceStateMismatchCount = 0;
            serviceState = HAServiceState.INITIALIZING;
        }
    }
}
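
The mismatch counter in verifyChangedServiceState tolerates exactly one unexpected report before quitting the election, because a single mismatch may just be a transition in progress. A minimal sketch of that rule follows. MismatchSketch and its ServiceState enum are hypothetical names, and the return value stands for "quit the election and mark fencing necessary".

```java
// Hypothetical model of the "tolerate one mismatch" rule; not the Hadoop class.
final class MismatchSketch {
    enum ServiceState { ACTIVE, STANDBY }

    private final ServiceState expected;
    private int mismatchCount = 0;

    MismatchSketch(ServiceState expected) {
        this.expected = expected;
    }

    // Returns true when the caller should quit the election.
    boolean report(ServiceState reported) {
        if (reported == expected) {
            mismatchCount = 0;      // states agree again, forget earlier mismatches
            return false;
        }
        if (mismatchCount == 0) {
            mismatchCount++;        // first mismatch: recheck on the next report
            return false;
        }
        mismatchCount = 0;          // second mismatch in a row: give up
        return true;
    }

    public static void main(String[] args) {
        MismatchSketch check = new MismatchSketch(ServiceState.ACTIVE);
        System.out.println(check.report(ServiceState.STANDBY)); // false
        System.out.println(check.report(ServiceState.STANDBY)); // true
    }
}
```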


//4. Start thread
public void run() {
    while (shouldRun) {
        try { 
            //The MonitorDaemon thread runs two methods
            //Keep retrying in a loop until connected to the HA service through the HAServiceProtocol proxy
            loopUntilConnected();
            //4.1 monitoring and inspection
            doHealthChecks();
        } catch (InterruptedException ie) {
            Preconditions.checkState(!shouldRun,
                                     "Interrupted but still supposed to run");
        }
    }
}

//4.1 monitoring and inspection
private void doHealthChecks() throws InterruptedException {
    while (shouldRun) {
        HAServiceStatus status = null;
        boolean healthy = false;
        try {
            //Send an rpc request to see if it responds, so as to determine the health status of the NN
            status = proxy.getServiceStatus();
            proxy.monitorHealth();
            healthy = true;
        } ...
        if (healthy) {
            //enterState is called with the state that was detected
            enterState(State.SERVICE_HEALTHY);
        }

        Thread.sleep(checkIntervalMillis);
    }
}

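startRPC() starts ZKFCRpcServer

When mainLoop() exits, the finally block of doRun stops the rpc server, quits the election and shuts down the HealthMonitor:

rpcServer.stopAndJoin();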
elector.quitElection(true);
healthMonitor.shutdown();
healthMonitor.join();
