Principle
Summary
NameNode active/standby failover is implemented mainly by three components: ZKFailoverController (ZKFC), HealthMonitor (HM) and ActiveStandbyElector (ASE).
HealthMonitor is responsible for monitoring the health of the NameNode (NN). It starts a thread that sends RPC requests, determines the NN status from the responses, and notifies ZKFC through a callback whenever the status changes.
ActiveStandbyElector is mainly responsible for coordinating with ZooKeeper (ZK) and watching the election nodes on the ZK cluster.
Hadoop HA failover uses two paths on the ZK cluster:
First, the /hadoop-ha/${dfs.nameservices}/ActiveStandbyElectorLock path holds an ephemeral (temporary) node, which serves as the lock.
Second, the /hadoop-ha/${dfs.nameservices}/ActiveBreadCrumb path holds a persistent (permanent) node that stores the address of the active NameNode (ANN).
Under normal circumstances, the persistent node is deleted together with the ephemeral node, so a breadcrumb that is left behind indicates that the previous ANN did not exit cleanly.
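To make the two paths concrete, here is a minimal sketch that builds them for one nameservice. The class and method names are hypothetical and `mycluster` is a made-up nameservice ID; `/hadoop-ha` is assumed to be the parent znode (the default of `ha.zookeeper.parent-znode`), and the two file names are the ones used in the Hadoop source.

```java
final class HaZnodePaths {
    // Assumed parent znode (default of ha.zookeeper.parent-znode).
    static final String PARENT = "/hadoop-ha";
    // Ephemeral lock node: whoever creates it is the active node.
    static final String LOCK_FILENAME = "ActiveStandbyElectorLock";
    // Persistent node: remembers the address of the last active node.
    static final String BREADCRUMB_FILENAME = "ActiveBreadCrumb";

    static String lockPath(String nameservice) {
        return PARENT + "/" + nameservice + "/" + LOCK_FILENAME;
    }

    static String breadcrumbPath(String nameservice) {
        return PARENT + "/" + nameservice + "/" + BREADCRUMB_FILENAME;
    }

    public static void main(String[] args) {
        System.out.println(lockPath("mycluster"));
        System.out.println(breadcrumbPath("mycluster"));
    }
}
```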
When the HM of the ANN detects that the NN state is abnormal, it notifies ZKFC through the callback, and ZKFC calls the ASE method that quits the election, which deletes the lock znode. Alternatively, if the whole ZKFC process becomes unavailable, it stops sending heartbeats to the ZK cluster; once the session times out, the ZK cluster deletes the ephemeral lock node itself.
The standby NameNodes (SNNs) learn through their ASE watchers that the active lock node has been deleted. Each SNN that is healthy then tries to create the ephemeral node at the ActiveStandbyElectorLock path, that is, to grab the lock. The SNN that succeeds checks whether a node exists at the breadcrumb path. If one exists, the previous ANN has to be fenced: the new node first asks it to step down by calling its transitionToStandby method; if that call fails, it falls back to logging in over ssh and killing the process, or to running a shell script.
If fencing fails, the SNN gives up the lock, quits the election and waits for a period of time so that the other SNNs can grab the lock. If fencing succeeds, it calls the becomeActive method, which in turn calls transitionToActive to turn the NN into the ANN.
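The decision made right after winning the lock can be sketched in plain Java. This is only an illustration under stated assumptions: `FailoverFlowSketch`, `afterLockAcquired` and the two predicates are hypothetical stand-ins for the transitionToStandby RPC and for the ssh/shell fencing step, not Hadoop APIs.

```java
import java.util.Optional;
import java.util.function.Predicate;

final class FailoverFlowSketch {
    enum Outcome { BECAME_ACTIVE, GAVE_UP_LOCK }

    /**
     * @param breadcrumb      address left behind by the previous active node, if any
     * @param gracefulStandby hypothetical stand-in for the transitionToStandby RPC
     * @param forcedFence     hypothetical stand-in for ssh kill or a shell script
     */
    static Outcome afterLockAcquired(Optional<String> breadcrumb,
                                     Predicate<String> gracefulStandby,
                                     Predicate<String> forcedFence) {
        if (breadcrumb.isPresent()) {
            String oldActive = breadcrumb.get();
            // Try the polite request first; only fall back to force if it fails.
            boolean fenced = gracefulStandby.test(oldActive) || forcedFence.test(oldActive);
            if (!fenced) {
                // Fencing failed: release the lock so another standby can try.
                return Outcome.GAVE_UP_LOCK;
            }
        }
        // No breadcrumb, or the old active is fenced: safe to become active.
        return Outcome.BECAME_ACTIVE;
    }

    public static void main(String[] args) {
        System.out.println(afterLockAcquired(Optional.empty(), a -> false, a -> false));
        System.out.println(afterLockAcquired(Optional.of("nn1:8020"), a -> false, a -> true));
        System.out.println(afterLockAcquired(Optional.of("nn1:8020"), a -> false, a -> false));
    }
}
```

Passing the two fencing steps in as predicates keeps the ordering (graceful first, forced second) visible without needing a real cluster.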
Workflow
1. After HealthMonitor has been initialized, it starts an internal thread that periodically calls methods of the HAServiceProtocol RPC interface of the corresponding NameNode to check the NameNode's health.
2. If HealthMonitor detects that the health status of the NameNode has changed, it calls back the corresponding method registered by ZKFailoverController.
3. If ZKFailoverController decides that an active/standby switch is required, it first uses ActiveStandbyElector to run an automatic active/standby election.
4. ActiveStandbyElector interacts with ZooKeeper to complete the automatic active/standby election.
5. After the election, ActiveStandbyElector calls back the corresponding method of ZKFailoverController to tell the current NameNode to become the active NameNode or a standby NameNode.
6. ZKFailoverController calls the method of the HAServiceProtocol RPC interface of the corresponding NameNode to switch the NameNode to the Active or Standby state.
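The callback wiring in steps 2 to 6 can be sketched with toy classes. Everything here is hypothetical and heavily simplified: `ToyElector` decides the election from a boolean instead of talking to ZooKeeper, and the "RPC" to the NameNode is just a log entry.

```java
import java.util.ArrayList;
import java.util.List;

final class ZkfcWiringSketch {
    enum Health { HEALTHY, UNHEALTHY }

    interface HealthCallback { void enteredState(Health h); }
    interface ElectorCallback { void becomeActive(); void becomeStandby(); }

    /** Toy elector: wins the election if the (pretend) lock is free. */
    static final class ToyElector {
        private final ElectorCallback app;
        private final boolean lockIsFree;
        ToyElector(ElectorCallback app, boolean lockIsFree) {
            this.app = app;
            this.lockIsFree = lockIsFree;
        }
        void joinElection() {
            if (lockIsFree) app.becomeActive(); else app.becomeStandby();
        }
        void quitElection() { /* a real elector would delete its lock node here */ }
    }

    /** Toy controller: receives health events, drives the elector, "calls" the NameNode. */
    static final class ToyController implements HealthCallback, ElectorCallback {
        final List<String> log = new ArrayList<>();
        private final ToyElector elector;
        ToyController(boolean lockIsFree) {
            this.elector = new ToyElector(this, lockIsFree);
        }
        @Override public void enteredState(Health h) {       // step 2
            log.add("health=" + h);
            if (h == Health.HEALTHY) {
                elector.joinElection();                       // steps 3 and 4
            } else {
                elector.quitElection();
                log.add("quitElection");
            }
        }
        @Override public void becomeActive() {                // steps 5 and 6
            log.add("rpc:transitionToActive");
        }
        @Override public void becomeStandby() {
            log.add("rpc:transitionToStandby");
        }
    }

    public static void main(String[] args) {
        ToyController c = new ToyController(true);
        c.enteredState(Health.HEALTHY);
        System.out.println(c.log);
    }
}
```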
HealthMonitor
HealthMonitor monitors the health state (HealthMonitor.State) and the service status (HAServiceStatus) of the local NN over RPC. When either of them changes, it reports the change to ZKFC through the registered callbacks.
    // The five states of HealthMonitor (HealthMonitor.State)
    /**
     * The health monitor is still starting up.
     */
    INITIALIZING,
    /**
     * The service is not responding to health check RPCs.
     */
    SERVICE_NOT_RESPONDING,
    /**
     * The service is connected and healthy.
     */
    SERVICE_HEALTHY,
    /**
     * The service is running but unhealthy.
     */
    SERVICE_UNHEALTHY,
    /**
     * The health monitor itself failed unrecoverably and can
     * no longer provide accurate information.
     */
    HEALTH_MONITOR_FAILED;

    // The service state carried by HAServiceStatus (HAServiceState) has four values
    INITIALIZING("initializing"),
    ACTIVE("active"),
    STANDBY("standby"),
    STOPPING("stopping");
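As an illustration of "notify only when the state changes", the following sketch polls a probe and fires callbacks on transitions. It is a simplification under stated assumptions: `HealthPollSketch` and `checkOnce` are hypothetical names, the probe stands in for the health check RPC, a `false` result is treated as unhealthy, and any runtime exception is treated as "not responding".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

final class HealthPollSketch {
    enum State { INITIALIZING, SERVICE_NOT_RESPONDING, SERVICE_HEALTHY, SERVICE_UNHEALTHY }

    interface Callback { void enteredState(State newState); }

    private State state = State.INITIALIZING;
    private final List<Callback> callbacks = new ArrayList<>();

    void addCallback(Callback cb) { callbacks.add(cb); }

    /** Runs one health check; the probe is a hypothetical stand-in for the RPC. */
    void checkOnce(BooleanSupplier probe) {
        State next;
        try {
            next = probe.getAsBoolean() ? State.SERVICE_HEALTHY : State.SERVICE_UNHEALTHY;
        } catch (RuntimeException e) {
            // No usable answer at all: the service is not responding.
            next = State.SERVICE_NOT_RESPONDING;
        }
        enterState(next);
    }

    /** Fires callbacks only on an actual transition, not on every poll. */
    private void enterState(State next) {
        if (next != state) {
            state = next;
            for (Callback cb : callbacks) {
                cb.enteredState(next);
            }
        }
    }

    public static void main(String[] args) {
        HealthPollSketch hm = new HealthPollSketch();
        hm.addCallback(s -> System.out.println("entered " + s));
        hm.checkOnce(() -> true);   // transition: callback fires
        hm.checkOnce(() -> true);   // no change: callback does not fire
        hm.checkOnce(() -> false);  // transition: callback fires
    }
}
```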
ActiveStandbyElector
ActiveStandbyElector mainly controls and monitors the state of the nodes on ZK and interacts with ZKFC. When joinElection is called, ASE tries to create the lock node on ZK (that is, to obtain the lock). If the node is created successfully, it calls becomeActive so that the NN becomes the ANN. If creation fails, it calls becomeStandby so that the NN becomes an SNN; the NN's health continues to be monitored, and a watcher is registered on the active lock.
    /**
     * To participate in election, the app will call joinElection. The result will
     * be notified by a callback on either the becomeActive or becomeStandby app
     * interfaces.
     */
    public synchronized void joinElection(byte[] data)

    /**
     * Any service instance can drop out of the election by calling quitElection.
     * <br/>
     */
    public synchronized void quitElection(boolean needFence)
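The "create the node or lose" behaviour behind joinElection can be imitated without ZooKeeper. In this sketch a `ConcurrentHashMap` is a hypothetical in-memory stand-in for the znode tree: `putIfAbsent` plays the role of a create call that fails when the node already exists, and removing the entry plays the role of the ephemeral node disappearing. The class name, candidate names and lock path are illustrative assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class InMemoryElectionSketch {
    static final String LOCK = "/hadoop-ha/mycluster/ActiveStandbyElectorLock";

    // Stand-in for ZooKeeper: znode path -> owner of the node.
    private final ConcurrentMap<String, String> znodes = new ConcurrentHashMap<>();

    /** Returns true if the candidate created the lock node, i.e. became active. */
    boolean joinElection(String candidate) {
        return znodes.putIfAbsent(LOCK, candidate) == null;
    }

    /** Removes the lock node, but only if this candidate owns it. */
    void quitElection(String candidate) {
        znodes.remove(LOCK, candidate);
    }

    public static void main(String[] args) {
        InMemoryElectionSketch zk = new InMemoryElectionSketch();
        System.out.println("nn1 active? " + zk.joinElection("nn1"));
        System.out.println("nn2 active? " + zk.joinElection("nn2"));
        zk.quitElection("nn1");
        System.out.println("nn2 active after nn1 quit? " + zk.joinElection("nn2"));
    }
}
```

A real elector additionally registers a watcher so that a losing candidate is notified when the lock node disappears, instead of retrying by hand as the example does.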
ZKFC
ZKFC initializes HealthMonitor and ActiveStandbyElector when it is created. It then coordinates the two, reacting to the events they report, to carry out the HA switch.
Fencing
If the ZKFC of the active node is killed, ZK stops receiving heartbeats for the ANN, and the ZKFC of the SNN is notified. After the SNN's ZKFC has successfully created the lock znode on ZK, it asks the previous ANN to call transitionToStandby(). If that does not work, it falls back to other methods (such as killing the process), and then calls transitionToActive() to become the active node.
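Trying fencing methods one after another until one succeeds can be sketched as follows. This is an illustration, not Hadoop's fencing code: `FencingChainSketch` and `fence` are hypothetical, the predicates merely pretend to fence, and the method names and target address are example values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

final class FencingChainSketch {
    /**
     * Tries each fencing method in insertion order and returns the name of the
     * first one that reports success, or empty if every method failed.
     */
    static Optional<String> fence(String target, Map<String, Predicate<String>> methods) {
        for (Map.Entry<String, Predicate<String>> m : methods.entrySet()) {
            if (m.getValue().test(target)) {
                return Optional.of(m.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the configured order of the methods.
        Map<String, Predicate<String>> methods = new LinkedHashMap<>();
        methods.put("sshfence", target -> false); // pretend ssh could not reach the host
        methods.put("shell", target -> true);     // pretend the script succeeded
        System.out.println(fence("nn1.example.com:8020", methods));
    }
}
```

An empty result corresponds to the case in which the new node must not become active and gives up the lock instead.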
Source code
DFSZKFailoverController is an ordinary Java program with a main method.
The main method constructs a DFSZKFailoverController and calls its run method.
The run method delegates to doRun, which calls several important methods:
    private int doRun(String[] args) throws Exception {
      try {
        // Initialize zk
        initZK();
        // Format zk
        formatZK(force, interactive);
        // Initialize rpc
        initRPC();
        // Initialize hm
        initHM();
        // Start rpc
        startRPC();
        mainLoop();
      } finally {
        rpcServer.stopAndJoin();
        elector.quitElection(true);
        healthMonitor.shutdown();
        healthMonitor.join();
      }
      return 0;
    }
initZK() reads and parses the ZK connection settings (the quorum address, the ACLs, the authentication information and so on) and then initializes ActiveStandbyElector.
    // Initialize ActiveStandbyElector and pass in the callbacks (becomeActive, becomeStandby, etc.)
    elector = new ActiveStandbyElector(zkQuorum, zkTimeout, getParentZnode(),
        zkAcls, zkAuths, new ElectorCallbacks(), maxRetryNum);

    // Construct ActiveStandbyElector
    public ActiveStandbyElector(String zookeeperHostPorts,
        int zookeeperSessionTimeout, String parentZnodeName, List<ACL> acl,
        List<ZKAuthInfo> authInfo, ActiveStandbyElectorCallback app,
        int maxRetryNum, boolean failFast)
        throws IOException, HadoopIllegalArgumentException, KeeperException {
      ...
      if (failFast) {
        createConnection();
      } else {
        reEstablishSession();
      }
    }

    // Create a connection to zk
    private void createConnection() throws IOException, KeeperException {
      if (zkClient != null) {
        try {
          zkClient.close();
        } catch (InterruptedException e) {
          throw new IOException("Interrupted while closing ZK", e);
        }
        zkClient = null;
        watcher = null;
      }
      zkClient = connectToZooKeeper();
      if (LOG.isDebugEnabled()) {
        LOG.debug("Created new connection for " + this);
      }
    }

    // Connect to zk and initialize the watcher that listens to the nodes on zk
    protected synchronized ZooKeeper connectToZooKeeper()
        throws IOException, KeeperException {
      watcher = new WatcherWithClientRef();
      ZooKeeper zk = createZooKeeper();
      watcher.setZooKeeperRef(zk);
      watcher.waitForZKConnectionEvent(zkSessionTimeout);
      ...
    }
formatZK() formats ZK, that is, it creates the parent directory into which the NN state is written later. It only runs when ZKFC is started with the -formatZK argument.
initRPC() initializes ZKFCRpcServer
initHM() starts the health check HealthMonitor
    private void initHM() {
      // 1. Initialize the HealthMonitor
      healthMonitor = new HealthMonitor(conf, localTarget);
      // 2. Register the health-state callback
      healthMonitor.addCallback(new HealthCallbacks());
      // 3. Register the service-state callback
      healthMonitor.addServiceStateCallback(new ServiceStateCallBacks());
      // 4. Start the monitor thread
      healthMonitor.start();
    }

    // 2. Health-state callback
    class HealthCallbacks implements HealthMonitor.Callback {
      @Override
      public void enteredState(HealthMonitor.State newState) {
        // Record the latest state
        setLastHealthState(newState);
        // 2.1 Check whether to take part in the election
        recheckElectability();
      }
    }

    // 2.1 recheckElectability() acts on the last detected health state.
    // When the state is SERVICE_HEALTHY, it calls joinElection and tries to
    // create the lock znode on zk. When the state is SERVICE_UNHEALTHY, it
    // quits the election, which deletes this node's znode on zk.
    private void recheckElectability() {
      // Maintain lock ordering of elector -> ZKFC
      synchronized (elector) {
        synchronized (this) {
          boolean healthy = lastHealthState == State.SERVICE_HEALTHY;
          switch (lastHealthState) {
          case SERVICE_HEALTHY:
            // 2.1.1 Join the election
            elector.joinElection(targetToData(localTarget));
            if (quitElectionOnBadState) {
              quitElectionOnBadState = false;
            }
            break;
          case SERVICE_UNHEALTHY:
            // 2.1.2 Quit the election
            elector.quitElection(true);
            serviceState = HAServiceState.INITIALIZING;
            break;
          }
        }
      }
    }

    // 2.1.1 joinElection calls joinElectionInternal
    private void joinElectionInternal() {
      ...
      createRetryCount = 0;
      wantToBeInElection = true;
      createLockNodeAsync();
    }

    // createLockNodeAsync, called from joinElectionInternal, uses the zk client
    // to create the ephemeral lock znode
    private void createLockNodeAsync() {
      zkClient.create(zkLockFilePath, appData, zkAcl, CreateMode.EPHEMERAL,
          this, zkClient);
    }

    // 2.1.2 quitElection leaves the election; the ephemeral node on zk is
    // deleted as well
    public synchronized void quitElection(boolean needFence) {
      // If the current NameNode goes from Active back to Standby, delete its
      // own breadcrumb znode
      tryDeleteOwnBreadCrumbNode();
    }

    // 3. Service-state callback
    class ServiceStateCallBacks implements HealthMonitor.ServiceStateCallback {
      @Override
      public void reportServiceStatus(HAServiceStatus status) {
        // Pass in the service state that was just detected
        verifyChangedServiceState(status.getState());
      }
    }

    // 3.1 Compare the detected service state with the expected one
    void verifyChangedServiceState(HAServiceState changedState) {
      synchronized (elector) {
        synchronized (this) {
          if (serviceState == HAServiceState.INITIALIZING) {
            if (quitElectionOnBadState) {
              LOG.debug("rechecking for electability from bad state");
              recheckElectability();
            }
            return;
          }
          if (changedState == serviceState) {
            serviceStateMismatchCount = 0;
            return;
          }
          if (serviceStateMismatchCount == 0) {
            // recheck one more time. As this might be due to parallel transition.
            serviceStateMismatchCount++;
            return;
          }
          // quit the election as the expected state and reported state
          // mismatches.
          LOG.error("Local service " + localTarget
              + " has changed the serviceState to " + changedState
              + ". Expected was " + serviceState
              + ". Quitting election marking fencing necessary.");
          delayJoiningUntilNanotime = System.nanoTime()
              + TimeUnit.MILLISECONDS.toNanos(1000);
          elector.quitElection(true);
          quitElectionOnBadState = true;
          serviceStateMismatchCount = 0;
          serviceState = HAServiceState.INITIALIZING;
        }
      }
    }

    // 4. healthMonitor.start() starts the MonitorDaemon thread, which runs:
    public void run() {
      while (shouldRun) {
        try {
          // Keep retrying until connected to the HA service through the
          // HAServiceProtocol proxy
          loopUntilConnected();
          // 4.1 Run the health checks
          doHealthChecks();
        } catch (InterruptedException ie) {
          Preconditions.checkState(!shouldRun,
              "Interrupted but still supposed to run");
        }
      }
    }

    // 4.1 Health checks
    private void doHealthChecks() throws InterruptedException {
      while (shouldRun) {
        HAServiceStatus status = null;
        boolean healthy = false;
        try {
          // Send rpc requests and use the response to determine the health of the NN
          status = proxy.getServiceStatus();
          proxy.monitorHealth();
          healthy = true;
        }
        ...
        if (healthy) {
          // enterState is called with the state that was determined
          enterState(State.SERVICE_HEALTHY);
        }
        Thread.sleep(checkIntervalMillis);
      }
    }
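The "tolerate one mismatch, quit on the second" rule in verifyChangedServiceState can be isolated in a small sketch. The class `StateMismatchSketch` and its `report` method are hypothetical; the sketch mirrors only the counting logic and leaves out the locking, the logging and the rejoin delay.

```java
final class StateMismatchSketch {
    enum HaState { INITIALIZING, ACTIVE, STANDBY }

    private HaState expected;
    private int mismatchCount = 0;

    StateMismatchSketch(HaState expected) {
        this.expected = expected;
    }

    /** Returns true when the caller should quit the election. */
    boolean report(HaState reported) {
        if (expected == HaState.INITIALIZING) {
            return false;              // nothing to compare against yet
        }
        if (reported == expected) {
            mismatchCount = 0;         // states agree again: forget old mismatches
            return false;
        }
        if (mismatchCount == 0) {
            mismatchCount++;           // first mismatch may be a transition in progress
            return false;
        }
        // Second consecutive mismatch: give up and start over.
        mismatchCount = 0;
        expected = HaState.INITIALIZING;
        return true;
    }

    public static void main(String[] args) {
        StateMismatchSketch s = new StateMismatchSketch(HaState.ACTIVE);
        System.out.println(s.report(HaState.STANDBY)); // first mismatch is tolerated
        System.out.println(s.report(HaState.STANDBY)); // second mismatch quits
    }
}
```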
startRPC() starts ZKFCRpcServer
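When mainLoop() returns, the finally block of doRun cleans up: rpcServer.stopAndJoin() stops the RPC server, elector.quitElection(true) quits the election, and healthMonitor.shutdown() followed by healthMonitor.join() stops the HealthMonitor.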