JStorm Source Reading Series--01--Nimbus Startup Analysis

Posted by davidmuir on Thu, 11 Jul 2019 03:02:21 +0200

A note before we begin: this is the author's first read-through of this framework's source code, so there may be misunderstandings or places that lack detail. If you spot an error while reading, you are welcome to point it out in the comments below, or send me a private message first. The article will be updated over time, so feel free to follow or bookmark it. Thank you.

Summary

JStorm is a distributed real-time computation engine. It is Alibaba's rewrite of Storm's stream-processing model: it supports the same logical model (the topology), but the underlying implementation differs considerably. This article is not intended to compare the two frameworks; instead, it analyzes how JStorm works from the source code's point of view.
As the first chapter, this post looks at nimbus and what it does at startup. The master node of a JStorm cluster runs a nimbus daemon, which is responsible for communicating with ZK (ZooKeeper), distributing code, assigning tasks to the slave nodes in the cluster, monitoring cluster state, and so on. All of the state that nimbus needs to maintain is stored in ZK, and JStorm adds some caches to reduce the number of ZK accesses, as the code analysis below will show. That much is covered by the online documentation; now let's see what nimbus actually does, from the source.

//Set the default handler invoked when a thread dies from an uncaught exception
Thread.setDefaultUncaughtExceptionHandler(new DefaultUncaughtExceptionHandler());
//Load the cluster configuration
Map config = Utils.readStormConfig();
//The body of this method is commented out internally; the author has not looked into it yet and will cover it later
JStormServerUtils.startTaobaoJvmMonitor();
//Create a NimbusServer instance
NimbusServer instance = new NimbusServer();
//Create the default nimbus implementation
INimbus iNimbus = new DefaultInimbus();
//Begin the actual initialization
instance.launchServer(config, iNimbus);

DefaultUncaughtExceptionHandler does not actually do very much: it simply checks whether the error is an out-of-memory error and, if so, shuts the process down in an orderly way; any other exception is rethrown and the process is interrupted. The configuration-reading step is not covered in detail here. The NimbusServer class encapsulates the member variables and methods used to operate nimbus, and will come up again later.
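As a rough sketch of that behavior (this is not JStorm's actual handler; the `classify` helper and its return codes are invented for illustration):

```java
// Hypothetical sketch of a default uncaught-exception handler that, like the
// one described above, treats OutOfMemoryError specially.
public class DefaultHandlerSketch {

    // 1 = orderly shutdown (out of memory), 2 = rethrow/interrupt.
    // This helper is invented for the example; the real class decides inline.
    static int classify(Throwable e) {
        return (e instanceof OutOfMemoryError) ? 1 : 2;
    }

    public static void main(String[] args) {
        Thread.setDefaultUncaughtExceptionHandler((thread, e) -> {
            if (classify(e) == 1) {
                System.err.println("Halting due to Out Of Memory Error in " + thread.getName());
                // a real daemon would call Runtime.getRuntime().exit(-1) here
            } else {
                System.err.println("Uncaught exception, interrupting: " + e);
            }
        });
        // Trigger the handler from a worker thread to see it in action.
        new Thread(() -> { throw new OutOfMemoryError("demo"); }).start();
    }
}
```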
The most important method is launchServer. Its role is explained in detail below; first, the code inside launchServer.

private void launchServer(final Map conf, INimbus inimbus) {
    LOG.info("Begin to start nimbus with conf " + conf);
    try {
        //Determine whether the configuration mode is correct
        StormConfig.validate_distributed_mode(conf);
        createPid(conf);
        //Set up the operation at exit
        initShutdownHook();
        //This method has no operation in the default implementation
        inimbus.prepare(conf, StormConfig.masterInimbus(conf));
        //Create NimbusData objects
        data = createNimbusData(conf, inimbus);
        //Starts the follower thread, which handles the work done once this nimbus is elected leader
        initFollowerThread(conf);
        int port = ConfigExtension.getNimbusDeamonHttpserverPort(conf);
        hs = new Httpserver(port, conf);
        hs.start();
        //If the cluster is running on yarn, some initialization operations are also needed.
        initContainerHBThread(conf);
        serviceHandler = new ServiceHandler(data);
        //thrift is a distributed RPC framework
        initThrift(conf);
    } catch (Throwable e) {
        if (e instanceof OutOfMemoryError) {
            LOG.error("Halting due to Out Of Memory Error...");
        }
        LOG.error("Fail to run nimbus ", e);
    } finally {
        cleanup();
    }

    LOG.info("Quit nimbus");
}

validate_distributed_mode

This method simply checks whether the field "storm.cluster.mode" in the configuration is "distributed" or "local" (i.e. local mode), and rejects local mode.
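A sketch of what such a check amounts to, assuming the configuration is a plain Map (the exception messages here are invented, not JStorm's):

```java
import java.util.Map;

// Illustrative version of the described check: nimbus refuses to start
// unless "storm.cluster.mode" is "distributed".
public class ModeCheckSketch {

    static void validateDistributedMode(Map<String, Object> conf) {
        String mode = (String) conf.get("storm.cluster.mode");
        if ("local".equals(mode)) {
            throw new IllegalArgumentException("Cannot start server in local mode!");
        } else if (!"distributed".equals(mode)) {
            throw new IllegalArgumentException("Illegal cluster mode in conf: " + mode);
        }
    }

    public static void main(String[] args) {
        validateDistributedMode(Map.of("storm.cluster.mode", "distributed"));
        System.out.println("distributed mode accepted");
    }
}
```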

initShutdownHook

This hook registers some operations to run on exit, including setting a flag that tells the cluster the nimbus is quitting, clearing the worker threads held by nimbus (a series of daemon threads handling communication, code distribution, and heartbeats), closing any open resources, and so on.
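The mechanism itself is the standard JVM shutdown hook. A minimal sketch, with invented resource names standing in for nimbus's actual cleanup targets:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the shutdown-hook pattern nimbus uses. The resources
// listed in cleanup() are illustrative stand-ins, not JStorm's real fields.
public class ShutdownHookSketch {
    static final List<String> closed = new ArrayList<>();

    // In NimbusServer this would stop the worker daemon threads
    // (communication, code distribution, heartbeat) and close open resources.
    static void cleanup() {
        closed.add("worker-threads");
        closed.add("httpserver");
        closed.add("zk-client");
    }

    public static void main(String[] args) {
        // The JVM runs registered hooks on normal exit or SIGTERM.
        Runtime.getRuntime().addShutdownHook(new Thread(ShutdownHookSketch::cleanup));
        System.out.println("shutdown hook registered");
    }
}
```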

createNimbusData

This method creates a NimbusData object, which encapsulates the member variables nimbus uses to communicate with ZK. The individual member variables of NimbusData and their roles are discussed method by method below. First, here is how NimbusData is constructed.

public NimbusData(final Map conf, INimbus inimbus) throws Exception {
    this.conf = conf;

    createFileHandler();
    mkBlobCacheMap();
    this.nimbusHostPortInfo = NimbusInfo.fromConf(conf);
    this.blobStore = BlobStoreUtils.getNimbusBlobStore(conf, nimbusHostPortInfo);

    this.isLaunchedCleaner = false;
    this.isLaunchedMonitor = false;

    this.submittedCount = new AtomicInteger(0);

    this.stormClusterState = Cluster.mk_storm_cluster_state(conf);

    createCache();

    this.taskHeartbeatsCache = new ConcurrentHashMap<String, Map<Integer, TkHbCacheTime>>();

    this.scheduExec = Executors.newScheduledThreadPool(SCHEDULE_THREAD_NUM);

    this.statusTransition = new StatusTransition(this);

    this.startTime = TimeUtils.current_time_secs();

    this.inimubs = inimbus;

    localMode = StormConfig.local_mode(conf);

    this.metricCache = new JStormMetricCache(conf, this.stormClusterState);
    this.clusterName = ConfigExtension.getClusterName(conf);

    pendingSubmitTopologies = new TimeCacheMap<String, Object>(JStormUtils.MIN_10);
    topologyTaskTimeout = new ConcurrentHashMap<String, Integer>();
    tasksHeartbeat = new ConcurrentHashMap<String, TopologyTaskHbInfo>();

    this.metricsReporter = new JStormMetricsReporter(this);
    this.metricRunnable = ClusterMetricsRunnable.mkInstance(this);
    String configUpdateHandlerClass = ConfigExtension.getNimbusConfigUpdateHandlerClass(conf);
    this.configUpdateHandler = (ConfigUpdateHandler) Utils.newInstance(configUpdateHandlerClass);

    if (conf.containsKey(Config.NIMBUS_TOPOLOGY_ACTION_NOTIFIER_PLUGIN)) {
        String string = (String) conf.get(Config.NIMBUS_TOPOLOGY_ACTION_NOTIFIER_PLUGIN);
        nimbusNotify = (ITopologyActionNotifierPlugin) Utils.newInstance(string);
    } else {
        nimbusNotify = null;
    }
}

3.1. createFileHandler: this method implements an anonymous inner class ExpiredCallback, whose expire method is called back to close a Channel or BufferFileInputStream instance.

public void createFileHandler() {
    ExpiredCallback<Object, Object> expiredCallback = new ExpiredCallback<Object, Object>() {
        @Override
        public void expire(Object key, Object val) {
            try {
                LOG.info("Close file " + String.valueOf(key));
                if (val != null) {
                    if (val instanceof Channel) {
                        Channel channel = (Channel) val;
                        channel.close();
                    } else if (val instanceof BufferFileInputStream) {
                        BufferFileInputStream is = (BufferFileInputStream) val;
                        is.close();
                    }
                }
            } catch (IOException e) {
                LOG.error(e.getMessage(), e);
            }

        }
    };
    //timeout
    int file_copy_expiration_secs = JStormUtils.parseInt(conf.get(Config.NIMBUS_FILE_COPY_EXPIRATION_SECS), 30);
    uploaders = new TimeCacheMap<Object, Object>(file_copy_expiration_secs, expiredCallback);
    downloaders = new TimeCacheMap<Object, Object>(file_copy_expiration_secs, expiredCallback);
}

The method then initializes two member variables of NimbusData, uploaders and downloaders, which respectively hold the channels currently being uploaded to and the streams currently being downloaded from. The core logic of TimeCacheMap is that its constructor starts a daemon thread over a set of internal buckets: as long as the system is not shut down, the daemon thread keeps rotating out the oldest bucket, and for every non-empty entry it invokes the callback's expire method to perform the corresponding cleanup. The expire method passed in here closes the Channel or BufferFileInputStream.
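To make the idea concrete, here is a much-simplified stand-in for TimeCacheMap (not JStorm's actual class, which uses rotating buckets rather than per-entry timestamps): entries expire after a timeout and a callback plays the role of the expire method above.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Simplified expiring map: a daemon reaper thread removes entries older
// than ttlMs and invokes the expiry callback for each, mirroring how
// TimeCacheMap calls ExpiredCallback.expire to close channels/streams.
public class ExpiringMapSketch<K, V> {
    private final Map<K, Long> stamps = new ConcurrentHashMap<>();
    private final Map<K, V> data = new ConcurrentHashMap<>();
    private final long ttlMs;
    private final BiConsumer<K, V> onExpire;

    public ExpiringMapSketch(long ttlMs, BiConsumer<K, V> onExpire) {
        this.ttlMs = ttlMs;
        this.onExpire = onExpire;
        Thread reaper = new Thread(this::reapLoop);
        reaper.setDaemon(true); // like JStorm's, never blocks JVM exit
        reaper.start();
    }

    public void put(K key, V val) {
        data.put(key, val);
        stamps.put(key, System.currentTimeMillis());
    }

    public V get(K key) { return data.get(key); }

    private void reapLoop() {
        while (true) {
            long now = System.currentTimeMillis();
            for (Iterator<Map.Entry<K, Long>> it = stamps.entrySet().iterator(); it.hasNext(); ) {
                Map.Entry<K, Long> e = it.next();
                if (now - e.getValue() >= ttlMs) {
                    K k = e.getKey();
                    it.remove();
                    onExpire.accept(k, data.remove(k)); // cleanup callback
                }
            }
            try { Thread.sleep(10); } catch (InterruptedException ie) { return; }
        }
    }
}
```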
3.2. mkBlobCacheMap: similar to the previous method, it declares an anonymous inner class and initializes several member variables. The code is nearly identical to the method above, so it is not pasted here to save space. In this case the expire method closes two streams, AtomicOutputStream and BufferInputStream. blobUploaders and blobDownloaders hold the open streams for uploads and downloads, respectively, and blobListers holds the uploaded and downloaded data.
3.3. Next, several more member variables are initialized. NimbusInfo holds the host name, the port, and a flag marking whether this nimbus is the leader. BlobStore is key-value storage for blob data; Alibaba provides two blob storage implementations, local-filesystem storage and HDFS storage. The difference is that local file storage cannot guarantee consistency by itself, so ZK has to be involved; this is JStorm's default configuration. With HDFS storage no ZK involvement is needed, because HDFS itself guarantees consistency and correctness. StormClusterState holds the state of the entire cluster, as obtained from ZK. In addition, to avoid repeated communication with ZK, cache structures, task heartbeat information, and so on are also set up.
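The BlobStore abstraction can be pictured as a small key-value API. The sketch below is an invented in-memory stand-in that only shows the shape of such a store; JStorm's real implementations are backed by the local filesystem (with ZK for consistency) or by HDFS.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative key-value blob store. Method names are chosen for this
// example and are not JStorm's actual BlobStore API.
public class InMemoryBlobStoreSketch {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    public void createBlob(String key, byte[] data) { blobs.put(key, data.clone()); }
    public byte[] readBlob(String key) { return blobs.get(key); }
    public void deleteBlob(String key) { blobs.remove(key); }
    public Iterable<String> listKeys() { return blobs.keySet(); }
}
```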
3.4. Finally, the metrics-related reporting and monitoring threads are initialized.

initFollowerThread

4.1. The method first builds a callback function that initializes a series of components once a nimbus becomes the leader: topology assignment across the cluster, topology status updates, the cleanup function, and the monitoring threads. A separate chapter will introduce this init method in detail; for now, here is its source code.

private void init(Map conf) throws Exception {
    data.init();
    
    NimbusUtils.cleanupCorruptTopologies(data);
    //Topological Distribution
    initTopologyAssign();
    //Status Update
    initTopologyStatus();
    //Clearance function
    initCleaner(conf);
    
    initMetricRunnable();
    
    if (!data.isLocalMode()) {
        initMonitor(conf);
        //mkRefreshConfThread(data);
    }
}

4.2. Next, a Runnable subclass (the follower thread) is initialized. Its constructor first checks that the cluster is not running in local mode, then updates the node information on ZK (registering this nimbus with ZK) and fetches the cluster's state through ZK; after all, nimbus has to maintain the whole cluster. Next it checks whether a leader already exists; after two unsuccessful leader elections, the nimbus information on ZK is deleted and the process exits. If the blobstore uses local-file mode (as opposed to HDFS mode), a callback also has to be registered to synchronize blobs while this nimbus is not the leader: active blobs have to be recorded in ZK and dead ones removed.
4.3. The thread is then marked as a daemon thread and started. Its run method first checks whether ZK currently records a leader for the cluster; if not, it elects the current nimbus as leader. If a leader already exists, it checks whether that leader is the current nimbus; if not, the current nimbus is stopped, since a leader already exists. If it is the same, the local state is examined: if it is not yet set to leader, the current nimbus has not been initialized, so it is marked as leader and the callback is invoked for initialization, i.e. the init(conf) method above is called.
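The decision logic described above can be reduced to a pure function. Everything here (class, enum, signature) is illustrative, not JStorm's actual API:

```java
// Illustrative reduction of the follower thread's run-loop decision.
public class LeaderCheckSketch {
    enum Action { BECOME_LEADER, HALT, INIT_AS_LEADER, NOTHING }

    static Action decide(String leaderOnZk, String self, boolean localIsLeader) {
        if (leaderOnZk == null) return Action.BECOME_LEADER;  // no leader: run for election
        if (!leaderOnZk.equals(self)) return Action.HALT;     // someone else leads: stop
        if (!localIsLeader) return Action.INIT_AS_LEADER;     // leader on ZK but uninitialized: call init(conf)
        return Action.NOTHING;                                // already an initialized leader
    }
}
```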
Back in launchServer, a port is then read from the configuration (7621 by default) and used to build an Httpserver instance, which accepts and handles TCP connections on a newly started monitoring thread. (Its exact role, and where it is used, is not yet clear to the author and will be filled in later.)
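Such an embedded HTTP daemon can be pictured with the JDK's built-in server (JStorm's Httpserver is its own class; 7621 is only the documented default port, and the `/logs` endpoint here is invented):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

// A miniature embedded HTTP daemon, in the spirit of nimbus's Httpserver.
public class HttpDaemonSketch {

    public static HttpServer start(int port) throws Exception {
        HttpServer hs = HttpServer.create(new InetSocketAddress(port), 0);
        hs.createContext("/logs", exchange -> {
            byte[] body = "nimbus log endpoint".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        hs.start(); // serves requests on its own background threads
        return hs;
    }

    public static void main(String[] args) throws Exception {
        HttpServer hs = start(0); // 0 = any free port; nimbus defaults to 7621
        System.out.println("listening on " + hs.getAddress().getPort());
        hs.stop(0);
    }
}
```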

initContainerHBThread

The main purpose of this method is to check whether the JStorm cluster runs on a resource manager (YARN) and, if so, to create a new thread to handle it. The point of using containers here is to run several logically distinct clusters, or even several JStorm clusters, on one physical cluster, and to adjust the resources allocated to each logical cluster dynamically. A resource manager also provides very good scalability. The container thread is added to NimbusServer (explained in detail later); it is also a daemon thread and starts immediately. Its run method performs two tasks:
handleWriteDir: clears expired heartbeat information from the container. Specifically, if there are more than 10 heartbeat files under the JStorm cluster's container directory, the oldest ones are deleted.
handleReadDir: maintains the heartbeat information received locally from the cluster, and throws an exception if the heartbeat times out repeatedly.
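The handleWriteDir rule above (keep at most 10 heartbeat files, dropping the oldest first) might look like the following; the method name and directory handling are illustrative, with only the cap of 10 coming from the text:

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Illustrative cleanup: if a directory holds more than `keep` files,
// delete the oldest ones (by last-modified time) until `keep` remain.
public class HbDirCleanupSketch {

    static void trimOldest(File dir, int keep) {
        File[] files = dir.listFiles();
        if (files == null || files.length <= keep) return;
        Arrays.sort(files, Comparator.comparingLong(File::lastModified));
        for (int i = 0; i < files.length - keep; i++) {
            files[i].delete(); // oldest first
        }
    }
}
```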

initThrift

Thrift is the distributed RPC framework used by JStorm. The corresponding source-code analysis will be added in a later chapter.

Topics: Java