Node.js: Alibaba Egg's Multi-Process Model and Inter-Process Communication

Posted by greenday on Sun, 12 May 2019 12:42:14 +0200

Preface

Recently I have been using Egg as the underlying framework for a project. I was curious how it manages its multi-process model, so I studied the implementation and wrote down what I learned along the way. If there are any mistakes in the article, please go easy on me.

Why multiple processes are needed

Almost every server today has a multi-core CPU. However, Node runs JavaScript in a single process on a single thread (single-threaded from the developer's point of view; internally it is not). The CPU schedules threads, so a single Node process can only use one CPU core at a time. That leaves utilization very low, and fault tolerance is unacceptable as well: when an uncaught error occurs, the whole program crashes. That is why Node provides the cluster module, to help us make full use of server resources.

How cluster works
For the details of how cluster works, I recommend this article. Here's a brief summary:

  1. The listen call in each child process is hacked: the worker does not actually bind the port itself. Instead, the master's internal TCP server does the listening, so multiple child processes never try to bind the same port and fail with an error.
  2. When a TCP connection arrives, the master picks a worker process and sends it an internal newconn message together with the client handle. There are two ways to distribute connections. The first is round-robin, the default on every platform except Windows: the master listens on the port, accepts new connections, and hands them to the workers in turn, with some built-in logic to avoid overloading any one worker. In the second, the master creates the listening socket and sends it to the interested workers, which then accept connections directly.
  3. When the worker process receives the handle, it creates a client instance (net.Socket), runs the actual business logic, and returns.
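The default round-robin hand-off can be modeled in a few lines. This is a simplified, hypothetical sketch, not Node's internal implementation. `RoundRobinDispatcher` and the worker ids are made up, and the only overload protection modeled is that a worker receives a new connection only after it has taken the previous one. In real code the policy is chosen with `cluster.schedulingPolicy` (`cluster.SCHED_RR` or `cluster.SCHED_NONE`) or with the `NODE_CLUSTER_SCHED_POLICY` environment variable (`rr` or `none`).

```javascript
// Hypothetical model of round-robin connection hand-off (not Node's real code).
class RoundRobinDispatcher {
  constructor(workerIds) {
    this.free = [...workerIds]; // workers ready for a new connection, in turn order
    this.pending = [];          // accepted connections waiting for a free worker
    this.log = [];              // record of [connection, worker] hand-offs
  }

  // The master accepted a new connection: queue it and try to hand it off.
  distribute(conn) {
    this.pending.push(conn);
    this.drain();
  }

  // A worker has taken its connection and is free for the next one.
  release(workerId) {
    this.free.push(workerId);
    this.drain();
  }

  drain() {
    while (this.pending.length > 0 && this.free.length > 0) {
      const conn = this.pending.shift();
      const workerId = this.free.shift();
      // In Node this step is an internal 'newconn' message carrying the socket handle.
      this.log.push([conn, workerId]);
    }
  }
}

const dispatcher = new RoundRobinDispatcher(['w1', 'w2']);
dispatcher.distribute('c1'); // goes to w1
dispatcher.distribute('c2'); // goes to w2
dispatcher.distribute('c3'); // no free worker yet, so it is queued
dispatcher.release('w1');    // w1 took c1, so c3 now goes to w1
console.log(dispatcher.log); // [ [ 'c1', 'w1' ], [ 'c2', 'w2' ], [ 'c3', 'w1' ] ]
```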


Multi-process model

First, take a look at the process model in the official Egg documentation:

                +--------+          +-------+
                | Master |<-------->| Agent |
                +--------+          +-------+
                ^   ^    ^
               /    |     \
             /      |       \
           /        |         \
         v          v          v
+----------+   +----------+   +----------+
| Worker 1 |   | Worker 2 |   | Worker 3 |
+----------+   +----------+   +----------+
Type   | Count                 | Role                                                  | Stability | Runs business code
-------|-----------------------|-------------------------------------------------------|-----------|-------------------
Master | 1                     | Process management, inter-process message forwarding  | Very high | No
Agent  | 1                     | Background work (long-lived connection clients)       | High      | A little
Worker | Usually the CPU count | Runs business code                                    | Normal    | Yes

Roughly speaking, the Master acts as the main process. The Agent is started as a secretary process that helps the Workers with shared chores (such as logging), and the Worker processes run the actual business code.

How the multi-process model is implemented

Process-related code

Let's start with Master, which we will treat as the top-level process for now. There is actually a parent process above it, which shows up later as the parent message target.

/**
 * start egg app
 * @method Egg#startCluster
 * @param {Object} options {@link Master}
 * @param {Function} callback start success callback
 */
exports.startCluster = function(options, callback) {
  new Master(options).ready(callback);
};

Now for Master's constructor:

constructor(options) {
  super();
  // Initialize options
  this.options = parseOptions(options);
  // Manages the worker processes; see Manager and Messenger for details
  this.workerManager = new Manager();
  // Routes messages between processes; see Manager and Messenger for details
  this.messenger = new Messenger(this);
  // Adds a ready() method; see the get-ready npm package for details
  ready.mixin(this);
  // Is this a production environment?
  this.isProduction = isProduction();
  this.agentWorkerIndex = 0;
  // Has the master been closed?
  this.closed = false;
  ...

  // Next: the ready callback, followed by the various event registrations
  this.ready(() => {
    // Mark the master as started
    this.isStarted = true;
    const stickyMsg = this.options.sticky ? ' with STICKY MODE!' : '';
    this.logger.info('[master] %s started on %s (%sms)%s',
      frameworkPkg.name, this[APP_ADDRESS], Date.now() - startTime, stickyMsg);

    // Send egg-ready to each process so it can trigger its related events
    const action = 'egg-ready';
    this.messenger.send({ action, to: 'parent', data: { port: this[REALPORT], address: this[APP_ADDRESS] } });
    this.messenger.send({ action, to: 'app', data: this.options });
    this.messenger.send({ action, to: 'agent', data: this.options });
    // start check agent and worker status
    this.workerManager.startCheck();
  });

  // Register all kinds of events
  this.on('agent-exit', this.onAgentExit.bind(this));
  this.on('agent-start', this.onAgentStart.bind(this));
  ...

  // Detect an available port, then fork the Agent
  detectPort((err, port) => {
    ...
    this.forkAgentWorker();
  });
}
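ready.mixin(this) is what makes the this.ready(callback) / this.ready(true) pair in this constructor work: callbacks registered early are queued and run only once the object is marked ready. Below is a minimal sketch of that idea. `mixinReady` is a hypothetical helper written for illustration, not the get-ready source, and it calls callbacks synchronously, which a real implementation may not do.

```javascript
// Hypothetical, simplified "ready" mixin; not the get-ready package's source.
function mixinReady(obj) {
  let isReady = false;
  let readyError = null;
  const callbacks = []; // callbacks registered before the object became ready

  obj.ready = function ready(arg) {
    if (typeof arg === 'function') {
      // Register a callback, or run it immediately if already ready.
      if (isReady) arg(readyError);
      else callbacks.push(arg);
      return;
    }
    if (isReady) return; // in this sketch only the first ready(true) / ready(err) counts
    if (arg === true || arg instanceof Error) {
      isReady = true;
      readyError = arg instanceof Error ? arg : null;
      // splice(0) empties the queue, so each queued callback runs exactly once.
      callbacks.splice(0).forEach(cb => cb(readyError));
    }
  };
  return obj;
}

const fakeMasterState = mixinReady({});
const events = [];
fakeMasterState.ready(err => events.push(err ? 'failed' : 'started'));
fakeMasterState.ready(true); // e.g. every app worker is now listening
fakeMasterState.ready(() => events.push('late')); // runs at once: already ready
console.log(events); // [ 'started', 'late' ]
```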

In short, Master's constructor mainly initializes its state and registers the various event handlers, and finally runs forkAgentWorker. The key code of that function:

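const path = require('path');
const childprocess = require('child_process');

// Inside forkAgentWorker (args and opt are built earlier in the function, not shown)
const agentWorkerFile = path.join(__dirname, 'agent_worker.js');
// Fork the Agent as a child process with child_process.fork
const agentWorker = childprocess.fork(agentWorkerFile, args, opt);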

Moving on to agent_worker.js: it instantiates an Agent object, and its key code is:

agent.ready(() => {
  agent.removeListener('error', startErrorHandler); // Remove the startup error handler
  process.send({ action: 'agent-start', to: 'master' }); // Send an agent-start action to master
});

So once the agent is ready, agent_worker.js sends the master a message whose action is agent-start. Back in Master, two handlers are registered for this event: onAgentStart with on, and forkAppWorkers with once.

this.on('agent-start', this.onAgentStart.bind(this));
this.once('agent-start', this.forkAppWorkers.bind(this));
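The difference between on and once is what lets the Agent be restarted without forking a second set of workers: onAgentStart runs every time an agent comes up, while forkAppWorkers runs only for the first agent-start. A minimal sketch with Node's EventEmitter; `FakeMaster` is a hypothetical stand-in that only counts how often each handler runs.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for Master, keeping only the two agent-start handlers.
class FakeMaster extends EventEmitter {
  constructor() {
    super();
    this.agentStarts = 0; // how many times onAgentStart ran
    this.workerForks = 0; // how many times forkAppWorkers ran
    this.on('agent-start', () => this.onAgentStart());
    this.once('agent-start', () => this.forkAppWorkers());
  }

  onAgentStart() { this.agentStarts += 1; }

  forkAppWorkers() { this.workerForks += 1; }
}

const fakeMaster = new FakeMaster();
fakeMaster.emit('agent-start'); // the agent's first boot
fakeMaster.emit('agent-start'); // the agent died and was restarted
console.log(fakeMaster.agentStarts, fakeMaster.workerForks); // 2 1
```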

Look at onAgentStart first. It is relatively simple and mostly passes information along:

onAgentStart() {
    this.agentWorker.status = 'started';

    // Send egg-ready when agent is started after launched
    if (this.isAllAppWorkerStarted) {
      this.messenger.send({ action: 'egg-ready', to: 'agent', data: this.options });
    }

    this.messenger.send({ action: 'egg-pids', to: 'app', data: [ this.agentWorker.pid ] });
    // should send current worker pids when agent restart
    if (this.isStarted) {
      this.messenger.send({ action: 'egg-pids', to: 'agent', data: this.workerManager.getListeningWorkerIds() });
    }

    this.messenger.send({ action: 'agent-start', to: 'app' });
    this.logger.info('[master] agent_worker#%s:%s started (%sms)',
      this.agentWorker.id, this.agentWorker.pid, Date.now() - this.agentStartTime);
  }

Then forkAppWorkers runs. It mainly uses the cfork package to fork the worker processes, and it registers a series of related event listeners.

...
cfork({
  exec: this.getAppWorkerFile(),
  args,
  silent: false,
  count: this.options.workers,
  // don't refork in local env
  refork: this.isProduction,
});
...
// Triggering app-start events
cluster.on('listening', (worker, address) => {
  this.messenger.send({
    action: 'app-start',
    data: { workerPid: worker.process.pid, address },
    to: 'master',
    from: 'app',
  });
});

So whenever a worker emits cluster's listening event, forkAppWorkers sends an app-start message to the master, which handles it with onAppStart:

this.on('app-start', this.onAppStart.bind(this));

...
// Inside onAppStart: trigger Master's ready callback
if (this.options.sticky) {
  this.startMasterSocketServer(err => {
    if (err) return this.ready(err);
    this.ready(true);
  });
} else {
  this.ready(true);
}

// The ready callback then sends egg-ready to each process
const action = 'egg-ready';
this.messenger.send({ action, to: 'parent', data: { port: this[REALPORT], address: this[APP_ADDRESS] } });
this.messenger.send({ action, to: 'app', data: this.options });
this.messenger.send({ action, to: 'agent', data: this.options });

// start check agent and worker status
if (this.isProduction) {
  this.workerManager.startCheck();
}
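The snippets above skip over how onAppStart decides that it is time to call this.ready(true). Assuming it waits until every expected worker has reported app-start (the isAllAppWorkerStarted flag checked in onAgentStart hints at this), the bookkeeping could look like the sketch below. `createStartTracker` is hypothetical and the pids are made up.

```javascript
// Hypothetical sketch: count app-start messages until every worker is listening.
function createStartTracker(expectedWorkers, onAllStarted) {
  const listening = new Set(); // pids that have reported app-start
  let isAllAppWorkerStarted = false;

  return function onAppStart({ workerPid }) {
    listening.add(workerPid);
    if (isAllAppWorkerStarted || listening.size < expectedWorkers) return false;
    isAllAppWorkerStarted = true;
    onAllStarted(); // e.g. this.ready(true) in the non-sticky case
    return true;
  };
}

let readyCalls = 0;
const onAppStart = createStartTracker(2, () => { readyCalls += 1; });
onAppStart({ workerPid: 101 }); // 1 of 2 workers listening
onAppStart({ workerPid: 102 }); // 2 of 2: the ready callback fires
onAppStart({ workerPid: 103 }); // a later refork does not fire it again
console.log(readyCalls); // 1
```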

To summarize the startup flow:

  1. Master.constructor: Master's constructor runs first and calls detectPort.
  2. detectPort => forkAgentWorker()
  3. forkAgentWorker: forks the Agent process, which sends agent-start to the master once it is ready.
  4. agent-start runs onAgentStart (every time) and forkAppWorkers (once only).
  5. onAgentStart => sends various messages; forkAppWorkers => forks the workers, each of which triggers app-start on the master when it starts listening.
  6. The app-start event triggers onAppStart().
  7. onAppStart => calls ready(true) => runs the ready callback.
  8. ready callback => sends egg-ready to each process, triggering their related events, and runs startCheck().

+---------+           +---------+          +---------+
|  Master |           |  Agent  |          |  Worker |
+---------+           +----+----+          +----+----+
     |      fork agent     |                    |
     +-------------------->|                    |
     |      agent ready    |                    |
     |<--------------------+                    |
     |                     |     fork worker    |
     +----------------------------------------->|
     |     worker ready    |                    |
     |<-----------------------------------------+
     |      Egg ready      |                    |
     +-------------------->|                    |
     |      Egg ready      |                    |
     +----------------------------------------->|

Process guarding

According to the official documentation, process guarding relies mainly on two libraries: graceful and egg-cluster.

Uncaught exceptions

  1. The failing Worker closes all of its TCP servers (quickly dropping existing connections and accepting no new ones) and disconnects its IPC channel to the Master, so it no longer accepts new user requests.
  2. The Master immediately forks a new Worker process, keeping the total number of online workers unchanged.
  3. The failing Worker waits for a while, finishes the requests it has already accepted, and then exits.
+---------+                 +---------+
|  Worker |                 |  Master |
+---------+                 +----+----+
     | uncaughtException         |
     +------------+              |
     |            |              |                   +---------+
     | <----------+              |                   |  Worker |
     |                           |                   +----+----+
     |        disconnect         |   fork a new worker    |
     +-------------------------> + ---------------------> |
     |         wait...           |                        |
     |          exit             |                        |
     +-------------------------> |                        |
     |                           |                        |
    die                          |                        |
                                 |                        |
                                 |                        |

Looking at the app worker file that gets executed, the app is actually an instance of Egg's Application class, whose onServer method calls graceful():

onServer(server) {
    ......
    graceful({
      server: [ server ],
      error: (err, throwErrorCount) => {
        ......
      },
    });
    ......
  }

Inside graceful, the process.on('uncaughtException') handler makes HTTP responses send Connection: close, closes the TCP servers, disconnects the IPC channel with the master, and sets a timer that exits the process if shutdown takes too long:

process.on('uncaughtException', function (err) {
    ......
    // Setting Connection: close response header for http connections
    servers.forEach(function (server) {
      if (server instanceof http.Server) {
        server.on('request', function (req, res) {
          // Let http server set `Connection: close` header, and close the current request socket.
          req.shouldKeepAlive = false;
          res.shouldKeepAlive = false;
          if (!res._header) {
            res.setHeader('Connection', 'close');
          }
        });
      }
    });

    // Set a timer that kills child processes and then exits this process,
    // making sure we close down within `killTimeout` milliseconds
    var killtimer = setTimeout(function () {
      console.error('[%s] [graceful:worker:%s] kill timeout, exit now.', Date(), process.pid);
      if (process.env.NODE_ENV !== 'test') {
        // kill children by SIGKILL before exit
        killChildren(function() {
          // Exit this process
          process.exit(1);
        });
      }
    }, killTimeout);

    // But don't keep the process open just for that!
    // If there is no more I/O waiting, just let the process exit normally.
    if (typeof killtimer.unref === 'function') {
      // only available on node 0.10+
      killtimer.unref();
    }

    var worker = options.worker || cluster.worker;

    // cluster mode
    if (worker) {
      try {
        // Close the TCP servers (stop accepting new connections)
        for (var i = 0; i < servers.length; i++) {
          var server = servers[i];
          server.close();
        }
      } catch (er1) {
        ......
      }

      try {
        // Close the IPC channel with the master
        worker.disconnect();
      } catch (er2) {
        ......
      }
    }
  });

Once the IPC channel is closed, we move to the master side and look at cfork, the worker-forking package mentioned above. It listens for each child process's disconnect event and decides, based on certain conditions, whether to fork a new child process:

cluster.on('disconnect', function (worker) {
    ......
    // Save the pid
    disconnects[worker.process.pid] = utility.logDate();
    if (allow()) {
      // fork A New Subprocess
      newWorker = forkWorker(worker._clusterSettings);
      newWorker._clusterSettings = worker._clusterSettings;
    } else {
      ......
    }
  });

Meanwhile, the old worker keeps running for a while to finish its in-flight work. If it has not exited on its own by the time the timer shown above fires, the timer kills its children and the process exits.
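The disconnect handler guards reforking with allow(), whose body the source elides. Its purpose is to stop a worker that crashes on every start from being reforked in a tight loop, and to disable reforking entirely when refork is false (as with the refork: this.isProduction option above). Below is one way to do that, a sliding-window limiter. `createReforkLimiter`, its option names, and the injectable clock are assumptions for illustration, not cfork's API.

```javascript
// Hypothetical refork limiter: allow at most `limit` reforks within `durationMs`.
function createReforkLimiter({ refork = true, limit = 60, durationMs = 60000, now = Date.now } = {}) {
  const history = []; // timestamps of recent reforks

  return function allow() {
    if (!refork) return false; // e.g. local development: never refork
    const t = now();
    // Forget reforks that have fallen out of the time window.
    while (history.length > 0 && t - history[0] >= durationMs) history.shift();
    if (history.length >= limit) return false; // crashing too often: give up for now
    history.push(t);
    return true;
  };
}

let clock = 0; // fake clock so the example is deterministic
const allow = createReforkLimiter({ limit: 2, durationMs: 1000, now: () => clock });
const decisions = [allow(), allow(), allow()]; // a third refork within 1 s is refused
clock = 1500;                                  // the window has passed
decisions.push(allow());
console.log(decisions); // [ true, true, false, true ]
```

Injecting the clock keeps the limiter testable without real waiting; in production code it would simply default to Date.now.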

OOM and system exceptions
Exceptions of this kind sometimes cannot be caught inside the child process, so they can only be handled in the master, that is, in the cfork package:

cluster.on('exit', function (worker, code, signal) {
    // If the worker disconnected first (the uncaughtException path above), a replacement was already forked on disconnect, so no refork is needed here.
    var isExpected = !!disconnects[worker.process.pid];
    if (isExpected) {
      delete disconnects[worker.process.pid];
      // worker disconnect first, exit expected
      return;
    }
    // The master killed this child process itself, so no refork is needed
    if (worker.disableRefork) {
      // worker is killed by master
      return;
    }

    if (allow()) {
      newWorker = forkWorker(worker._clusterSettings);
      newWorker._clusterSettings = worker._clusterSettings;
    } else {
      ......
    }
    cluster.emit('unexpectedExit', worker, code, signal);
  });
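The disconnects map is what keeps the two handlers from forking twice for one dead worker. A worker on the uncaughtException path is replaced when it disconnects, so its later exit is ignored. A worker whose exit has no recorded disconnect (the OOM and system-exception case) is replaced by the exit handler. The sketch below models only that bookkeeping. `createSupervisor` is hypothetical, it leaves out allow() and the other checks elided as ...... in the real handlers, and plain objects stand in for cluster workers.

```javascript
// Hypothetical model of the disconnect/exit bookkeeping: one replacement per dead worker.
function createSupervisor() {
  const disconnects = {}; // pid -> true once the worker disconnected first
  const forked = [];      // why each replacement was forked

  return {
    forked,
    onDisconnect(worker) {
      disconnects[worker.pid] = true; // its exit is now expected
      forked.push(`replace ${worker.pid} (disconnect)`);
    },
    onExit(worker) {
      if (disconnects[worker.pid]) {
        delete disconnects[worker.pid]; // already replaced on disconnect
        return;
      }
      if (worker.disableRefork) return; // the master killed it on purpose
      forked.push(`replace ${worker.pid} (exit)`);
    },
  };
}

const sup = createSupervisor();
// uncaughtException path: graceful disconnects first, the process exits later.
sup.onDisconnect({ pid: 1 });
sup.onExit({ pid: 1 });
// Crash path: only the exit handler sees this worker die.
sup.onExit({ pid: 2 });
// The master shut this worker down itself.
sup.onExit({ pid: 3, disableRefork: true });
console.log(sup.forked); // [ 'replace 1 (disconnect)', 'replace 2 (exit)' ]
```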

Interprocess Communication (IPC)

The IPC channels that cluster provides exist only between the Master and each Worker/Agent; there is no direct channel between Worker and Agent processes. So how do they communicate with each other? Through the Master.

Broadcast message: agent => all workers
                  +--------+          +-------+
                  | Master |<---------| Agent |
                  +--------+          +-------+
                 /    |     \
                /     |      \
               /      |       \
              /       |        \
             v        v         v
  +----------+   +----------+   +----------+
  | Worker 1 |   | Worker 2 |   | Worker 3 |
  +----------+   +----------+   +----------+

Designated recipient: one worker => another worker
                  +--------+          +-------+
                  | Master |----------| Agent |
                  +--------+          +-------+
                 ^    |
     send to    /     |
    worker 2   /      |
              /       |
             /        v
  +----------+   +----------+   +----------+
  | Worker 1 |   | Worker 2 |   | Worker 3 |
  +----------+   +----------+   +----------+

In the master, when the agent and the app workers are forked, the master listens for their messages and normalizes each one into an object:

agentWorker.on('message', msg => {
  if (typeof msg === 'string') msg = { action: msg, data: msg };
  msg.from = 'agent';
  this.messenger.send(msg);
});

worker.on('message', msg => {
  if (typeof msg === 'string') msg = { action: msg, data: msg };
  msg.from = 'app';
  this.messenger.send(msg);
});

Both paths end in messenger.send, which decides where to deliver a message based on its from and to fields:

send(data) {
    if (!data.from) {
      data.from = 'master';
    }
    ......

    // app -> master
    // agent -> master
    if (data.to === 'master') {
      debug('%s -> master, data: %j', data.from, data);
      // app/agent to master
      this.sendToMaster(data);
      return;
    }

    // master -> parent
    // app -> parent
    // agent -> parent
    if (data.to === 'parent') {
      debug('%s -> parent, data: %j', data.from, data);
      this.sendToParent(data);
      return;
    }

    // parent -> master -> app
    // agent -> master -> app
    if (data.to === 'app') {
      debug('%s -> %s, data: %j', data.from, data.to, data);
      this.sendToAppWorker(data);
      return;
    }

    // parent -> master -> agent
    // app -> master -> agent (to may be omitted)
    if (data.to === 'agent') {
      debug('%s -> %s, data: %j', data.from, data.to, data);
      this.sendToAgentWorker(data);
      return;
    }
  }

For messages addressed to the master, sendToMaster simply emits an event on the master named after the message's action:

sendToMaster(data) {
  this.master.emit(data.action, data.data);
}

For the agent and the workers, the messenger uses the sendmessage package, which essentially calls methods like these:

 // Passing information to subprocesses
 agent.send(data)
 worker.send(data)
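To see the routing rules end to end without forking anything, here is an in-process model of the master's Messenger. Plain functions stand in for the master, the parent, the agent, and two app workers. `createMessenger`, `fromChild`, and the refresh-config action are hypothetical. As an assumption of this sketch, to: 'app' is delivered to every worker, matching the broadcast diagram above; the logic the real send() elides as ...... is not modeled.

```javascript
// Hypothetical in-process model of the master's Messenger#send.
function createMessenger({ master, parent, agent, workers }) {
  return {
    send(data) {
      if (!data.from) data.from = 'master'; // no sender means the master sent it
      if (data.to === 'master') master(data);      // app/agent -> master
      else if (data.to === 'parent') parent(data); // master/app/agent -> parent
      else if (data.to === 'agent') agent(data);   // parent/app -> master -> agent
      else if (data.to === 'app') workers.forEach(deliver => deliver(data)); // -> every worker
    },
  };
}

// The master normalizes messages arriving from a child before routing them.
function fromChild(msg, from) {
  if (typeof msg === 'string') msg = { action: msg, data: msg };
  msg.from = from;
  return msg;
}

const inbox = { master: [], parent: [], agent: [], w1: [], w2: [] };
const record = name => m => inbox[name].push(`${m.from}:${m.action}`);
const hub = createMessenger({
  master: record('master'),
  parent: record('parent'),
  agent: record('agent'),
  workers: [record('w1'), record('w2')],
});

hub.send(fromChild({ action: 'refresh-config', to: 'app' }, 'agent')); // agent -> all workers
hub.send(fromChild({ action: 'app-start', to: 'master' }, 'app'));     // worker -> master
hub.send({ action: 'egg-ready', to: 'parent' });                        // master -> parent
console.log(inbox);
```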

On the receiving side, EggApplication, the base class that both Agent and Application inherit from, uses a Messenger class whose constructor looks like this:

constructor() {
    super();
    ......
    this._onMessage = this._onMessage.bind(this);
    process.on('message', this._onMessage);
  }

_onMessage(message) {
    if (message && is.string(message.action)) {
      // Emit an event named after the action, just like the master does
      this.emit(message.action, message.data);
    }
  }
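To try the receiving side without forking anything, the sketch below wires a Messenger-like class to a fake process object (a plain EventEmitter). `ChildMessenger` and `fakeProcess` are hypothetical. In a real agent or worker, it is the master's child.send() that fires the child's process 'message' event.

```javascript
const { EventEmitter } = require('events');

// Hypothetical receiving-side messenger: turn { action, data } into an event.
class ChildMessenger extends EventEmitter {
  constructor(proc) {
    super();
    this._onMessage = this._onMessage.bind(this);
    proc.on('message', this._onMessage); // in a real child process, proc is `process`
  }

  _onMessage(message) {
    // Ignore anything without a string action, like the original check.
    if (message && typeof message.action === 'string') {
      this.emit(message.action, message.data);
    }
  }
}

const fakeProcess = new EventEmitter(); // stands in for the child's `process`
const childMessenger = new ChildMessenger(fakeProcess);
const received = [];
childMessenger.on('egg-ready', data => received.push(data.port));
fakeProcess.emit('message', { action: 'egg-ready', data: { port: 7001 } });
fakeProcess.emit('message', 'not-an-object'); // no action field: ignored
console.log(received); // [ 7001 ]
```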

To sum up:
The idea is to combine the event mechanism with the IPC channels, relaying every message through the master, to achieve communication between processes.

Other

While studying this code I came across timeout.unref(), which graceful uses above. For an explanation of what it does, I recommend the answer on the 6th floor of this question.
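What unref() buys graceful is that the kill timer acts as an upper bound on shutdown, not a fixed delay. In this minimal demonstration, a 100 ms timer stands in (hypothetically) for the last request still being served. The script exits right after it finishes, instead of waiting the full 10 seconds for the unref'd kill timer. Timeout#hasRef() requires Node.js 11 or later.

```javascript
// Stand-in for an in-flight request: a normal (ref'd) timer keeps the process alive for 100 ms.
setTimeout(() => console.log('last request finished'), 100);

// The safety net, as in graceful: force an exit if shutdown takes too long.
const killTimer = setTimeout(() => {
  console.error('kill timeout, exit now.');
  process.exit(1);
}, 10000);

// After unref(), this timer alone no longer keeps the event loop alive.
killTimer.unref();
console.log(killTimer.hasRef()); // false
```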

Summary

Switching from a front-end mindset to a back-end one is hard work, and Egg's process management is genuinely powerful, so I spent a lot of time reading up on the various APIs and thinking things through.

References

Multiprocess Model and Interprocess Communication
Egg source code parsing egg-cluster
