Preface
Recently I have been using Egg as the underlying framework for a project and became curious about how it manages its multi-process model, so I studied it and wrote down what I learned. If there are mistakes in this article, please go easy on me.
Why multi-process is needed
Servers today almost always have multi-core CPUs. Node, however, runs JavaScript in a single process on a single thread (single-threaded from the developer's point of view; internally it is not). The CPU schedules threads, so a single Node process can use only one core at a time. Utilization is very low, and fault tolerance is poor as well: one uncaught error crashes the whole program. Node therefore provides the cluster module to help us make full use of server resources.
Working Principle of cluster
To understand how cluster works, I recommend reading this article. Here is a brief summary:
- A listen call in a child process is hijacked: the port is actually listened on by the master's internal TCP server, so multiple child processes never listen on the same port and no address-in-use error is raised.
- In the connection-handling logic of that internal TCP server, the master selects a worker process, sends it a newconn internal message, and passes the client handle along with the message. (There are two ways to distribute connections. The first, which is the default on all platforms except Windows, is round-robin: the master listens on the port, accepts new connections, and distributes them to the workers in turn, with some built-in logic to avoid overloading a worker. In the second, the master creates the listening socket and sends it to the interested workers, which then accept connections directly.)
- When the worker process receives the handle, it creates a client instance (net.Socket) to run the business logic, and then returns.
[Figure: how cluster distributes connections from the master to the workers. Image credit: original source.]
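The round-robin distribution summarized above can be sketched in a few lines. This is only a simplified model, not Node's actual implementation: the class name `RoundRobinDispatcher` and the shape of the returned message are assumptions made for illustration, and real handles are sockets rather than strings.

```javascript
// Simplified model of round-robin distribution; not Node's real code.
class RoundRobinDispatcher {
  constructor(workerIds) {
    // Queue of workers waiting for their next connection.
    this.free = [...workerIds];
  }

  // Hand one incoming connection handle to the next worker in line.
  dispatch(handle) {
    const workerId = this.free.shift();
    if (workerId === undefined) return null; // no worker available
    this.free.push(workerId); // move the worker to the back of the queue
    return { act: 'newconn', workerId, handle };
  }
}

const dispatcher = new RoundRobinDispatcher([1, 2, 3]);
const assigned = ['a', 'b', 'c', 'd'].map(h => dispatcher.dispatch(h).workerId);
console.log(assigned); // [ 1, 2, 3, 1 ]
```

The fourth connection wraps around to the first worker, which is the behavior the summary describes as distributing connections in turn.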
Multi-process model
First, take a look at the process model in the official Egg documentation:

                    +--------+          +-------+
                    | Master |<-------->| Agent |
                    +--------+          +-------+
                    ^   ^    ^
                   /    |     \
                 /      |       \
               /        |         \
             v          v          v
    +----------+   +----------+   +----------+
    | Worker 1 |   | Worker 2 |   | Worker 3 |
    +----------+   +----------+   +----------+
| Type | Number of processes | Role | Stability | Runs business code |
|---|---|---|---|---|
| Master | 1 | Process management, inter-process message forwarding | Very high | No |
| Agent | 1 | Background work (long-lived connection clients) | High | A little |
| Worker | Usually the number of CPU cores | Runs business code | Average | Yes |
Roughly speaking, Master acts as the main process, Agent is started as a secretary process that helps the Workers with shared chores (such as logging), and the Worker processes are started to run the actual business code.
Implementation of multi-process
Process-related code
Start with Master, which we treat as the top-level process for now (there is actually a parent process, which I'll talk about later).

    /**
     * start egg app
     * @method Egg#startCluster
     * @param {Object} options {@link Master}
     * @param {Function} callback start success callback
     */
    exports.startCluster = function(options, callback) {
      new Master(options).ready(callback);
    };
Start with Master's constructor:

    constructor(options) {
      super();
      // Initialize options
      this.options = parseOptions(options);
      // Manager class for worker processes; see Manager and Messenger for details
      this.workerManager = new Manager();
      // Messenger class; see Manager and Messenger for details
      this.messenger = new Messenger(this);
      // Set up the ready event; see the get-ready npm package for details
      ready.mixin(this);
      // Whether this is a production environment
      this.isProduction = isProduction();
      this.agentWorkerIndex = 0;
      // Whether the master has been closed
      this.closed = false;
      ...

      // Next, the ready callback and the events that are registered:
      this.ready(() => {
        // Mark the master as started
        this.isStarted = true;
        const stickyMsg = this.options.sticky ? ' with STICKY MODE!' : '';
        this.logger.info('[master] %s started on %s (%sms)%s',
          frameworkPkg.name, this[APP_ADDRESS], Date.now() - startTime, stickyMsg);

        // Send egg-ready to each process and trigger the related events
        const action = 'egg-ready';
        this.messenger.send({ action, to: 'parent', data: { port: this[REALPORT], address: this[APP_ADDRESS] } });
        this.messenger.send({ action, to: 'app', data: this.options });
        this.messenger.send({ action, to: 'agent', data: this.options });
        // start check agent and worker status
        this.workerManager.startCheck();
      });

      // Register the event handlers
      this.on('agent-exit', this.onAgentExit.bind(this));
      this.on('agent-start', this.onAgentStart.bind(this));
      ...

      // Detect a free port, then fork the Agent
      detectPort((err, port) => {
        ...
        this.forkAgentWorker();
      });
    }
In summary, Master's constructor mainly initializes state and registers the corresponding event handlers, and finally runs the forkAgentWorker function. The key code of that function is:

    const agentWorkerFile = path.join(__dirname, 'agent_worker.js');
    // Start the Agent through child_process
    const agentWorker = childprocess.fork(agentWorkerFile, args, opt);
Moving on to agent_worker.js: it instantiates an agent object and contains this key piece of code:

    agent.ready(() => {
      // Remove the startup error listener
      agent.removeListener('error', startErrorHandler);
      // Send an agent-start action to master
      process.send({ action: 'agent-start', to: 'master' });
    });
As you can see, agent_worker.js sends master a message whose action is agent-start. Back in Master, two handlers are registered for that event: onAgentStart with on, and forkAppWorkers with once.

    this.on('agent-start', this.onAgentStart.bind(this));
    this.once('agent-start', this.forkAppWorkers.bind(this));
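The difference between on and once matters when the agent restarts. The sketch below uses a hypothetical `FakeMaster` class whose handlers only record their names; it is not egg-cluster's code, only an illustration of the listener semantics of Node's EventEmitter.

```javascript
const EventEmitter = require('events');

// Stand-in for Master; only the listener registration mirrors the article.
class FakeMaster extends EventEmitter {
  constructor() {
    super();
    this.calls = [];
    this.on('agent-start', () => this.calls.push('onAgentStart'));
    this.once('agent-start', () => this.calls.push('forkAppWorkers'));
  }
}

const master = new FakeMaster();
master.emit('agent-start'); // first agent start
master.emit('agent-start'); // agent started again, e.g. after a restart
console.log(master.calls);
// [ 'onAgentStart', 'forkAppWorkers', 'onAgentStart' ]
```

So the status bookkeeping runs on every agent start, while the workers are forked only the first time.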
First look at the onAgentStart function. It is relatively simple and only passes some information along:

    onAgentStart() {
      this.agentWorker.status = 'started';

      // Send egg-ready when agent is started after launched
      if (this.isAllAppWorkerStarted) {
        this.messenger.send({ action: 'egg-ready', to: 'agent', data: this.options });
      }

      this.messenger.send({ action: 'egg-pids', to: 'app', data: [ this.agentWorker.pid ] });
      // should send current worker pids when agent restart
      if (this.isStarted) {
        this.messenger.send({ action: 'egg-pids', to: 'agent', data: this.workerManager.getListeningWorkerIds() });
      }

      this.messenger.send({ action: 'agent-start', to: 'app' });
      this.logger.info('[master] agent_worker#%s:%s started (%sms)',
        this.agentWorker.id, this.agentWorker.pid, Date.now() - this.agentStartTime);
    }
The forkAppWorkers function then runs. It mainly uses the cfork package to fork the worker processes and registers a series of related listeners.

    ...
    cfork({
      exec: this.getAppWorkerFile(),
      args,
      silent: false,
      count: this.options.workers,
      // don't refork in local env
      refork: this.isProduction,
    });
    ...
    // Trigger the app-start event
    cluster.on('listening', (worker, address) => {
      this.messenger.send({
        action: 'app-start',
        data: { workerPid: worker.process.pid, address },
        to: 'master',
        from: 'app',
      });
    });
You can see that forkAppWorkers sends an app-start event to master whenever cluster emits a listening event.

    this.on('app-start', this.onAppStart.bind(this));
    ...
    // Trigger Master's ready callback
    if (this.options.sticky) {
      this.startMasterSocketServer(err => {
        if (err) return this.ready(err);
        this.ready(true);
      });
    } else {
      this.ready(true);
    }

    // The ready callback sends egg-ready to each process
    const action = 'egg-ready';
    this.messenger.send({ action, to: 'parent', data: { port: this[REALPORT], address: this[APP_ADDRESS] } });
    this.messenger.send({ action, to: 'app', data: this.options });
    this.messenger.send({ action, to: 'agent', data: this.options });

    // start check agent and worker status
    if (this.isProduction) {
      this.workerManager.startCheck();
    }
Conclusion:
- Master.constructor: Master's constructor runs first and calls the detectPort function.
- detectPort: its callback calls forkAgentWorker().
- forkAgentWorker: forks the Agent process, which sends an agent-start event to master once it is ready.
- The agent-start event runs onAgentStart and, through once, forkAppWorkers.
- onAgentStart sends various messages; forkAppWorkers forks the workers, and each listening worker triggers an app-start event on master.
- The app-start event triggers the onAppStart() method.
- onAppStart sets ready(true), which runs the ready callback.
- The ready callback sends egg-ready to each process, triggers the related events, and runs the startCheck() function.
    +---------+           +---------+          +---------+
    |  Master |           |  Agent  |          |  Worker |
    +---------+           +----+----+          +----+----+
         |      fork agent     |                    |
         +-------------------->|                    |
         |      agent ready    |                    |
         |<--------------------+                    |
         |                     |     fork worker    |
         +----------------------------------------->|
         |     worker ready    |                    |
         |<-----------------------------------------+
         |      Egg ready      |                    |
         +-------------------->|                    |
         |      Egg ready      |                    |
         +----------------------------------------->|
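The startup order can be replayed with a toy model in which forking is replaced by plain events. Everything here is hypothetical (the `StartupModel` class, the log strings), and the rule that egg-ready is sent only after every worker has reported in is an assumption of this sketch rather than something shown in the excerpts above.

```javascript
const EventEmitter = require('events');

// Toy model of the startup order; process forking is replaced by events.
class StartupModel extends EventEmitter {
  constructor(workerCount) {
    super();
    this.log = [];
    this.startedWorkers = 0;
    this.once('agent-start', () => this.forkAppWorkers(workerCount));
    this.on('app-start', () => this.onAppStart(workerCount));
  }

  start() {
    this.log.push('fork agent');
    this.emit('agent-start'); // the agent reports that it is ready
  }

  forkAppWorkers(count) {
    for (let i = 1; i <= count; i++) this.log.push(`fork worker ${i}`);
    // Each worker reports that it is listening.
    for (let i = 1; i <= count; i++) this.emit('app-start');
  }

  onAppStart(count) {
    this.startedWorkers++;
    // Assumption: ready fires once all workers have started.
    if (this.startedWorkers === count) this.log.push('egg-ready');
  }
}

const model = new StartupModel(2);
model.start();
console.log(model.log);
// [ 'fork agent', 'fork worker 1', 'fork worker 2', 'egg-ready' ]
```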
Process guard
According to the official documentation, process guarding relies mainly on two libraries: graceful and egg-cluster.
Uncaught exceptions
- Close all TCP servers of the failing Worker process (existing connections are closed quickly and no new connections are accepted), disconnect its IPC channel with Master, and stop accepting new user requests.
- Master immediately forks a new Worker process so that the total number of online Workers stays the same.
- The failing Worker waits for a while, finishes the requests it has already accepted, and then exits.

    +---------+                 +---------+
    |  Worker |                 |  Master |
    +---------+                 +----+----+
         | uncaughtException         |
         +------------+              |
         |            |              |                   +---------+
         | <----------+              |                   |  Worker |
         |                           |                   +----+----+
         |        disconnect         |   fork a new worker    |
         +-------------------------> + ---------------------> |
         |         wait...           |                        |
         |          exit             |                        |
         +-------------------------> |                        |
         |                           |                        |
        die                          |                        |
                                     |                        |
                                     |                        |
Looking at the app file that gets executed, the app actually inherits from the Application class, which calls graceful():

    onServer(server) {
      ......
      graceful({
        server: [ server ],
        error: (err, throwErrorCount) => {
          ......
        },
      });
      ......
    }
Looking at graceful, you can see that it listens for the process.on('uncaughtException') event. In the callback it sets a timer that will eventually exit the process, closes the TCP servers, and disconnects the IPC channel with the master.

    process.on('uncaughtException', function (err) {
      ......
      // Set the `Connection: close` response header for http connections
      servers.forEach(function (server) {
        if (server instanceof http.Server) {
          server.on('request', function (req, res) {
            // Let http server set `Connection: close` header, and close the current request socket.
            req.shouldKeepAlive = false;
            res.shouldKeepAlive = false;
            if (!res._header) {
              res.setHeader('Connection', 'close');
            }
          });
        }
      });

      // Set a timer that kills the child processes and then exits this process
      // make sure we close down within `killTimeout` seconds
      var killtimer = setTimeout(function () {
        console.error('[%s] [graceful:worker:%s] kill timeout, exit now.', Date(), process.pid);
        if (process.env.NODE_ENV !== 'test') {
          // kill children by SIGKILL before exit
          killChildren(function() {
            // Exit this process
            process.exit(1);
          });
        }
      }, killTimeout);

      // But don't keep the process open just for that!
      // If there is no more io waiting, just let process exit normally.
      if (typeof killtimer.unref === 'function') {
        // only worked on node 0.10+
        killtimer.unref();
      }

      var worker = options.worker || cluster.worker;
      // cluster mode
      if (worker) {
        try {
          // Close the TCP servers
          for (var i = 0; i < servers.length; i++) {
            var server = servers[i];
            server.close();
          }
        } catch (er1) {
          ......
        }

        try {
          // Close the IPC channel
          worker.disconnect();
        } catch (er2) {
          ......
        }
      }
    });
OK. After the IPC channel is closed, look at cfork, the package used above to fork the workers. It listens for the child process's disconnect event and decides, depending on the conditions, whether to fork a new child process.

    cluster.on('disconnect', function (worker) {
      ......
      // Record the pid
      disconnects[worker.process.pid] = utility.logDate();
      if (allow()) {
        // Fork a new child process
        newWorker = forkWorker(worker._clusterSettings);
        newWorker._clusterSettings = worker._clusterSettings;
      } else {
        ......
      }
    });
Typically the failing worker then keeps waiting for a while until the timer mentioned above fires, at which point the process exits.
OOM and system exceptions
System exceptions of this kind sometimes cannot be caught inside the child process, so they can only be handled in the master, that is, in the cfork package.

    cluster.on('exit', function (worker, code, signal) {
      // If it was a program exception, a child process has already been forked again
      // through the uncaughtException path described above, so nothing is needed here.
      var isExpected = !!disconnects[worker.process.pid];
      if (isExpected) {
        delete disconnects[worker.process.pid];
        // worker disconnect first, exit expected
        return;
      }
      // The master killed the child process itself, so no fork is needed
      if (worker.disableRefork) {
        // worker is killed by master
        return;
      }

      if (allow()) {
        newWorker = forkWorker(worker._clusterSettings);
        newWorker._clusterSettings = worker._clusterSettings;
      } else {
        ......
      }
      cluster.emit('unexpectedExit', worker, code, signal);
    });
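To see how the disconnect and exit handlers cooperate so that a dead worker is replaced exactly once, here is a simplified sketch. A plain EventEmitter named `fakeCluster` stands in for Node's cluster module, workers are plain objects with a hypothetical `pid` field, and the `allow()` rate limit and other elided checks in cfork are left out.

```javascript
const EventEmitter = require('events');

// Fake cluster: a plain emitter standing in for Node's cluster module.
const fakeCluster = new EventEmitter();
const disconnects = {}; // pid -> true for workers that disconnected first
const forked = [];      // reasons why a replacement worker was forked

fakeCluster.on('disconnect', worker => {
  disconnects[worker.pid] = true;
  forked.push(`disconnect:${worker.pid}`);
});

fakeCluster.on('exit', worker => {
  if (disconnects[worker.pid]) {
    delete disconnects[worker.pid]; // already replaced on disconnect
    return;
  }
  if (worker.disableRefork) return; // killed on purpose by the master
  forked.push(`exit:${worker.pid}`);
});

// Case 1: uncaught exception, the worker disconnects and then exits.
fakeCluster.emit('disconnect', { pid: 101 });
fakeCluster.emit('exit', { pid: 101 });
// Case 2: OOM kill, the worker exits without disconnecting first.
fakeCluster.emit('exit', { pid: 102 });
// Case 3: the master killed the worker deliberately.
fakeCluster.emit('exit', { pid: 103, disableRefork: true });

console.log(forked); // [ 'disconnect:101', 'exit:102' ]
```

Each of the first two failures produces exactly one replacement, and the deliberate kill produces none.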
Interprocess Communication (IPC)
The IPC channel of cluster exists only between Master and Worker/Agent; the Workers and the Agent have no direct channel to each other. So what should Workers do to communicate with each other? They go through Master.

    Broadcast message: agent => all workers
                      +--------+          +-------+
                      | Master |<---------| Agent |
                      +--------+          +-------+
                     /    |     \
                    /     |      \
                   /      |       \
                  /       |        \
                 v        v         v
      +----------+   +----------+   +----------+
      | Worker 1 |   | Worker 2 |   | Worker 3 |
      +----------+   +----------+   +----------+

    Designated recipient: one worker => another worker
                      +--------+          +-------+
                      | Master |----------| Agent |
                      +--------+          +-------+
                     ^    |
         send to    /     |
        worker 2   /      |
                  /       |
                 /        v
      +----------+   +----------+   +----------+
      | Worker 1 |   | Worker 2 |   | Worker 3 |
      +----------+   +----------+   +----------+
In master, you can see that when the agent and the app workers are forked, master listens for their messages and normalizes each one into an object:

    agentWorker.on('message', msg => {
      if (typeof msg === 'string') msg = { action: msg, data: msg };
      msg.from = 'agent';
      this.messenger.send(msg);
    });

    worker.on('message', msg => {
      if (typeof msg === 'string') msg = { action: msg, data: msg };
      msg.from = 'app';
      this.messenger.send(msg);
    });
Both handlers end by calling messenger.send, and messenger.send decides where to deliver the message based on its from and to fields.

    send(data) {
      if (!data.from) {
        data.from = 'master';
      }
      ......

      // app -> master
      // agent -> master
      if (data.to === 'master') {
        debug('%s -> master, data: %j', data.from, data);
        // app/agent to master
        this.sendToMaster(data);
        return;
      }

      // master -> parent
      // app -> parent
      // agent -> parent
      if (data.to === 'parent') {
        debug('%s -> parent, data: %j', data.from, data);
        this.sendToParent(data);
        return;
      }

      // parent -> master -> app
      // agent -> master -> app
      if (data.to === 'app') {
        debug('%s -> %s, data: %j', data.from, data.to, data);
        this.sendToAppWorker(data);
        return;
      }

      // parent -> master -> agent
      // app -> master -> agent, may not specify to
      if (data.to === 'agent') {
        debug('%s -> %s, data: %j', data.from, data.to, data);
        this.sendToAgentWorker(data);
        return;
      }
    }
For master, it directly emits the event registered under the message's action.

    sendToMaster(data) {
      this.master.emit(data.action, data.data);
    }
For agent and worker, on the other hand, it uses the sendmessage package, which in effect calls something like the following:

    // Pass the message to the child process
    agent.send(data)
    worker.send(data)
Finally, both agent and app inherit from the base class EggApplication, which uses a Messenger class. Its constructor and message handler are as follows:

    constructor() {
      super();
      ......
      this._onMessage = this._onMessage.bind(this);
      process.on('message', this._onMessage);
    }

    _onMessage(message) {
      if (message && is.string(message.action)) {
        // Emit the event registered under the action, just like master
        this.emit(message.action, message.data);
      }
    }
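The whole relay can be modeled inside a single process. In this sketch each "process" is just an EventEmitter, and a direct method call stands in for the real IPC send; the class `FakeMessenger`, its `receive` method, and the `refresh` and `reload` actions are all hypothetical names for illustration, not part of egg-cluster.

```javascript
const EventEmitter = require('events');

// In-process model: each "process" is an EventEmitter, and method calls
// stand in for process.send() / child.send() over the IPC channel.
class FakeMessenger {
  constructor() {
    this.master = new EventEmitter();
    this.agent = new EventEmitter();
    this.apps = [new EventEmitter(), new EventEmitter()];
  }

  // Normalize string messages and tag the sender, as master's listeners do.
  receive(msg, from) {
    if (typeof msg === 'string') msg = { action: msg, data: msg };
    msg.from = from;
    this.send(msg);
  }

  // Route by `to`; every target ends with an emit(action, data).
  send(data) {
    if (!data.from) data.from = 'master';
    if (data.to === 'master') this.master.emit(data.action, data.data);
    else if (data.to === 'agent') this.agent.emit(data.action, data.data);
    else if (data.to === 'app') this.apps.forEach(app => app.emit(data.action, data.data));
  }
}

const messenger = new FakeMessenger();
const received = [];
messenger.agent.on('refresh', data => received.push(`agent got ${data}`));
messenger.apps.forEach((app, i) => app.on('reload', data => received.push(`app ${i} got ${data}`)));

// worker -> master -> agent
messenger.receive({ action: 'refresh', to: 'agent', data: 'cache' }, 'app');
// agent -> master -> all workers
messenger.receive({ action: 'reload', to: 'app', data: 'config' }, 'agent');
console.log(received);
// [ 'agent got cache', 'app 0 got config', 'app 1 got config' ]
```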
To sum up:
The idea is to combine the event mechanism with the IPC channels to implement communication between processes.
Other
While studying this I came across the timeout.unref() function. For an explanation, I recommend the answer on the 6th floor of this question.
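A minimal sketch of what unref() changes, assuming Node 11 or later so that `hasRef()` is available on the timer object. The variable names and the 30-second delay are illustrative only.

```javascript
// A pending timer normally keeps the event loop (and the process) alive.
const killTimer = setTimeout(() => {
  console.error('kill timeout reached');
}, 30000);

const refBefore = killTimer.hasRef(); // the timer holds the process open

// After unref(), the timer alone no longer keeps the process running:
// if nothing else is pending, the process exits without waiting 30 seconds.
killTimer.unref();
const refAfter = killTimer.hasRef();

console.log(refBefore, refAfter); // true false
```

This is why graceful can set a kill timer as a safety net without delaying a worker that has already finished all of its other work.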
Summary
Moving from front-end thinking to back-end thinking takes real effort, and Egg's process management is powerful, so I spent a lot of time on the various APIs and on thinking things through.
References
Multiprocess Model and Interprocess Communication
Egg source code parsing egg-cluster