Best way to execute parallel processing in Node.js

Question

I'm trying to write a small Node application that will search through and parse a large number of files on the file system. To speed up the search, we are attempting to use some sort of map-reduce. The plan would be the following simplified scenario:

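  • A web request comes in with a search query
  • 3 processes are started, each of which is assigned 1000 (different) files
  • Once a process completes, it returns its results to the main thread
  • Once all processes have completed, the main thread continues by returning the combined result as a JSON result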

The questions I have with this are: Is this doable in Node? What is the recommended way of doing it?

I've been fiddling, but have come no further than the following example using Process:

Initiator:

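var child_process = require('child_process');

function Worker() { return child_process.fork("myProcess.js"); }
for (var i = 0; i < require('os').cpus().length; i++) {
        // Named `worker` so it does not shadow the global `process` object
        var worker = new Worker();
        worker.send(workItems.slice(i * itemsPerProcess, (i + 1) * itemsPerProcess));
}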

myProcess.js

process.on('message', function(msg) {
    var valuesToReturn = [];
    // Do file reading here
    // How would I return valuesToReturn?
    process.exit(0);
});

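Some side notes:

  • I'm aware the number of processes should depend on the number of CPUs on the server
  • I'm also aware of the speed restrictions of a file system; consider this a proof of concept before we move it to a database or Lucene instance :-)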

Recommended answer

Should be doable. As a simple example:

// parent.js
var child_process = require('child_process');

var numchild  = require('os').cpus().length;
var done      = 0;

for (var i = 0; i < numchild; i++){
  var child = child_process.fork('./child');
  child.send((i + 1) * 1000);
  child.on('message', function(message) {
    console.log('[parent] received message from child:', message);
    done++;
    if (done === numchild) {
      console.log('[parent] received all results');
      ...
    }
  });
}

// child.js
process.on('message', function(message) {
  console.log('[child] received message from server:', message);
  setTimeout(function() {
    process.send({
      child   : process.pid,
      result  : message + 1
    });
    process.disconnect();
  }, (0.5 + Math.random()) * 5000);
});

So the parent process spawns X child processes (one per CPU here) and passes each of them a message. It also installs an event handler on each child to listen for any messages sent back (with the result, for instance).
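In the example the message is just a number, but for the file search in the question each child would receive its own slice of the file list. A minimal sketch of that split, assuming the full list fits in memory; `splitWork` is a hypothetical helper, not part of Node.js:

```javascript
// Split `items` into `parts` contiguous slices of near-equal size.
// `splitWork` is a hypothetical helper name used for illustration only.
function splitWork(items, parts) {
  var size = Math.ceil(items.length / parts);
  var slices = [];
  for (var i = 0; i < parts; i++) {
    // slice() clamps out-of-range indexes, so trailing slices may be shorter or empty
    slices.push(items.slice(i * size, (i + 1) * size));
  }
  return slices;
}
```

Inside the parent's loop, `child.send(slices[i])` would then replace `child.send((i + 1) * 1000)`.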

The child process waits for messages from the parent and then starts processing (in this case, it just starts a timer with a random timeout to simulate some work being done). Once it's done, it sends the result back to the parent process with process.send() and calls process.disconnect() to close the channel to the parent, which lets the child exit once nothing else is keeping it alive (basically stopping the child process).
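For the question's "How would I return valuesToReturn?", the timer can be swapped for real work and the result passed to process.send(). The following is only a sketch: the message shape `{ files, query }` and the helper `searchFiles` are assumptions for illustration, and the synchronous substring match stands in for whatever parsing the application actually needs.

```javascript
// child.js (sketch)
var fs = require('fs');

// Return the paths whose contents contain `query`.
// `searchFiles` is a hypothetical helper, not part of Node.js.
function searchFiles(paths, query) {
  return paths.filter(function (file) {
    return fs.readFileSync(file, 'utf8').indexOf(query) !== -1;
  });
}

// process.send only exists when this script was started with child_process.fork().
if (process.send) {
  process.on('message', function (message) {
    // Assumed message shape: { files: ['a.txt', ...], query: 'text' }
    process.send({
      child  : process.pid,
      result : searchFiles(message.files, message.query)
    });
    process.disconnect();
  });
}
```

Blocking reads are acceptable here because each child does nothing else while it works; the parent stays responsive.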

The parent process keeps track of the number of child processes started and the number of them that have sent back a result. When those numbers are equal, the parent has received all results from the child processes, so it can combine them and return the JSON result.
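The counting and combining step can be pulled into a small helper. This is a sketch under the assumption that every child sends exactly one message; `createCollector` and `onDone` are hypothetical names:

```javascript
// Returns a 'message' handler that stores each child's message and calls
// `onDone` with the combined JSON string once `expected` messages have arrived.
// `createCollector` is a hypothetical helper used for illustration only.
function createCollector(expected, onDone) {
  var results = [];
  return function onMessage(message) {
    results.push(message);
    if (results.length === expected) {
      onDone(JSON.stringify(results));
    }
  };
}
```

In the parent, one collector would be created before the loop and attached to every child with `child.on('message', collect)`. A child that crashes without replying would leave the collector waiting, so a real application would probably also listen for each child's 'exit' event.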
