How to use parallel child processes to perform "work" on a large array?

Problem description


I have a huge array of numbers. I want to compute a sum of all of the numbers using JavaScript / Node.js. (For the purposes of this question it's a simple sum; in reality I have a much more complex and lengthy mathematical operation to perform).

In a single-threaded world, computing the sum takes a long time. To crunch the result quicker, I've been trying to delegate the work to multiple child processes running in parallel. Each child process determines the sum of a sub-array, and everything is totalled in the parent process.

My two scripts are below:

index.js

function computeSum(data) {
  var start = new Date();
  var sum = data.reduce(function(a, b) { return a + b; });
  console.log("Sum = %s, Time = %s ms", sum, new Date().getTime() - start.getTime());
}

function computeSumConcurrent(data) {
  var child_process = require("child_process");
  var os = require("os");
  var cpuCount = os.cpus().length;
  var subArraySize = data.length / cpuCount;
  var childProcessesFinished = 0;
  var start = new Date();
  var sum = 0;

  for (var i = 0; i < cpuCount; i++) {
    var childProcess = child_process.fork("child.js");

    childProcess.on("message", function(message) {
      sum += message.sum;
      childProcessesFinished++;

      if (childProcessesFinished == cpuCount) {
        console.log("Sum = %s, Time = %s ms", sum, new Date().getTime() - start.getTime());
        process.exit();
      }
    });

    childProcess.send({ subArray: data.slice(subArraySize * i, subArraySize * (i + 1)) });
  }
}

console.log("Populating array...");
var data = []
for (var i = 0; i < 50000000; i++) {
  data.push(Math.random());
}

console.log("Computing sum without using child processes...");
computeSum(data);

console.log("Computing sum using child processes...");
computeSumConcurrent(data);

child.js

process.on("message", function(message) {
  var sum = message.subArray.reduce(function(a, b) { return a + b; });
  process.send({ sum: sum });
  process.exit();
});

If you run index.js, you'll observe that the parallel sum is extremely slow. I think it may be due to childProcess.send, which isn't meant to communicate large chunks of data, but I'm not entirely sure.
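One rough way to test this hypothesis without forking anything is to time the serialization directly. By default, `child_process` IPC serializes messages as JSON (the `serialization: "advanced"` option of `fork` switches to the V8 serializer), so a `JSON.stringify` plus `JSON.parse` round trip on one chunk approximates the encoding cost of a single `send`. This is a sketch only: it ignores the cost of writing through the pipe itself. The helper `timeRoundTrip` and the 1,000,000-element chunk size are arbitrary choices for illustration.

```javascript
// Approximates the cost of sending `arr` over the default (JSON) IPC channel
// by timing a JSON round trip, then times summing `arr` for comparison.
// `timeRoundTrip` is a hypothetical helper, not part of any Node.js API.
function timeRoundTrip(arr) {
  const t0 = process.hrtime.bigint();
  // `copy` is what a child process would end up with after `send`.
  const copy = JSON.parse(JSON.stringify({ subArray: arr })).subArray;
  const t1 = process.hrtime.bigint();
  const sum = arr.reduce((a, b) => a + b, 0);
  const t2 = process.hrtime.bigint();
  return {
    copy,
    sum,
    serializeMs: Number(t1 - t0) / 1e6,
    sumMs: Number(t2 - t1) / 1e6,
  };
}

const chunk = Array.from({ length: 1000000 }, Math.random);
const r = timeRoundTrip(chunk);
console.log("JSON round trip: %s ms, reduce: %s ms",
  r.serializeMs.toFixed(1), r.sumMs.toFixed(1));
```

The timings depend on the machine, so the sketch prints them rather than asserting which is larger.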

So what's the solution for this sort of thing? How can I make the parallel sum quicker than the single-threaded one?

Solution

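Creating child processes for small pieces of work, and sending and receiving messages with them, can in fact increase the total processing time, because sending and receiving messages takes time of its own. On the default IPC channel, each message is serialized to JSON, written to the channel and parsed again by the receiving process, so for a large sub-array this cost can exceed the cost of the computation itself.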
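If the numbers can live in shared memory, one way to avoid the copying cost entirely is to swap in a different technique: the `worker_threads` module (threads rather than child processes) together with a `SharedArrayBuffer`. Each worker reads its own index range from the same memory, and only one number per worker is sent back. This is a minimal sketch that assumes the data fits in a `Float64Array` starting at offset 0 of the buffer, and a Node.js version where `worker_threads` is available without a flag (v12 or later). The names `parallelSum` and `WORKER_SOURCE` are hypothetical.

```javascript
const { Worker } = require("worker_threads");
const os = require("os");

// Code run by every worker (passed with `eval: true`). It reads its index
// range straight from the shared memory, so the numbers are never copied;
// only one number per worker travels back through postMessage.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require("worker_threads");
  const { buffer, start, end } = workerData;
  const view = new Float64Array(buffer);
  let sum = 0;
  for (let i = start; i < end; i++) sum += view[i];
  parentPort.postMessage(sum);
`;

// Sums a Float64Array backed by a SharedArrayBuffer using `workerCount` threads.
function parallelSum(shared, workerCount = os.cpus().length) {
  const size = Math.ceil(shared.length / workerCount);
  const jobs = [];
  for (let i = 0; i < workerCount; i++) {
    const start = Math.min(i * size, shared.length);
    const end = Math.min(start + size, shared.length);
    jobs.push(new Promise((resolve, reject) => {
      const worker = new Worker(WORKER_SOURCE, {
        eval: true,
        workerData: { buffer: shared.buffer, start, end },
      });
      worker.once("message", resolve);
      worker.once("error", reject);
    }));
  }
  // Total the per-worker partial sums in the main thread.
  return Promise.all(jobs).then((sums) => sums.reduce((a, b) => a + b, 0));
}

// Usage: put the numbers into shared memory once, then sum in parallel.
const n = 1000000;
const shared = new Float64Array(new SharedArrayBuffer(n * Float64Array.BYTES_PER_ELEMENT));
for (let i = 0; i < n; i++) shared[i] = Math.random();
parallelSum(shared).then((sum) => console.log("Sum = %s", sum));
```

An existing plain array can be copied into shared memory once with `shared.set(data)`. Because the partial sums are added in a different order than a single `reduce`, the floating-point result may differ from the single-threaded sum in the last few digits.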

There is another problem in your code as well: you are dividing the work among the children from the main process itself. This does not free the main process from work; it only adds to it.

I would suggest another approach:

  1. Create a child process and send it all the data.

  2. Let that child create other children, assign work to them, and then combine all their results.

  3. Send the final result to the parent.

Notes:

  1. Make sure you don't create too many children and give each one only a very small piece of work. That only makes you send and receive too many messages, which delays the processing.

  2. Also, don't create so few child forks that each one needs too much time to process its share of the task.

  3. Make sure no child process exits before the children it has created itself.
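Notes 1 and 2 come down to choosing how many pieces to cut the work into. A minimal sketch of one common choice, assuming one child per CPU core as in the question's code: contiguous integer ranges whose sizes differ by at most one. This also avoids the fractional `subArraySize` in the question, since `data.length / cpuCount` is not an integer whenever the length is not divisible by the core count. `splitRanges` is a hypothetical helper.

```javascript
const os = require("os");

// Splits `length` items into at most `parts` contiguous [start, end) ranges
// whose sizes differ by at most one, so no child gets a much larger share.
function splitRanges(length, parts) {
  const count = Math.max(1, Math.min(parts, length));
  const base = Math.floor(length / count);
  const extra = length % count; // the first `extra` ranges get one more item
  const ranges = [];
  let start = 0;
  for (let i = 0; i < count; i++) {
    const end = start + base + (i < extra ? 1 : 0);
    ranges.push([start, end]);
    start = end;
  }
  return ranges;
}

// Default to one child per CPU core, as in the question's code.
const ranges = splitRanges(50000000, os.cpus().length);
console.log(ranges);
```

Each `[start, end]` pair can be passed straight to `data.slice(start, end)` before sending it to a child.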

Benefits:

  1. Your main process will not be busy dividing tasks and calculating the result.
  2. Forking a suitable number of children will let your task complete in a relatively short time (I don't say a very short time).

Feel free to ask if you need some example.
