Node.js - Sending a big object to child_process is slow


Problem description

我的用例如下:
我从节点服务器到公共API进行了大量的休息API调用。有时响应很大,有时很小。我的用例要求我对响应JSON进行字符串化。我知道一个大的JSON,因为响应将阻止我的事件循环。经过一些研究后,我决定使用child_process.fork来解析这些响应,以便其他API调用不需要等待。我尝试从我的主进程发送一个30 MB的大JSON文件到分叉的child_process。子进程需要很长时间来挑选和解析json。我期待儿童过程的反应并不大。我只想字符串化并获得长度并发送回主进程。

My Use-case is as follows: I make plenty of rest API calls from my node server to public APIs. Sometime the response is big and sometimes its small. My use-case demands me to stringify the response JSON. I know a big JSON as response is going to block my event loop. After some research i decided to use child_process.fork for parsing these responses, so that the other API calls need not wait. I tried sending a big 30 MB JSON file from my main process to the forked child_process. It takes so long for the child process to pick and parse the json. The response im expecting from the child process is not huge. I just want to stringify and get the length and send back to the main process.

I'm attaching the main code and the child code.

var moment = require('moment');
// renamed from `process` so it does not shadow the global `process` object
var cp = require('child_process');
var request = require('request');

var start_time = moment.utc().valueOf();

request({url: 'http://localhost:9009/bigjson'}, function (err, resp, body) {

  if (!err && resp.statusCode == 200) {

    console.log('Body Length : ' + body.length);

    // fork() expects an array of string arguments, not a bare number
    var ls = cp.fork("response_handler.js", ["0"]);

    ls.on('message', function (message) {
        console.log(moment.utc().valueOf() - start_time);
        console.log(message);
    });
    ls.on('close', function (code) {
        console.log('child process exited with code ' + code);
    });
    ls.on('error', function (err) {
        console.log('Error : ' + err);
    });
    ls.on('exit', function (code, signal) {
        console.log('Exit : code : ' + code + ' signal : ' + signal);
    });

    // send inside the success branch, where `ls` is actually defined
    ls.send({content: body});
  }
});

response_handler.js

console.log("Process " + process.argv[2] + " at work ");

process.on('message', function (json) {
  console.log('Before stringify');
  var x = JSON.stringify(json);
  console.log('After stringify');
  process.send({msg: 'Sending message from the child. Total size is ' + x.length});
});

Is there a better way to achieve what I'm trying to do? On the one hand I need the power of node.js to make thousands of API calls per second, but sometimes I get a big JSON back which screws things up.

Recommended answer

Your task seems to be both IO-bound (fetching the 30MB JSON), where Node's asynchronicity shines, and CPU-bound (parsing the 30MB JSON), where asynchronicity doesn't help you.

Forking too many processes soon becomes a resource hog and degrades performance. For CPU-bound tasks you need just as many processes as you have cores, and no more.

I would use one separate process to do the fetching and delegate parsing to N other processes, where N is (at most) the number of your CPU cores minus 1, and use some form of IPC for the process communication.

One choice is to use Node's Cluster module to orchestrate all of the above: https://nodejs.org/docs/latest/api/cluster.html

Using this module, you can have a master process create your worker processes upfront and not need to worry about when to fork, how many processes to create, etc. IPC works as usual with process.send and process.on. So a possible workflow is:


  1. 应用程序启动:主进程创建fetcher和N解析器进程。

  2. fetcher被发送一个API端点的工作列表来处理并开始获取JSON,将其发送回主进程。

  3. 在每个获取的JSON上发送一个主服务器发送给解析器进程。您可以以循环方式使用它们,或者在解析器工作队列为空或运行不足时使用更复杂的方式向主进程发送信号。

  4. 解析器进程将生成的JSON对象发送回master。

  1. Application startup: master process creates a "fetcher" and N "parser" processes.
  2. fetcher is sent a work list of API endpoints to process and starts fetching JSON sending it back to master process.
  3. on every JSON fetched the master sends to a parser process. You could use them in a round-robin fashion or use a more sophisticated way of signalling to the master process when a parser work queue is empty or is running low.
  4. parser processes send the resulting JSON object back to master.
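The round-robin dispatch in step 3 can be sketched as a small helper. The worker objects here are hypothetical stand-ins: anything with a send() method works, and real cluster worker handles fit that shape:

```javascript
// Minimal round-robin dispatcher over a fixed pool of workers.
// `workers` is assumed to be a non-empty array of objects exposing
// send() (cluster worker handles qualify).
function makeDispatcher(workers) {
  var next = 0;
  return function dispatch(job) {
    var worker = workers[next];
    next = (next + 1) % workers.length; // rotate to the next worker
    worker.send(job);
    return worker;
  };
}
```

A fancier version could track per-worker queue depth and pick the least-loaded worker instead of rotating blindly.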

Note that IPC also has non-trivial overhead, especially when sending/receiving large objects. You could even have the fetcher parse very small responses itself instead of passing them around, to avoid this overhead. "Small" here is probably < 32KB.
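That size check could look like the following sketch; the 32KB cutoff is just the ballpark figure mentioned above, not a measured threshold, and worth benchmarking for your own payloads:

```javascript
// Parse inline in the fetcher when the body is small; otherwise hand
// it off to a parser process. 32KB is an assumed cutoff.
var INLINE_PARSE_LIMIT = 32 * 1024;

function shouldParseInline(body) {
  // Measure byte length, not character count, since IPC cost tracks bytes.
  return Buffer.byteLength(body, 'utf8') < INLINE_PARSE_LIMIT;
}
```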

See also: Is it expensive/efficient to send data between processes in Node?
