Is it expensive/efficient to send data between processes in Node?


Question

Node allows you to spawn child processes and send data between them. You could use this to execute some blocking code, for example.

Documentation says "These child Nodes are still whole new instances of V8. Assume at least 30ms startup and 10mb memory for each new Node. That is, you cannot create many thousands of them."

I was wondering whether this is efficient, and whether I should worry about any limitations. Here's some example code:

//index.js
var childProcess = require('child_process'); // missing from the original snippet

var childProcess1 = childProcess.fork('./child1.js');

// largeArray is assumed to be some big array defined elsewhere
childProcess1.send(largeArray);

childProcess1.once('message', function(formattedData) {
  console.log(formattedData);
  return false;
});



//child1.js
process.on('message', function(data) {

  data = format(data); // format() is a placeholder: do something with data, then send it back to index.js

  try{
    process.send(data);
    return false;
  }
  catch(err){
    console.log(err);
    return false;
  }

});


Answer

The documentation is telling you that starting new node processes is (relatively) expensive. It is unwise to fork() every time you need to do work.

Instead, you should maintain a pool of long-running worker processes – much like a thread pool. Queue work requests in your main process and dispatch them to the next available worker when it goes idle.
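A minimal sketch of that pattern might look like the following. The file name pool-worker.js, the queue shape, and runJob() are all illustrative assumptions, not something from the answer:

//pool.js (sketch)
var childProcess = require('child_process');
var os = require('os');

var queue = []; // pending jobs: { payload, callback }
var idle  = []; // workers currently waiting for work

// Spawn a fixed set of long-lived workers up front instead of fork()ing per job.
os.cpus().forEach(function() {
  var worker = childProcess.fork('./pool-worker.js'); // hypothetical worker file
  worker.on('message', function(result) {
    var job = worker.currentJob;
    worker.currentJob = null;
    idle.push(worker);
    dispatch();          // hand the worker its next job before running the callback
    job.callback(result);
  });
  idle.push(worker);
});

function dispatch() {
  while (idle.length && queue.length) {
    var worker = idle.pop();
    worker.currentJob = queue.shift();
    worker.send(worker.currentJob.payload);
  }
}

// Queue a job; it runs as soon as a worker goes idle.
function runJob(payload, callback) {
  queue.push({ payload: payload, callback: callback });
  dispatch();
}

A pool-worker.js for this sketch would just listen for 'message', do the work, and process.send() the result back.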

This leaves us with a question about the performance profile of node's IPC mechanism. When you fork(), node automatically sets up a special file descriptor on the child process. It uses this to communicate between processes by reading and writing line-delimited JSON. Basically, when you process.send({ ... }), node JSON.stringifys it and writes the serialized string to the fd. The receiving process reads this data until hitting a line break, then JSON.parses it.
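Conceptually, each message goes through something like the sketch below. This is only an illustration of the stringify-write / read-parse round trip described above, not Node's actual internal code:

// Sender side (conceptual): serialize synchronously, then write one framed line.
function sendMessage(writeToFd, message) {
  var line = JSON.stringify(message) + '\n'; // blocking serialization
  writeToFd(line);                           // goes out over the special IPC fd
}

// Receiver side (conceptual): buffer until a newline, then parse the JSON.
function makeReceiver(onMessage) {
  var buffered = '';
  return function(chunk) {
    buffered += chunk;
    var newline;
    while ((newline = buffered.indexOf('\n')) !== -1) {
      var line = buffered.slice(0, newline);
      buffered = buffered.slice(newline + 1);
      onMessage(JSON.parse(line));
    }
  };
}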

This necessarily means that performance will be highly dependent on the size of the data you send between processes.

I've roughed out some tests to get a better idea of what this performance looks like.

First, I sent a message of N bytes to the worker, which immediately responded with a message of the same length. I tried this with 1 to 8 concurrent workers on my quad-core hyper-threaded i7.
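The answer doesn't include its benchmark code, but a rough echo test along those lines could look like this (echo-worker.js, N, and ITERATIONS are made-up names and values for illustration):

//bench.js (sketch)
var childProcess = require('child_process');

var N = 1024;                             // message size in bytes
var ITERATIONS = 10000;                   // round trips to time
var payload = new Array(N + 1).join('x'); // an N-character string

var worker = childProcess.fork('./echo-worker.js'); // hypothetical echo worker
var remaining = ITERATIONS;
var start = Date.now();

worker.on('message', function() {
  if (--remaining === 0) {
    var seconds = (Date.now() - start) / 1000;
    console.log(Math.round(ITERATIONS / seconds) + ' round trips/sec at ' + N + ' bytes');
    worker.kill();
  } else {
    worker.send(payload);
  }
});

worker.send(payload);

// echo-worker.js would simply be:
//   process.on('message', function(msg) { process.send(msg); });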

We can see that having at least 2 workers is beneficial for raw throughput, but more than 2 essentially doesn't matter.

Next, I sent an empty message to the worker, which immediately responded with a message of N bytes.

Surprisingly, this made no difference.

Finally, I tried sending a message of N bytes to the worker, which immediately responded with an empty message.

Interesting — performance does not degrade as rapidly with larger messages.


  • Receiving large messages is slightly more expensive than sending them. For best throughput, your master process should not send messages larger than 1 kB and should not receive messages back larger than 128 bytes.

  • For small messages, the IPC overhead is about 0.02ms. This is small enough to be inconsequential in the real world.

It is important to realize that the serialization of the message is a synchronous, blocking call; if the overhead is too large, your entire node process will be frozen while the message is sent. This means I/O will be starved and you will be unable to process any other events (like incoming HTTP requests). So what is the maximum amount of data that can be sent over node IPC?
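One way to get a feel for that blocking cost is to time JSON.stringify on your payload by itself (a rough sketch; process.send also pays for the write to the IPC channel, so this is only a lower bound):

//measure-stringify.js (sketch)
function measureStringify(obj) {
  var start = process.hrtime();
  var json = JSON.stringify(obj);   // the synchronous, blocking part
  var diff = process.hrtime(start);
  var ms = diff[0] * 1000 + diff[1] / 1e6;
  console.log(json.length + ' bytes serialized in ' + ms.toFixed(2) + ' ms');
}

// e.g. a large array like the one in the question
var largeArray = [];
for (var i = 0; i < 100000; i++) {
  largeArray.push({ index: i, value: Math.random() });
}
measureStringify(largeArray);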

Things get really nasty over 32 kB. (These are per-message; double to get roundtrip overhead.)

The moral of the story is that you should:


  • If the input is larger than 32 kB, find a way to have your worker fetch the actual dataset. If you're pulling the data from a database or some other network location, do the request in the worker. Don't have the master fetch the data and then try to send it in a message. The message should contain only enough information for the worker to do its job. Think of messages like function parameters.

  • If the output is larger than 32 kB, find a way to have the worker deliver the result outside of a message. Write to disk or send the socket to the worker so that you can respond directly from the worker process, as sketched below.
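For example, the master might send only a small job descriptor and let the worker both fetch the input and deliver the output. The reportId field, report-worker.js, and fetchRowsFromDb() here are hypothetical names used purely for illustration:

//index.js (sketch)
var childProcess = require('child_process');
var worker = childProcess.fork('./report-worker.js'); // hypothetical worker file

// The message is just "function parameters": enough for the worker to do its job.
worker.send({ reportId: 42, outputPath: '/tmp/report-42.json' });

worker.once('message', function(info) {
  console.log('worker wrote ' + info.bytes + ' bytes to ' + info.path);
});

//report-worker.js (sketch): fetch the data and deliver the result outside the message
//  var fs = require('fs');
//  process.on('message', function(job) {
//    fetchRowsFromDb(job.reportId, function(err, rows) {  // hypothetical DB call
//      var json = JSON.stringify(rows);
//      fs.writeFile(job.outputPath, json, function() {
//        process.send({ path: job.outputPath, bytes: json.length });
//      });
//    });
//  });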

