Performance of NodeJS with large amount of callbacks


Question

I am working on a NodeJS application. There is a specific RESTful API (GET) that, when triggered by the user, requires the server to do about 10-20 network operations to pull information from different sources. All of these network operations are asynchronous callbacks, and once they have ALL finished, the result is consolidated by the nodejs app and sent back to the client. All of these operations are started in parallel via the async.map function.
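The fan-out pattern described above can be sketched without the async library; this hand-rolled `parallelMap` mirrors roughly what `async.map` does for callback-style functions (the `worker` argument is a placeholder for one of the network operations):

```javascript
// Start one worker per item in parallel, collect the results in order,
// and invoke `done` exactly once: after the last worker finishes, or on
// the first error. Assumes a non-empty `items` array.
function parallelMap(items, worker, done) {
  const results = new Array(items.length);
  let pending = items.length;
  let failed = false;

  items.forEach((item, i) => {
    worker(item, (err, value) => {
      if (failed) return;                 // an earlier worker already errored
      if (err) { failed = true; return done(err); }
      results[i] = value;                 // keep results in input order
      if (--pending === 0) done(null, results); // all callbacks are in
    });
  });
}
```

It would be called as `parallelMap(sources, fetchFromSource, done)`, where `fetchFromSource` is a hypothetical stand-in for one network call and `done` fires once, after the slowest of the parallel operations completes.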

I just want to understand: since nodejs is single threaded, and it does not make use of multi-core machines (at least not without clustering), how does node scale when it has many callbacks to process? Does the actual processing of callbacks depend on node's single thread being idle, or are callbacks processed in parallel with the main thread?

The reason why I ask is, I see the performance of my 20 callbacks deteriorate from the first callback to the last one. For example, the first network operation (out of the 10-20) takes 141ms to complete, whereas the last one takes about 4 seconds (measured as the time from when the function is executed until the callback of the function returns a value or an error). They are all the same network operation hitting the same data source, so the data source is not the bottleneck. I know for a fact that the data source takes no more than 200ms to respond to a single request.

I found this thread, and it looks to me like the one single thread needs to address all callbacks AND new requests coming up.

So my question is, for operations that will trigger many callbacks, what is the best practice in optimizing their performance?

Answer

For network operations node.js is effectively single threaded. However, there is a persistent misunderstanding that handling I/O requires constant CPU resources. The core of your question boils down to:

Does the actual processing of callbacks depend on node's single thread being idle, or are callbacks processed in parallel with the main thread?

The answer is yes and no. Yes, callbacks are only executed when the main thread is idle. No, the "processing" is not done when thread is idle. To be specific: there is no "processing" - it takes zero CPU time for node to "process" thousands of callbacks if what you mean by "process" is waiting.

If we really need to understand how node (or browser) internals work we must unfortunately first understand how computers work - from the hardware to the operating system. Yes, this is going to be a deep dive so bear with me..

It all began with the invention of interrupts..

It was a great invention, but also a Box of Pandora - Edsger Dijkstra

Yes, the quote above is from the same "Goto considered harmful" Dijkstra. From the very beginning introducing asynchronous operation to computer hardware was considered a very hard topic even for some of the legends in the industry.

Interrupts were introduced to speed up I/O operations. Rather than needing to poll some input with software (taking CPU time away from useful work), the hardware sends a signal to the CPU to tell it an event has occurred. The CPU then suspends the currently running program and executes another program to handle the interrupt - thus we call these functions interrupt handlers. And the word "handler" has stuck all the way up the stack to GUI libraries, which call callback functions "event handlers".

If you've been paying attention you will notice that this concept of an interrupt handler is actually a callback. You configure the CPU to call a function at some later time when an event happens. So even callbacks are not a new concept - it's way older than C.

Interrupts make modern operating systems possible. Without interrupts there would be no way for the CPU to temporarily stop your program to run the OS (well, there is cooperative multitasking, but let's ignore that for now). How an OS works is that it sets up a hardware timer in the CPU to trigger an interrupt and then tells the CPU to execute your program. It is this periodic timer interrupt that runs your OS. Apart from the timer, the OS (or rather device drivers) sets up interrupts for I/O. When an I/O event happens the OS takes over your CPU (or one of your CPUs in a multi-core system) and checks its data structures to determine which process it needs to execute next to handle the I/O (this is called preemptive multitasking).

So, handling network connections is not even the job of the OS - the OS just keeps track of connections in its data structures (or rather, the networking stack). What really handles network I/O is your network card, your router, your modem, your ISP etc. So waiting for I/O takes zero CPU resources. It just takes up some RAM to remember which program owns which socket.

Now that we have a clear picture of this we can understand what it is that node does. Various OSes have various different APIs that provide asynchronous I/O - from overlapped I/O on Windows to poll/epoll on Linux to kqueue on BSD to the cross-platform select(). Node internally uses libuv as a high-level abstraction over these APIs.

How these APIs work is similar, though the details differ. Essentially they provide a function that, when called, will block your thread until the OS sends an event to it. So yes, even non-blocking I/O blocks your thread. The key here is that blocking I/O will block your thread in multiple places, but non-blocking I/O blocks your thread in only one place - where you wait for events.

What this allows you to do is design your program in an event-oriented manner. This is similar to how interrupts allow OS designers to implement multitasking. In effect, asynchronous I/O is to frameworks what interrupts are to OSes. It allows node to spend exactly 0% CPU time to process (wait for) I/O. This is what makes node fast - it's not really faster but does not waste time waiting.

With the understanding we now have of how node handles network I/O we can understand how callbacks affect performance.

  1. There is zero CPU penalty having thousands of callbacks waiting

Of course, node still needs to maintain data structures in RAM to keep track of all the callbacks so callbacks do have memory penalty.

  2. Processing the return value from callbacks is done in a single thread

This has some advantages and some drawbacks. It means node does not have to worry about race conditions and thus node does not internally use any semaphores or mutexes to guard data access. The disadvantage is that any CPU intensive javascript will block all other operations.
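A minimal demonstration of that drawback (the 50 ms figure is arbitrary): a timer due in 1 ms cannot fire while a synchronous loop occupies the single thread.

```javascript
const start = Date.now();
let firedAfter = null;

// A timer due in 1 ms...
setTimeout(() => {
  firedAfter = Date.now() - start; // records when the callback actually ran
}, 1);

// ...cannot fire while this synchronous loop occupies the single thread.
while (Date.now() - start < 50) { /* burn CPU for ~50 ms */ }

// By this point ~50 ms have passed, but the timer callback has still not
// run; it runs only after this synchronous code returns to the event loop.
```

Any CPU-intensive JavaScript delays every other pending callback in exactly this way.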

You mention:

I see the performance of my 20 callbacks deteriorate from the first callback to the last one

The callbacks are all executed sequentially and synchronously in the main thread (only the waiting is actually done in parallel). Thus it could be that your callback is doing some CPU intensive calculations and the total execution time of all callbacks is actually 4 seconds.
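That effect is easy to reproduce. In this sketch every simulated request completes at the "data source" in about 10 ms, but each callback then burns about 20 ms of CPU, so the measured latency grows from the first callback to the last (`fakeRequest` and the timings are illustrative, not the OP's actual setup):

```javascript
const latencies = [];
const t0 = Date.now();

// Simulated network round-trip: the "data source" answers in ~10 ms.
function fakeRequest(i, cb) {
  setTimeout(() => cb(null, i), 10);
}

for (let i = 0; i < 5; i++) {
  fakeRequest(i, () => {
    // ~20 ms of synchronous CPU work inside the callback. Because
    // callbacks run one after another on the single thread, each
    // callback queues behind all the earlier ones.
    const busyUntil = Date.now() + 20;
    while (Date.now() < busyUntil) { /* spin */ }
    latencies.push(Date.now() - t0);
  });
}
```

All five "requests" complete at the remote end at roughly the same time, yet the observed per-request latency climbs by about 20 ms per callback, which is the same shape as the 141 ms-to-4 s deterioration described in the question.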

However, I rarely see this kind of issue for that number of callbacks. It's still possible, I still don't know what you're doing in your callbacks. I just think it's unlikely.

You also mention:

until the callback of the function returns a value or an error

One likely explanation is that your network resource cannot handle that many simultaneous connections. You may not think it's much since it's only 20 connections but I've seen plenty of services that would crash at 10 requests/second. The problem is all 20 requests are simultaneous.

You can test this by taking node out of the picture and using a command line tool to send 20 simultaneous requests. Something like curl or wget:

# assuming you're running bash:
for x in `seq 1 20`; do
  curl -o /dev/null -w "Connect: %{time_connect} Start: %{time_starttransfer} Total: %{time_total}\n" http://example.com &
done

Mitigation

If it turns out that the issue is that doing the 20 requests simultaneously stresses the other service, what you can do is limit the number of simultaneous requests.

You can do this by batching your requests:

async function fetchInBatches() {
    let input = [/* some values we need to process */];
    let result = [];

    while (input.length) {
        let batch = input.splice(0, 3); // take the next 3 inputs: make 3 requests in parallel

        let batchResult = await Promise.all(batch.map(x => {
            return fetchNetworkResource(x);
        }));

        result = result.concat(batchResult);
    }
    return result;
}
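One drawback of fixed batches is that each batch waits for its slowest request. A sketch of an alternative that keeps at most `limit` requests in flight, starting a new one as soon as any slot frees up (the `fn` parameter would be the same hypothetical `fetchNetworkResource` as above):

```javascript
// Map `input` through async function `fn`, with at most `limit`
// requests in flight at any moment. Results are returned in input order.
async function mapWithLimit(input, limit, fn) {
  const results = new Array(input.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index. The claim
  // (`next++`) is safe without locks because JS runs on a single thread.
  async function worker() {
    while (next < input.length) {
      const i = next++;
      results[i] = await fn(input[i]);
    }
  }

  // Run `limit` workers that pull from the shared queue.
  const workers = Array.from({ length: Math.min(limit, input.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Compared with batching, one slow request only occupies one of the `limit` slots instead of holding up a whole batch.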
