How, in general, does Node.js handle 10,000 concurrent requests?

Question

I understand that Node.js uses a single thread and an event loop to process requests, only processing one at a time (which is non-blocking). But still, how does that work for, let's say, 10,000 concurrent requests? Will the event loop process all of the requests? Wouldn't that take too long?

I can't (yet) understand how it can be faster than a multi-threaded web server. I understand that a multi-threaded web server will be more expensive in resources (memory, CPU), but wouldn't it still be faster? I am probably wrong; please explain how this single thread is faster with lots of requests, and what it typically does (at a high level) when servicing lots of requests, like 10,000.

And also, will that single thread scale well to such a large amount? Please bear in mind that I am just starting to learn Node.js.

Answer

If you have to ask this question then you're probably unfamiliar with what most web applications/services do. You're probably thinking that all software does this:

user does an action
       │
       v
 application starts processing action
   └──> loop ...
          └──> busy processing
 end loop
   └──> send result to user

However, this is not how web applications, or indeed any application with a database as the back-end, work. Web apps do this:

user does an action
       │
       v
 application starts processing action
   └──> make database request
          └──> do nothing until request completes
 request complete
   └──> send result to user

In this scenario, the software spends most of its running time using 0% CPU, waiting for the database to return.
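To make that concrete, here is a minimal Node.js sketch of the pattern above. The database is simulated with a timer; fakeDbQuery and its 200 ms delay are invented purely for illustration, not a real driver API.

// Sketch: the handler issues a "database" request and uses no CPU while waiting.
function fakeDbQuery(sql) {
  // Stand-in for a real driver call; pretend the database answers after 200 ms.
  return new Promise(resolve =>
    setTimeout(() => resolve({ rows: [{ sql }] }), 200));
}

async function handleRequest(userId) {
  // application starts processing action
  const result = await fakeDbQuery(`SELECT * FROM users WHERE id = ${userId}`);
  // request complete -> send result to user
  return result.rows;
}

handleRequest(42).then(rows => console.log(rows));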

Multithreaded network apps handle the above workload like this:

request ──> spawn thread
              └──> wait for database request
                     └──> answer request
request ──> spawn thread
              └──> wait for database request
                     └──> answer request
request ──> spawn thread
              └──> wait for database request
                     └──> answer request

So the threads spend most of their time using 0% CPU, waiting for the database to return data. While doing so they have had to allocate the memory required for a thread, which includes a completely separate program stack for each thread, etc. Also, they have to start a thread, which, while not as expensive as starting a full process, is still not exactly cheap.

Since we spend most of our time using 0% CPU, why not run some code while we're not using the CPU? That way, each request still gets the same amount of CPU time as in a multithreaded application, but we don't need to start a thread. So we do this:

request ──> make database request
request ──> make database request
request ──> make database request
database request complete ──> send response
database request complete ──> send response
database request complete ──> send response

In practice both approaches return data with roughly the same latency, since it's the database response time that dominates the processing.
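As a sketch of that interleaving, a single-threaded HTTP server might look like the following; again a timer stands in for the database, and the port number and delay are arbitrary.

// One event loop, many in-flight database requests at the same time.
const http = require('http');

function fakeDbQuery() {
  // Stand-in for a real query; resolves after 200 ms without using the CPU.
  return new Promise(resolve => setTimeout(resolve, 200, { ok: true }));
}

http.createServer(async (req, res) => {
  const result = await fakeDbQuery();   // request -> make database request
  res.end(JSON.stringify(result));      // database request complete -> send response
}).listen(3000);

// 10,000 concurrent requests are just 10,000 pending sockets and promises,
// not 10,000 threads; the single thread only runs the cheap "issue query"
// and "send response" steps for each of them.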

The main advantage here is that we don't need to spawn a new thread, so we don't need to do lots and lots of mallocs, which would slow us down.

The seemingly mysterious thing is how both of the approaches above manage to run the workload in "parallel". The answer is that the database is threaded. So our single-threaded app is actually leveraging the multi-threaded behaviour of another process: the database.

A single-threaded app fails big if you need to do lots of CPU calculations before returning the data. Now, I don't mean a for loop processing the database result; that's still mostly O(n). I mean things like doing a Fourier transform (mp3 encoding, for example), ray tracing (3D rendering), etc.
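A rough sketch of that failure mode follows; the loop is just a stand-in for real CPU-heavy work such as encoding or rendering, and the port number is arbitrary.

// While heavyComputation() runs, the event loop cannot service any other request.
const http = require('http');

function heavyComputation() {
  let sum = 0;
  for (let i = 0; i < 1e9; i++) sum += Math.sqrt(i); // pure CPU work, no waiting
  return sum;
}

http.createServer((req, res) => {
  res.end(String(heavyComputation())); // every other client waits until this finishes
}).listen(3001);

// For work like this, Node.js offers worker_threads (or you hand the job to a
// separate service), which is essentially re-introducing threads where they pay off.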

Another pitfall of single-threaded apps is that they only utilise a single CPU core. So if you have a quad-core server (not uncommon nowadays) you're not using the other 3 cores.

A multithreaded app fails big if you need to allocate lots of RAM per thread. First, the RAM usage itself means you can't handle as many requests as a single-threaded app. Worse, malloc is slow. Allocating lots and lots of objects (which is common in modern web frameworks) means we can potentially end up being slower than single-threaded apps. This is where node.js usually wins.

One use-case that ends up making multithreading worse is when you need to run another scripting language in your thread. First you usually need to malloc the entire runtime for that language, then you need to malloc the variables used by your script.

So if you're writing network apps in C or Go or Java then the overhead of threading will usually not be too bad. If you're writing a C web server to serve PHP or Ruby then it's very easy to write a faster server in javascript or Ruby or Python.

Some web servers use a hybrid approach. Nginx and Apache2, for example, implement their network processing code as a thread pool of event loops. Each thread runs an event loop, processing requests single-threaded, but requests are load-balanced among the threads.

Some single-threaded architectures also use a hybrid approach. Instead of launching multiple threads from a single process, you can launch multiple applications, for example 4 node.js servers on a quad-core machine. Then you use a load balancer to spread the workload amongst the processes.

In effect the two approaches are technically identical mirror-images of each other.
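A minimal sketch of the multi-process variant using Node's built-in cluster module, which forks one worker per core and distributes incoming connections among them. A reverse proxy such as nginx in front of several independent Node processes achieves the same effect; the port number here is arbitrary.

// One single-threaded event loop per CPU core.
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isPrimary) {                 // cluster.isMaster on older Node.js versions
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();                      // e.g. 4 workers on a quad-core machine
  }
} else {
  // Each worker is an ordinary single-threaded Node.js server.
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(3000);                       // the port is shared; connections are load-balanced
}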
