How to make millions of parallel HTTP requests from a Node.js app?

Question

I have to make a million HTTP calls from my Node.js app.

Apart from using the async library and callbacks, is there any other way to issue this many requests in parallel so that they are processed much faster?

Any suggestions would be appreciated.

Recommended answer

Taken literally, as the title of your question seems to ask, making millions of requests in parallel is a bit of a folly. Having that many requests in flight at the same time will not get the job done any quicker, and it will likely exhaust system resources (memory, sockets, bandwidth, etc.).

Instead, if the goal is simply to process millions of requests as fast as possible, you want to do the following:

  1. Start enough parallel Node.js processes to use all the CPU you have available for processing the request responses. If each server involved has 8 cores, start 8 Node.js processes per server.

  2. Install as much networking bandwidth capability as possible (high-throughput connection, multiple network cards, etc.) so the networking itself is as fast as possible.

  3. Use asynchronous I/O for all I/O so that system resources are used as efficiently as possible. Be careful with disk I/O: async disk I/O in Node.js runs on a limited thread pool internal to the Node.js implementation (libuv's pool, four threads by default), so you cannot have an unlimited number of async disk operations actually in flight at the same time. You won't get an error if you try (the excess requests are simply queued), but it won't help performance either. Networking in Node.js is truly asynchronous, so it does not have this issue.

  4. Open only as many simultaneous requests per Node.js process as actually benefit you. How many that is (likely somewhere between 2 and 20) depends on how much of the total time to process a request is networking vs. CPU, and on how slow the responses are. If all the requests go to the same remote server, saturating it with requests likely won't help either, because you are already asking it to do as much as it can.

  5. Create a coordination mechanism among your Node.js processes to feed each one work and possibly collect results (something like a work queue is often used).

  6. Test like crazy, discover where your bottlenecks are, and investigate how to tune or change code to reduce them.
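As an illustration of keeping only a bounded number of requests in flight per process, here is a minimal sketch. The name `runLimited` and the result shape are assumptions made for this example, not part of the original answer; `task` is whatever function issues one request, and the right `limit` has to be found by measuring.

```javascript
'use strict';

// Run task(item) for every item, keeping at most `limit` tasks in flight.
// Resolves to an array of { ok, value } or { ok, error } in input order.
// `runLimited` is a hypothetical helper name used only for this sketch.
async function runLimited(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to start

  async function lane() {
    while (next < items.length) {
      const i = next++; // no await between the check and the increment
      try {
        results[i] = { ok: true, value: await task(items[i]) };
      } catch (error) {
        results[i] = { ok: false, error }; // one failure must not stop the run
      }
    }
  }

  // Start `limit` lanes; each lane pulls the next item as soon as it is free.
  const lanes = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: lanes }, lane));
  return results;
}

// Possible usage (hypothetical `urls` array; global fetch needs Node.js 18+):
// const results = await runLimited(urls, 10, async (url) => (await fetch(url)).status);
```

Each lane starts a new request only when its previous one has settled, so memory use stays proportional to `limit` rather than to the total number of requests.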

If your requests all go to the same remote server, you will have to figure out how it behaves under multiple simultaneous requests. A larger server farm will probably not behave much differently whether you fire 10 or 100 requests at it at once, but a single smaller remote server might actually behave worse with 100 requests at once. If your requests all go to different hosts, you don't have this issue at all. If they go to a mixture of different and repeated hosts, it may pay to spread them across hosts so that you aren't making 100 requests to the same host at once.
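One way to avoid piling requests onto a single host can be sketched as follows. It combines two ideas: the `maxSockets` option of Node's `http.Agent`, which caps concurrent sockets per host, and a reordering step that alternates between hosts. The helper names `interleaveByHost` and `get`, and the cap of 10, are assumptions for this example and would need tuning against the real servers.

```javascript
'use strict';
const http = require('node:http');

// An agent that allows at most 10 concurrent sockets per host.
// Extra requests to the same host wait in the agent's queue.
const agent = new http.Agent({ keepAlive: true, maxSockets: 10 });

// Issue one GET through the shared agent and resolve with the status code.
function get(url) {
  return new Promise((resolve, reject) => {
    const req = http.get(url, { agent }, (res) => {
      res.resume(); // discard the body so the socket can be reused
      res.on('end', () => resolve(res.statusCode));
    });
    req.on('error', reject);
  });
}

// Reorder URLs round-robin across hosts so consecutive requests
// hit different hosts where possible.
function interleaveByHost(urls) {
  const byHost = new Map();
  for (const url of urls) {
    const host = new URL(url).host;
    if (!byHost.has(host)) byHost.set(host, []);
    byHost.get(host).push(url);
  }
  const queues = [...byHost.values()];
  const ordered = [];
  for (let round = 0; ordered.length < urls.length; round++) {
    for (const queue of queues) {
      if (round < queue.length) ordered.push(queue[round]);
    }
  }
  return ordered;
}
```

For HTTPS targets the same options apply to `https.Agent`.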

The basic ideas behind this are:

  1. You want to maximize your use of the CPU so that each CPU is always doing as much as it can.

  2. Since your Node.js code is single-threaded, you need one Node.js process per core to make full use of the available CPU cycles. Adding more Node.js processes than there are cores just incurs unnecessary OS context-switching costs and probably will not help performance.

  3. You only need enough parallel requests in flight at the same time to keep the CPU fed with work. Excess requests beyond that just increase memory usage without any benefit. If you have enough memory to hold them, having more isn't harmful, but it isn't helpful either. So, ideally, you'd keep a few more requests in flight at a time than are needed to keep the CPU busy.
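The one-process-per-core idea can be sketched with Node's built-in `cluster` module. This is a rough outline under stated assumptions, not a complete work queue: the names `shard` and `start`, the message shapes, and the static up-front split of the work are all choices made for this example, and error handling is omitted. `processShard` stands for your own function that works through one list of URLs (for instance with a bounded number of requests in flight).

```javascript
'use strict';
const cluster = require('node:cluster');
const os = require('node:os');

// Split items into n shards round-robin, so shard sizes differ by at most one.
function shard(items, n) {
  const shards = Array.from({ length: n }, () => []);
  items.forEach((item, i) => shards[i % n].push(item));
  return shards;
}

// Primary: fork one worker per core and hand each its shard when it asks.
// Worker: announce readiness, process the shard it receives, report, exit.
// In workers the `urls` argument is unused; the work arrives over IPC.
function start(urls, processShard) {
  if (cluster.isPrimary) {
    for (const part of shard(urls, os.cpus().length)) {
      const worker = cluster.fork();
      worker.on('message', (msg) => {
        if (msg.type === 'ready') worker.send(part);
        if (msg.type === 'done') console.log(`worker ${worker.id}: ${msg.count} done`);
      });
    }
  } else {
    process.on('message', async (part) => {
      await processShard(part);
      process.send({ type: 'done', count: part.length }, () => process.exit(0));
    });
    process.send({ type: 'ready' });
  }
}
```

A static split is the simplest form of coordination. If some requests are much slower than others, a pull-based queue in which workers ask for small batches as they finish would balance the load better.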
