Node.js performance optimization involving HTTP calls


Question

I have a Node.js application which opens a file, scans each line, and makes a REST call that involves Couchbase for each line. The average number of lines in a file is about 12 to 13 million. Currently, without any special settings, my app can completely process ~1 million records in ~24 minutes. I went through a lot of questions, articles, and Node docs but couldn't find any information about the following:

  1. Where is the setting that says Node can open X number of HTTP connections/sockets concurrently, and can I change it?
  2. I had to regulate the file processing because reading the file is much faster than the REST calls, so after a while there are too many open REST requests, which clogs the system and makes it run out of memory. So now I read 1,000 lines, wait for those REST calls to finish, and then resume (I'm doing this using the pause and resume methods on the stream). Is there a better alternative?
  3. What optimizations can I perform to make this faster? I know about the GC-related config that prevents frequent halts in the app.
  4. Is using the "cluster" module recommended? Does it work seamlessly?

Background: We have an existing Java application that does exactly the same thing by spawning 100 threads, and it is able to achieve slightly better throughput than the current Node counterpart. But I want to try Node, since the two operations in question (reading a file and making a REST call for each line) seem like a perfect situation for a Node app: they can both be async in Node, whereas the Java app makes blocking calls for these...

Any help would be greatly appreciated...

Answer

Generally you should break your questions on Stack Overflow into separate pieces. Since your questions are all getting at the same thing, I will answer them together. First, let me start with the one at the bottom:

We have an existing java application that does exactly same by spawning 100 threads ... But I want to try node since the two operations in question ... seem like perfect situation for node app since they both can be async in node where as Java app makes blocking calls for these.

Asynchronous calls and blocking calls are just tools to help you control flow and workload. Your Java app is using 100 threads, and therefore has the potential of doing 100 things at a time. Your Node.js app may have the potential of doing 1,000 things at a time, but some operations will be done in JavaScript on a single thread while other IO work will pull from a thread pool. Either way, none of this matters if the backend system you're calling can only handle 20 things at a time. If your system is 100% utilized, changing the way you do your work certainly won't speed it up.

In short, making something asynchronous is not a tool for speed, it is a tool for managing the workload.
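The workload-management point can be sketched in code. The createLimiter helper below is my own illustration (not something the answer prescribes): it caps how many async tasks, such as REST calls, are in flight at once, queueing the rest.

```javascript
// Minimal concurrency limiter (illustrative sketch; the helper name and
// shape are assumptions, not from the answer). At most `limit` tasks run
// at a time; additional tasks wait in a FIFO queue.
function createLimiter(limit) {
  let active = 0;
  const queue = [];

  function next() {
    if (active >= limit || queue.length === 0) return;
    active += 1;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next(); // start the next queued task, if any
      });
  }

  // Returns a promise that settles with the task's result.
  return function schedule(task) {
    return new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
  };
}
```

With something like this, each line read from the file would be wrapped as schedule(() => makeRestCall(line)), so the downstream system's capacity, not the file-read speed, bounds the number of open requests.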

Where's the setting that says Node can open X number of HTTP connections/sockets concurrently? And can I change it?

Node.js' HTTP client automatically has an agent, allowing you to utilize keep-alive connections. It also means that you won't flood a single host unless you write code to do so. Setting http.globalAgent.maxSockets = 1000 is what you want, as mentioned in the documentation: http://nodejs.org/api/http.html#http_agent_maxsockets

I had to regulate the file processing because reading the file is much faster than the REST calls, so after a while there are too many open REST requests, which clogs the system and makes it run out of memory... So now I read 1,000 lines, wait for those REST calls to finish, and then resume (I'm doing this using the pause and resume methods on the stream). Is there a better alternative to this?

Don't use .on('data') on your stream; use .on('readable'), and read from the stream only when you're ready. I also recommend using a transform stream to read by lines.

What optimizations can I perform to make it faster than this? I know about the GC-related config that prevents frequent halts in the app.

This is impossible to answer without detailed analysis of your code. Read more about Node.js and how its internals work. If you spend some time on this, the optimizations that are right for you will become clear.

Is using the "cluster" module recommended? Does it work seamlessly?

This is only needed if you are unable to fully utilize your hardware. It isn't clear what you mean by "seamlessly", but each process is its own process as far as the OS is concerned, so it isn't something I would call "seamless".
