NodeJS request timeouts with concurrency 100

Problem Description

I have two machines, one "server" and one "client". Both are CentOS6 with NodeJS v5.8.0.

The "server" runs the following program:

const AppPort = 8080;
var app = require('express')();
var logger = require('log4js').getLogger();
var onFinished = require('on-finished');
var uid = require('uid');

var reqCnt = 0;
var reqFin = 0;


app.get('/', function(req, res) {
  onFinished(req, function() {
    reqFin++;
    var ts2 = (new Date()).getTime();
    logger.info(`uid=${req.uid}, dt=${ts2-req.ts1}`);
  });
  req.ts1 = (new Date()).getTime();
  req.uid = uid();

  reqCnt++;
  logger.info(`ReqCnt=${reqCnt}, fins=${reqFin}`);
  res.send("This is XML");
});
app.listen(AppPort);

Its only purpose is to return the "This is XML" string and measure how long each request takes to complete.

On the "client" machine I run the following program:

const AppPort = 10000;
var onFinished = require('on-finished');
var async = require('async');
var request = require('request');
var logger = require('log4js').getLogger();
var app = require('express')();
var fs = require('fs');
var util = require('util');

url = "http://my-server";

var errCnt = 0;
var okCnt = 0;
var active2 = 0;

setInterval(function() {
  var errFrac = Math.floor(errCnt/(okCnt+errCnt)*100);
  logger.info(`${okCnt},${errCnt},${active2},${errFrac}`);
}, 1000);

app.get('/test', function(req,res) {
  onFinished(res, function() {
    active2--;
  });
  active2++;
  var ts1 = (new Date()).getTime();
  request(url, {timeout: 1000}, function(err, response, body ) {
    var ts2 = (new Date()).getTime();
    var dt = ts2-ts1;
    if ( err ) {
      errCnt += 1;
      logger.error(`Error: ${err}, dt=${dt}, errCnt=${errCnt}`);
      res.send(`Error: ${err}`);
    }
    else {
      okCnt += 1;
      logger.info(`OK: ${url}`);
      res.send(`OK: ${body}`);
    }
  });
});
var http = app.listen(AppPort);

logger.info(`Listening on ${AppPort}, pid=${process.pid}`);

This "client" code listens on port 10000 and makes a request to the "server" machine to fetch the "This is XML" string, which is then passed back to the "client"'s own caller.

I load-test my client code with siege:

siege -v -r 100 -c 100 http://my-client:10000/test 

Almost immediately I start to get ETIMEDOUT errors:

[2016-03-15 18:17:05.155] [ERROR] [default] - Error: Error: ETIMEDOUT, dt=1028, errCnt=3
[2016-03-15 18:17:05.156] [ERROR] [default] - Error: Error: ETIMEDOUT, dt=1028, errCnt=4
[2016-03-15 18:17:05.156] [ERROR] [default] - Error: Error: ETIMEDOUT, dt=1027, errCnt=5
[2016-03-15 18:17:05.157] [ERROR] [default] - Error: Error: ETIMEDOUT, dt=1027, errCnt=6
[2016-03-15 18:17:05.157] [ERROR] [default] - Error: Error: ETIMEDOUT, dt=1027, errCnt=7
[2016-03-15 18:17:05.157] [ERROR] [default] - Error: Error: ETIMEDOUT, dt=1027, errCnt=8
[2016-03-15 18:17:05.158] [ERROR] [default] - Error: Error: ETIMEDOUT, dt=1027, errCnt=9
[2016-03-15 18:17:05.160] [ERROR] [default] - Error: Error: ETIMEDOUT, dt=1029, errCnt=10
[2016-03-15 18:17:05.160] [ERROR] [default] - Error: Error: ETIMEDOUT, dt=1028, errCnt=11
[2016-03-15 18:17:05.161] [ERROR] [default] - Error: Error: ETIMEDOUT, dt=1028, errCnt=12

Also, though much less frequently, getaddrinfo errors appear:

Error: Error: getaddrinfo ENOTFOUND {my-server-domain-here}:8080, dt=2, errCnt=4478

However, all requests to the "server" are processed in less than 3 milliseconds (the dt values) on the server itself:

[2016-03-15 18:19:13.847] [INFO] [default] - uid=66ohx90, dt=1
[2016-03-15 18:19:13.862] [INFO] [default] - ReqCnt=5632, fins=5631
[2016-03-15 18:19:13.862] [INFO] [default] - uid=j8mpxdm, dt=0
[2016-03-15 18:19:13.865] [INFO] [default] - ReqCnt=5633, fins=5632
[2016-03-15 18:19:13.866] [INFO] [default] - uid=xcetqyj, dt=1
[2016-03-15 18:19:13.877] [INFO] [default] - ReqCnt=5634, fins=5633
[2016-03-15 18:19:13.877] [INFO] [default] - uid=i5qnbit, dt=0
[2016-03-15 18:19:13.895] [INFO] [default] - ReqCnt=5635, fins=5634
[2016-03-15 18:19:13.895] [INFO] [default] - uid=hpdmxpg, dt=1
[2016-03-15 18:19:13.930] [INFO] [default] - ReqCnt=5636, fins=5635
[2016-03-15 18:19:13.930] [INFO] [default] - uid=8g3t8md, dt=0
[2016-03-15 18:19:13.934] [INFO] [default] - ReqCnt=5637, fins=5636
[2016-03-15 18:19:13.934] [INFO] [default] - uid=8rwkad6, dt=0
[2016-03-15 18:19:14.163] [INFO] [default] - ReqCnt=5638, fins=5637
[2016-03-15 18:19:14.165] [INFO] [default] - uid=1sh2frd, dt=2
[2016-03-15 18:19:14.169] [INFO] [default] - ReqCnt=5639, fins=5638
[2016-03-15 18:19:14.170] [INFO] [default] - uid=comn76k, dt=1
[2016-03-15 18:19:14.174] [INFO] [default] - ReqCnt=5640, fins=5639
[2016-03-15 18:19:14.174] [INFO] [default] - uid=gj9e0fm, dt=0
[2016-03-15 18:19:14.693] [INFO] [default] - ReqCnt=5641, fins=5640
[2016-03-15 18:19:14.693] [INFO] [default] - uid=x0yw66n, dt=0
[2016-03-15 18:19:14.713] [INFO] [default] - ReqCnt=5642, fins=5641
[2016-03-15 18:19:14.714] [INFO] [default] - uid=e2cumjv, dt=1
[2016-03-15 18:19:14.734] [INFO] [default] - ReqCnt=5643, fins=5642
[2016-03-15 18:19:14.735] [INFO] [default] - uid=34e0ohl, dt=1
[2016-03-15 18:19:14.747] [INFO] [default] - ReqCnt=5644, fins=5643
[2016-03-15 18:19:14.749] [INFO] [default] - uid=34aau79, dt=2

So the problem is not that the "server" takes too long to process the requests; something is wrong on the client side.

In NodeJS 5.8 the globalAgent looks like the following:

console.log(require('http').globalAgent)

Agent {
  domain: null,
  _events: { free: [Function] },
  _eventsCount: 1,
  _maxListeners: undefined,
  defaultPort: 80,
  protocol: 'http:',
  options: { path: null },
  requests: {},
  sockets: {},
  freeSockets: {},
  keepAliveMsecs: 1000,
  keepAlive: false,
  maxSockets: Infinity,
  maxFreeSockets: 256 }

ulimits on my system look like:

[root@njs testreq]# ulimit -all
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128211
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 200000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 128211
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

What can be the reason for the timeouts?

Recommended Answer

When running a few load tests recently I ran into a similar error; however, instead of ETIMEDOUT errors I saw multiple EADDRINUSE errors. At the time I was running the tests with the following HTTP agent configuration changes.

{
    maxSockets: 256,
    keepAlive: false
}

It turns out this configuration wastes a lot of cycles by intentionally closing every connection after a single request, and the EADDRINUSE errors were due to running out of ephemeral ports.

For my tests I was still using version 0.12.9, so I'm not sure whether this still holds in versions >= 4.x, but the core HTTP library will automatically maintain connections to servers, keyed by host/port/protocol, whenever possible. This can greatly reduce the load on both the client and the server, but it can also cause requests to pile up if the client pool is too small to handle the rate of outbound requests. The best configuration is therefore one that keeps connections alive whenever possible while still having a connection pool large enough to handle each outbound request quickly.
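
As a rough sketch only (the shared agent variable and its placement inside the /test handler are my additions, not part of the original answer), the client above could reuse connections by creating one keep-alive agent and passing it to each outbound request through the request library's agent option:

var http = require('http');
var request = require('request');

// One shared agent for all outbound requests: sockets are kept open and
// reused instead of being opened (and resolved via DNS) once per request.
var keepAliveAgent = new http.Agent({
  keepAlive: true,
  maxSockets: 256   // pool large enough for ~100 concurrent clients
});

// Inside the /test handler, hand the agent to request():
request(url, { timeout: 1000, agent: keepAliveAgent }, function(err, response, body) {
  // ... same error/success handling as in the original client code
});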

Additionally, Node.js is built on top of libuv, which implements the event loop interface. One way or another, almost any asynchronous operation implemented by a core Node.js library will interact with libuv. To implement this kind of interface, libuv uses one of several different strategies, one of which is a thread pool. The default size of this thread pool is 4, with a maximum of 128.

The important point here is that any calls to getaddrinfo and getnameinfo will use the thread pool, which means that regardless of the size of your HTTP connection pool, DNS queries and some operations lower in the network stack will be serialized according to the thread pool size. The thread pool size can be changed by setting the environment variable UV_THREADPOOL_SIZE to a value in the range 4 - 128.
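
A minimal sketch of applying this to the client above (the client.js file name is an assumption, not part of the original answer):

// Preferred: set the pool size before Node starts, e.g.
//   UV_THREADPOOL_SIZE=50 node client.js
//
// Setting it programmatically can also work, but only if it runs at the
// very top of the entry script, before the first DNS lookup makes libuv
// spawn its worker threads:
process.env.UV_THREADPOOL_SIZE = 50;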

For my tests the ideal settings were UV_THREADPOOL_SIZE=50 combined with the following HTTP agent configuration.

{
    maxSockets: 256,
    keepAlive: true
}
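
Equivalently, as a sketch using the request library's own options (the defaults wrapper and the client.js name are my assumptions, not part of the original answer), both recommendations could be combined in the client like this:

// Launch with a larger libuv thread pool:
//   UV_THREADPOOL_SIZE=50 node client.js

var request = require('request').defaults({
  forever: true,                // keep-alive agent, sockets are reused
  pool: { maxSockets: 256 },    // allow up to 256 sockets per host
  timeout: 1000
});

// request(url, function(err, response, body) { ... }) now goes through
// the shared keep-alive pool.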

This answer has more info on when and how libuv is used.
