Why does increasing worker_connections in Nginx make the application slower in a node.js cluster?

Problem Description

I'm converting my application to a node.js cluster, which I hope will boost its performance.

Currently, I'm deploying the application to 2 EC2 t2.medium instances, with Nginx as a proxy and an ELB in front.
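For reference, a minimal sketch of what that proxy layer might look like; the upstream port and the server block below are assumptions, not the poster's actual config.

# ELB -> Nginx -> Node cluster; the app port (3000) is assumed.
upstream node_app {
    server 127.0.0.1:3000;
    keepalive 32;                      # reuse upstream connections
}

server {
    listen 80;

    location / {
        proxy_pass http://node_app;
        proxy_http_version 1.1;        # needed for upstream keepalive
        proxy_set_header Connection "";
        proxy_set_header Host $host;
    }
}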

This is my Express cluster application, which is pretty standard from the documentation.

var bodyParser = require('body-parser');
var cors = require('cors');
var cluster = require('cluster');
var debug = require('debug')('expressapp');

if(cluster.isMaster) {
  var numWorkers = require('os').cpus().length;
  debug('Master cluster setting up ' + numWorkers + ' workers');

  for(var i = 0; i < numWorkers; i++) {
    cluster.fork();
  }

  cluster.on('online', function(worker) {
    debug('Worker ' + worker.process.pid + ' is online');
  });

  cluster.on('exit', function(worker, code, signal) {
    debug('Worker ' + worker.process.pid + ' died with code: ' + code + ', and signal: ' + signal);
    debug('Starting a new worker');
    cluster.fork();  
  });
} else {
  // Express stuff
}
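The "// Express stuff" branch is elided above; a minimal sketch of what a worker might run is below (the port and middleware choices are assumptions):

var express = require('express');
var app = express();

app.use(cors());
app.use(bodyParser.json());

// Routes go here, e.g. the /track route exercised by the load test.

// With the cluster module all workers can listen on the same port;
// the master accepts connections and distributes them to the workers.
app.listen(3000, function() {
  debug('Worker ' + process.pid + ' listening on port 3000');
});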

This is my Nginx configuration.

nginx::worker_processes: "%{::processorcount}"
nginx::worker_connections: '1024'
nginx::keepalive_timeout: '65'

I have 2 CPUs on the Nginx server.
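Assuming the Puppet nginx module renders those Hiera keys the usual way, the resulting nginx.conf would contain something like:

worker_processes 2;              # %{::processorcount} on a 2-CPU box

events {
    worker_connections 1024;     # per worker process
}

http {
    keepalive_timeout 65;
}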

This is my performance before:

I get 1,500 requests/s, which is pretty good. Now I thought I would increase the number of connections on Nginx so I can accept more requests. I do this:

nginx::worker_processes: "%{::processorcount}"
nginx::worker_connections: '2048'
nginx::keepalive_timeout: '65'

And this is my performance after:

Which I think is worse than before.
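A rough capacity check may explain why (back-of-envelope, with assumed defaults, not measurements from this post): 2 worker processes × 2048 worker_connections allows roughly 4,096 open connections, but a reverse proxy holds two connections per in-flight request (one to the client, one to the upstream), so the box tops out around 2,048 simultaneous requests. Raising worker_connections adds no CPU or upstream capacity; past saturation it mostly lets more requests queue up, which shows up as higher latency rather than higher throughput.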

I use Gatling for performance testing and here's the code.

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class LoadTestSparrowCapture extends Simulation {
  val httpConf = http
    .baseURL("http://ELB")
    .acceptHeader("application/json")
    .doNotTrackHeader("1")
    .acceptLanguageHeader("en-US,en;q=0.5")
    .acceptEncodingHeader("gzip, deflate")
    .userAgentHeader("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0")

  val headers_10 = Map("Content-Type" -> "application/json")

  val scn = scenario("Load Test")
    .exec(http("request_1")
      .get("/track"))

  setUp(
    scn.inject(
      atOnceUsers(15000)
    ).protocols(httpConf))
}

I deployed this to my Gatling cluster. So, I have 3 EC2 instances firing 15,000 requests in 30s at my application.

The question is: is there anything I can do to increase the performance of my application, or do I just need to add more machines?

The route that I'm testing is pretty simple: I get the request and send it off to RabbitMQ so it can be processed further. So, the response of that route is pretty fast.
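For illustration, such a route might look like the sketch below, using amqplib; the queue name, broker URL, and status codes are assumptions:

var amqp = require('amqplib');

// Open one connection/channel per worker and reuse it for every request.
var channelP = amqp.connect('amqp://localhost')
  .then(function(conn) { return conn.createChannel(); })
  .then(function(ch) {
    return ch.assertQueue('tracking').then(function() { return ch; });
  });

app.get('/track', function(req, res) {
  channelP.then(function(ch) {
    // Hand the payload to RabbitMQ and answer right away; the heavy
    // processing happens downstream, so the route itself stays fast.
    ch.sendToQueue('tracking', Buffer.from(JSON.stringify(req.query)));
    res.status(202).end();
  }).catch(function() {
    res.status(503).end(); // broker unavailable
  });
});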

Recommended Answer

You've mentioned that you are using AWS with an ELB in front of your EC2 instances. As I can see, you are getting 502 and 503 status codes. These can be sent by the ELB or by your EC2 instances, so make sure that when doing the load test you know where the errors are coming from. You can check this in the AWS console, in the ELB's CloudWatch metrics.

Basically, HTTPCode_ELB_5XX means your ELB sent the 50x; HTTPCode_Backend_5XX, on the other hand, means your back end sent it. You can also verify this in the ELB's logs. A better explanation of ELB errors can be found here.

To load-test on AWS you should definitely read this. The point is that the ELB is just another set of machines, which needs to scale if your load increases. The default scaling strategy is (quoted from the section "Ramping Up Testing"):


Once you have a testing tool in place, you will need to define the growth in the load. We recommend that you increase the load at a rate of no more than 50 percent every five minutes.

That means that when you start at some number of concurrent users, let's say 1,000, by default you should increase it only up to 1,500 within 5 minutes. This guarantees that the ELB will scale with the load on your servers. Exact numbers may vary and you have to test them on your own. Last time I tested, it sustained a load of 1,200 req/s without an issue, and then I started to receive 50x errors. You can test this easily by running a ramp-up scenario from X to Y users from a single client and waiting for the 50x responses, as shown in the sketch below.
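In the Gatling DSL from the question, that advice translates into a ramped injection profile instead of atOnceUsers(15000). A sketch (Gatling 2.x syntax to match the question's code; scn and httpConf as defined above, rates are assumptions):

setUp(
  scn.inject(
    constantUsersPerSec(1000) during (5 minutes),     // steady plateau
    rampUsersPerSec(1000) to 1500 during (5 minutes)  // +50% over five minutes
  ).protocols(httpConf))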

The next very important thing (from the section "DNS Resolution") is:


If clients do not re-resolve the DNS at least once per minute, then the new resources Elastic Load Balancing adds to DNS will not be used by clients.

In short, it means that you have to guarantee that the DNS TTL is respected, or that your clients re-resolve and rotate the DNS IPs they receive from the lookup, to guarantee that the load is distributed round-robin. If not (e.g. when testing from only one client, which is not your case), you can skew the results by targeting all the traffic at a single ELB node and overloading it, which means the ELB will not scale at all.
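Gatling runs on the JVM, which caches successful DNS lookups (indefinitely when a security manager is installed), so one hedged way to make the load generators re-resolve the ELB's name is to cap that cache with the standard networkaddress.cache.ttl security property; the 60-second value below is an assumption:

import java.security.Security

// Cache successful DNS lookups for at most 60 seconds so that new ELB
// nodes published in DNS start receiving traffic from this client.
Security.setProperty("networkaddress.cache.ttl", "60")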

Hope it helps.
