Java pooling connection optimization


Problem description

What are the common guidelines/recommendations for configuring, in Java, an HTTP connection pool to support a large number of concurrent HTTP calls to the same server? I mean:


  • max total connections
  • max default connections per route
  • reuse strategy
  • keep-alive strategy
  • keep-alive duration
  • connection timeout
  • ...

(I am using Apache HttpComponents 4.3, but I am open to exploring new solutions.)
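For reference, here is a minimal sketch of where those knobs live in Apache HttpComponents 4.3. The concrete numbers (pool sizes, timeouts, keep-alive duration) are placeholders chosen to show the API, not recommended values:

import org.apache.http.HttpResponse;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.conn.ConnectionKeepAliveStrategy;
import org.apache.http.impl.DefaultConnectionReuseStrategy;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.protocol.HttpContext;

public class PoolConfig {

    public static CloseableHttpClient buildClient() {
        // One shared pool for the whole application.
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(50);            // max total connections across all routes
        cm.setDefaultMaxPerRoute(30);  // max connections to a single host (all calls go to one server here)

        RequestConfig requestConfig = RequestConfig.custom()
                .setConnectionRequestTimeout(50)  // ms to lease a connection from the pool
                .setConnectTimeout(200)           // ms to establish the TCP connection
                .setSocketTimeout(250)            // ms to wait for response data
                .build();

        // Keep-alive strategy: hold idle connections open for 30 s so they can be reused.
        ConnectionKeepAliveStrategy keepAlive = new ConnectionKeepAliveStrategy() {
            @Override
            public long getKeepAliveDuration(HttpResponse response, HttpContext context) {
                return 30 * 1000L;
            }
        };

        return HttpClients.custom()
                .setConnectionManager(cm)
                .setDefaultRequestConfig(requestConfig)
                .setKeepAliveStrategy(keepAlive)
                .setConnectionReuseStrategy(new DefaultConnectionReuseStrategy()) // default: reuse whenever possible
                .build();
    }
}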

To be clearer, this is my situation:

I developed a REST resource that needs to perform about 10 HTTP calls to AWS CloudSearch in order to obtain search results that are collected into a final result (which I really cannot obtain through a single query). The whole operation must take less than 0.25 seconds, so I run the HTTP calls in parallel in 10 different threads. During a benchmarking test I noticed that with few concurrent requests (5) my objective is reached. But when I increase the concurrent requests to 30, there is a tremendous degradation of performance due to the connection time, which takes about 1 second. With few concurrent requests, instead, the connection time is about 150 ms (to be more precise, the first connection takes 1 second, and all the following connections take about 150 ms). I can ensure that CloudSearch returns its response in less than 15 ms, so there is a problem somewhere in my connection pool.
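A rough sketch of that fan-out, assuming a single shared client built by a helper like the hypothetical PoolConfig shown earlier and a fixed pool of 10 worker threads:

import org.apache.http.HttpResponse;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.util.EntityUtils;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ParallelSearch {

    // One client instance means one connection pool shared by every call.
    private static final CloseableHttpClient CLIENT = PoolConfig.buildClient();
    private static final ExecutorService WORKERS = Executors.newFixedThreadPool(10);

    // Reading the body inside a ResponseHandler releases the connection back to the pool.
    private static final ResponseHandler<String> AS_STRING = new ResponseHandler<String>() {
        @Override
        public String handleResponse(HttpResponse response) throws IOException {
            return EntityUtils.toString(response.getEntity());
        }
    };

    public static List<String> fetchAll(List<String> urls) throws Exception {
        List<Callable<String>> tasks = new ArrayList<>();
        for (final String url : urls) {
            tasks.add(new Callable<String>() {
                @Override
                public String call() throws Exception {
                    return CLIENT.execute(new HttpGet(url), AS_STRING);
                }
            });
        }

        // Fail fast if the whole fan-out cannot finish within the 250 ms budget.
        List<Future<String>> futures = WORKERS.invokeAll(tasks, 250, TimeUnit.MILLISECONDS);

        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            if (!f.isCancelled()) {
                results.add(f.get());
            }
        }
        return results;
    }
}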

Thanks!

Recommended answer

The number of threads/connections that is best for your implementation depends on that implementation (which you did not post), but here are some guidelines, as requested:


  • If those threads never block at all, you should have as many threads as cores (Runtime.getRuntime().availableProcessors(), which includes hyperthreaded cores), simply because more than 100% CPU usage isn't possible.

  • If your threads rarely block, cores * 2 is a good start for benchmarking.

  • If your threads frequently block, you absolutely need to benchmark your application with various settings to find the best solution for your implementation, OS, and hardware (see the sizing sketch after this list).
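A minimal illustration of those sizing rules; the pool sizes below are starting points for benchmarking, not final values:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    public static void main(String[] args) {
        // Logical cores, including hyperthreaded ones.
        int cores = Runtime.getRuntime().availableProcessors();

        // Threads that never block: one per core, since more cannot exceed 100% CPU.
        ExecutorService cpuBound = Executors.newFixedThreadPool(cores);

        // Threads that rarely block: 2x cores is a reasonable point to start benchmarking from.
        ExecutorService rarelyBlocking = Executors.newFixedThreadPool(cores * 2);

        System.out.println("cores=" + cores);
        cpuBound.shutdown();
        rarelyBlocking.shutdown();
    }
}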

Now the most optimal case is obviously the first one, but to get there you need to remove blocking from your code as much as you can. Java can do this for IO operations if you use the NIO package in non-blocking mode (which is not how the Apache package does it).

You then have one thread that waits on a selector and wakes up as soon as any data is ready to be sent or read. That thread only copies the data from its source to its destination and returns to the selector. In the case of a read (incoming data), the destination is a blocking queue on which a core-sized pool of threads waits. One of those threads then pulls out the received data and processes it, now without any blocking.

You can then use the length of the blocking queue to adjust how many parallel requests are reasonable for your task and hardware.
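A bare-bones sketch of that selector-plus-queue pattern, written with plain NIO purely for illustration (it is not the Apache client's internals, and process() is a hypothetical placeholder for the application-specific parsing):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class SelectorReader implements Runnable {

    private final Selector selector;
    // Bounded queue: its capacity limits how many responses can pile up unprocessed.
    private final BlockingQueue<byte[]> inbound = new LinkedBlockingQueue<>(64);

    public SelectorReader(Selector selector) {
        this.selector = selector;

        // One worker per core pulls completed reads off the queue and processes them.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService workers = Executors.newFixedThreadPool(cores);
        for (int i = 0; i < cores; i++) {
            workers.submit(new Runnable() {
                @Override
                public void run() {
                    try {
                        while (true) {
                            byte[] data = inbound.take(); // blocks until the selector thread hands off data
                            process(data);                // CPU-only work from here on, no blocking IO
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
    }

    @Override
    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(8192);
        try {
            while (selector.isOpen()) {
                selector.select(); // sleeps until some registered channel is ready
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isValid() && key.isReadable()) {
                        SocketChannel channel = (SocketChannel) key.channel();
                        buffer.clear();
                        int n = channel.read(buffer); // non-blocking read
                        if (n > 0) {
                            buffer.flip();
                            byte[] data = new byte[buffer.remaining()];
                            buffer.get(data);
                            inbound.offer(data); // hand off to the worker pool and go back to selecting
                        }
                    }
                }
                selector.selectedKeys().clear();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Hypothetical placeholder for the application-specific parsing/aggregation.
    private void process(byte[] data) {
    }
}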



The first connection takes more than 1 second because it actually has to look up the address via DNS. All other connections are put on hold for the moment, as there is no sense in doing this twice. You can circumvent that either by calling the IP directly (probably not good if you talk to a load balancer) or by "warming up" the connections with an initial request. Any new connection afterwards will use the cached DNS result, but still needs to perform other initialization, so reusing connections as much as you can will reduce latency a lot. With NIO this is a very easy task.
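A sketch of such a warm-up with the pooled client from earlier; the endpoint hostname below is a made-up placeholder for the real CloudSearch search endpoint:

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.util.EntityUtils;

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolWarmup {

    // Made-up placeholder; substitute the real CloudSearch search endpoint.
    private static final String ENDPOINT = "https://search-example.us-east-1.cloudsearch.amazonaws.com/";

    public static void warmUp(final CloseableHttpClient client, int connections) throws InterruptedException {
        ExecutorService warmup = Executors.newFixedThreadPool(connections);
        for (int i = 0; i < connections; i++) {
            warmup.submit(new Runnable() {
                @Override
                public void run() {
                    // The first request pays for DNS + TCP (+ TLS) setup; later requests reuse the connection.
                    try (CloseableHttpResponse response = client.execute(new HttpGet(ENDPOINT))) {
                        EntityUtils.consume(response.getEntity()); // drain so the connection returns to the pool
                    } catch (IOException e) {
                        // A failed warm-up is harmless; the real request simply pays the setup cost itself.
                    }
                }
            });
        }
        warmup.shutdown();
        warmup.awaitTermination(5, TimeUnit.SECONDS);
    }
}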

In addition there are HTTP multi-requests, that is: you make one connection but request several URLs in one request and get several responses over the same line. This massively reduces connection overhead, but needs to be supported by the server.
