TCP, HTTP and the Multi-Threading Sweet Spot


Question

I'm trying to understand the performance figures I'm getting and how to determine the optimal number of threads.

My results are at the bottom of this post.

I wrote an experimental multi-threaded web client in perl which downloads a page, grabs the source for each image tag and downloads the image - discarding the data.

It uses a non-blocking connect with an initial per file timeout of 10 seconds which doubles after each timeout and retry. It also caches IP addresses so each thread only has to do a DNS lookup once.
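The connect-with-backoff logic can be sketched as follows (the original client is Perl; this is a minimal Python translation, and `fetch_with_backoff` is an illustrative name, not the actual client code):

```python
import select
import socket

def fetch_with_backoff(ip, port, request, initial_timeout=10.0, max_retries=4):
    """Non-blocking connect with a per-file timeout that doubles after
    each timeout and retry, as described above. The IP is passed in
    directly, mirroring the client's per-thread DNS cache."""
    timeout = initial_timeout
    for _ in range(max_retries):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setblocking(False)
        try:
            sock.connect((ip, port))
        except BlockingIOError:
            pass  # connect in progress; wait for the socket to become writable
        _, writable, _ = select.select([], [sock], [], timeout)
        if writable and sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) == 0:
            sock.setblocking(True)
            sock.sendall(request)
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:           # server closed: response complete
                    break
                chunks.append(data)
            sock.close()
            return b"".join(chunks)
        sock.close()
        timeout *= 2                   # double the timeout, then retry
    return None
```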

The total amount of data downloaded is 2271122 bytes in 1316 files via 2.5Mbit connection from http://hubblesite.org/gallery/album/entire/npp/all/hires/true/ . The thumbnail images are hosted by a company which claims to specialize in low latency for high bandwidth applications.

The wall clock times are:

1 Thread takes 4:48 -- 0 timeouts
2 Threads take 2:38 -- 0 timeouts
5 Threads take 2:22 -- 20 timeouts
10 Threads take 2:27 -- 40 timeouts
50 Threads take 2:27 -- 170 timeouts

In the worst case ( 50 threads ) less than 2 seconds of CPU time are consumed by the client.

avg file size 1.7k
avg rtt 100 ms ( as measured by ping )
avg cli cpu/img 1 ms

The fastest average download speed is 5 threads at about 15 KB / sec overall.
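That headline figure checks out against the totals given earlier:

```python
# 2,271,122 bytes in 1316 files; best wall time was 5 threads at 2:22.
total_bytes = 2271122
wall_seconds = 2 * 60 + 22
rate_kb = total_bytes / wall_seconds / 1024
print(f"{rate_kb:.1f} kB/s")   # 15.6 kB/s -- the "about 15 KB / sec" above
```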

The server actually does seem to have pretty low latency as it takes only 218 ms to get each image meaning it takes only 18 ms on average for the server to process each request:

0 cli sends syn
50 srv rcvs syn
50 srv sends syn + ack
100 cli conn established / cli sends get
150 srv recv's get
168 srv reads file, sends data , calls close
218 cli recv HTTP headers + complete file in 2 segments MSS == 1448
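The 218 ms per-image figure decomposes exactly as this timeline shows: one RTT for the handshake, half an RTT each way for the GET and the response, plus the server's processing time.

```python
rtt = 100               # ms, as measured by ping
server_time = 18        # ms, srv receives GET (t=150) to srv sends data (t=168)
handshake = rtt         # SYN out + SYN/ACK back before the GET can be sent
request_response = rtt / 2 + server_time + rtt / 2
print(handshake + request_response)   # 218.0 ms total per image
```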

I can see that the per file average download speed is low because of the small file sizes and the relatively high cost per file of the connection setup.

What I don't understand is why I see virtually no improvement in performance beyond 2 threads. The server seems to be sufficiently fast, but already starts timing out connections at 5 threads.

The timeouts seem to start after about 900 - 1000 successful connections whether it's 5 or 50 threads, which I assume is probably some kind of throttling threshold on the server, but I would expect 10 threads to still be significantly faster than 2.

Am I missing something here?

EDIT-1

Just for comparison's sake I installed the DownThemAll Firefox extension and downloaded the images using it. I set it to 4 simultaneous connections with a 10 second timeout. DTM took about 3 minutes to download all the files and write them to disk, and it also started experiencing timeouts after about 900 connections.

I'm going to run tcpdump to try and get a better picture what's going on at the tcp protocol level.

I also cleared Firefox's cache and hit reload. 40 Seconds to reload the page and all the images. That seemed way too fast - maybe Firefox kept them in a memory cache which wasn't cleared? So I opened Opera and it also took about 40 seconds. I assume they're so much faster because they must be using HTTP/1.1 pipelining?

The Answer!??

So after a little more testing and writing code to reuse the sockets via pipelining I found out some interesting info.

When running at 5 threads the non-pipelined version retrieves the first 1026 images in 77 seconds but takes a further 65 seconds to retrieve the remaining 290 images. This pretty much confirms MattH's theory about my client getting hit by a SYN FLOOD event causing the server to stop responding to my connection attempts for a short period of time. However, that is only part of the problem since 77 seconds is still very slow for 5 threads to get 1026 images; if you remove the SYN FLOOD issue it would still take about 99 seconds to retrieve all the files. So based on a little research and some tcpdump's it seems like the other part of the issue is latency and the connection setup overhead.

Here's where we get back to the issue of finding the "Sweet Spot" or the optimal number of threads. I modified the client to implement HTTP/1.1 Pipelining and found that the optimal number of threads in this case is between 15 and 20. For example:

1 Thread takes 2:37 -- 0 timeouts
2 Threads take 1:22 -- 0 timeouts
5 Threads take 0:34 -- 0 timeouts
10 Threads take 0:20 -- 0 timeouts
11 Threads take 0:19 -- 0 timeouts
15 Threads take 0:16 -- 0 timeouts
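The pipelined client can be sketched as below (in Python rather than the original Perl; `fetch_pipelined` and its error handling are illustrative, and it assumes the server returns Content-Length and keeps the connection open):

```python
import socket

def fetch_pipelined(host, paths, port=80):
    """Write every GET up front on one connection (HTTP/1.1 pipelining),
    then read the responses back in order, paying the connection-setup
    gap once instead of once per file."""
    sock = socket.create_connection((host, port))

    def recv_more(buf):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("server closed the connection early")
        return buf + chunk

    for path in paths:
        request = (f"GET {path} HTTP/1.1\r\n"
                   f"Host: {host}\r\n"
                   "Connection: keep-alive\r\n\r\n")
        sock.sendall(request.encode())

    bodies, buf = [], b""
    for _ in paths:
        while b"\r\n\r\n" not in buf:          # read one response's headers
            buf = recv_more(buf)
        headers, _, buf = buf.partition(b"\r\n\r\n")
        length = 0
        for line in headers.split(b"\r\n"):
            if line.lower().startswith(b"content-length:"):
                length = int(line.split(b":", 1)[1])
        while len(buf) < length:               # then its body
            buf = recv_more(buf)
        bodies.append(buf[:length])
        buf = buf[length:]
    sock.close()
    return bodies
```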

There are four factors which affect this; latency / rtt , maximum end-to-end bandwidth, recv buffer size and the size of the image files being downloaded. See this site for a discussion on how receive buffer size and RTT latency affect available bandwidth.

In addition to the above, average file size affects the maximum per connection transfer rate. Every time you issue a GET request you create an empty gap in your transfer pipe which is the size of the connection RTT. For example, if your Maximum Possible Transfer Rate ( recv buff size / RTT ) is 2.5Mbit and your RTT is 100ms, then every GET request incurs a minimum 32kB gap in your pipe. For a large average image size of 320kB that amounts to a 10% overhead per file, effectively reducing your available bandwidth to 2.25Mbit. However, for a small average file size of 3.2kB the overhead jumps to 1000% and available bandwidth is reduced to 232 kbit / second - about 29kB/sec.
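The arithmetic above can be checked directly (units: MPTR in bytes/sec, RTT in seconds):

```python
mptr = 2.5e6 / 8            # 2.5 Mbit/s expressed as bytes/sec
rtt = 0.100                 # 100 ms round trip
gap = mptr * rtt            # idle pipe capacity per GET, in bytes
print(f"gap: {gap / 1000:.2f} kB")            # 31.25 kB -- the ~32kB gap above

for size in (320e3, 3.2e3):                   # large vs small average file
    overhead = gap / size                     # idle capacity relative to payload
    effective = mptr * size / (gap + size)    # bytes/sec actually achieved
    print(f"{size / 1000:g} kB files: {overhead:.0%} overhead, "
          f"{effective * 8 / 1e6:.2f} Mbit/s effective")
```

This reproduces the prose figures to rounding: ~10% overhead and ~2.28 Mbit/s for 320 kB files, ~980% overhead and ~0.23 Mbit/s (about 29 kB/s) for 3.2 kB files.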

So, to find the optimal number of threads:

Gap Size = MPTR * RTT
Optimal Threads = MPTR / ( MPTR / (Gap Size + AVG file size) * AVG file size )

For my above scenario this gives me an optimum thread count of 11 threads, which is extremely close to my real world results.

If the actual connection speed is slower than the theoretical MPTR then it should be used in the calculation instead.
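One consistent reading of the formula above: the optimal thread count is MPTR divided by the effective per-connection rate, which algebraically reduces to (Gap Size + AVG file size) / AVG file size. Plugged into this scenario's numbers (the ~1.4 Mbit figure below is an assumed slower measured rate, for illustration only):

```python
def optimal_threads(mptr, rtt, avg_file):
    """Threads needed to keep the pipe full: MPTR over the effective
    per-connection rate MPTR * avg_file / (gap + avg_file)."""
    gap = mptr * rtt
    per_connection = mptr / (gap + avg_file) * avg_file
    return mptr / per_connection      # == (gap + avg_file) / avg_file

# Theoretical link rate: lands near the top of the measured 15-20 sweet spot.
print(optimal_threads(2.5e6 / 8, 0.100, 1700))   # ~19.4
# A slower measured rate (which the note above says to use instead) pulls
# the optimum down toward the ~11 quoted earlier, e.g. at ~1.4 Mbit:
print(optimal_threads(1.4e6 / 8, 0.100, 1700))   # ~11.3
```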

Answer

Please correct me if this summary is incorrect:

  • Your multi-threaded client will start a thread that connects to the server and issues just one HTTP GET then that thread closes.
  • When you say 1, 2, 5, 10, 50 threads, you're just referring to how many concurrent threads you allow, each thread itself only handles one request
  • Your client takes between 2 and 5 minutes to download over 1000 images
  • Firefox and Opera will download an equivalent data set in 40 seconds

I suggest that the server rate-limits HTTP connections, either in the webserver daemon itself, in a server-local firewall, or, most likely, in a dedicated firewall.

You are actually abusing the webservice by not re-using the HTTP connections for more than one request, and the timeouts you experience are because your SYN FLOOD is being clamped.

Firefox and Opera are probably using between 4 and 8 connections to download all of the files.

If you redesign your code to re-use the connections you should achieve similar performance.
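A connection-reusing redesign can be sketched with one persistent (keep-alive) connection per worker; this Python sketch stands in for the Perl client, and the host and paths are placeholders:

```python
from http.client import HTTPConnection

def fetch_all(host, paths):
    """Fetch many small files over a single persistent (keep-alive)
    connection instead of paying one TCP handshake per file."""
    conn = HTTPConnection(host)   # one connect / one SYN for the whole batch
    sizes = []
    for path in paths:
        conn.request("GET", path)
        resp = conn.getresponse()
        sizes.append(len(resp.read()))  # drain the body before reusing the socket
    conn.close()
    return sizes
```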

