TCP, HTTP and the Multi-Threading Sweet Spot

Published: 2022/1/19 16:15:52. Tags: perl, multithreading, http, tcp, network-programming

Problem Description

I'm trying to understand the performance numbers I'm getting and how to determine the optimal number of threads.

See the bottom of this post for my results.

I wrote an experimental multi-threaded web client in Perl which downloads a page, grabs the source for each image tag and downloads the image - discarding the data.

It uses a non-blocking connect with an initial per file timeout of 10 seconds which doubles after each timeout and retry. It also caches IP addresses so each thread only has to do a DNS lookup once.

The total amount of data downloaded is 2271122 bytes in 1316 files via a 2.5Mbit connection from http://hubblesite.org/gallery/album/entire/npp/all/hires/true/ . The thumbnail images are hosted by a company which claims to specialize in low latency for high bandwidth applications.

Wall-clock times are:

1 thread takes 4:48 -- 0 timeouts
2 threads take 2:38 -- 0 timeouts
5 threads take 2:22 -- 20 timeouts
10 threads take 2:27 -- 40 timeouts
50 threads take 2:27 -- 170 timeouts

In the worst case (50 threads), the client consumes less than 2 seconds of CPU time.

avg file size 1.7k
avg rtt 100 ms ( as measured by ping )
avg cli cpu/img 1 ms

The fastest average download speed is 5 threads at about 15 KB / sec overall.
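As a quick check of that figure, the overall rate follows from the total byte count and the 5-thread wall-clock time quoted earlier. The snippet below is a minimal sketch; the function name is made up for illustration, and it treats the link as exactly 2.5 Mbit/s, which is an assumption.

```python
TOTAL_BYTES = 2271122        # total data downloaded, from the post
ELAPSED_S = 2 * 60 + 22      # 2:22 wall-clock time with 5 threads
LINK_BITS_PER_S = 2.5e6      # assumed: the 2.5Mbit connection, taken literally


def throughput_bytes_per_s(total_bytes, elapsed_s):
    """Average payload throughput over the whole run."""
    return total_bytes / elapsed_s


rate = throughput_bytes_per_s(TOTAL_BYTES, ELAPSED_S)
print(f"{rate / 1024:.1f} KiB/s")                          # 15.6 KiB/s
print(f"{rate * 8 / LINK_BITS_PER_S:.1%} of the link")     # 5.1% of the link
```

So even the best non-pipelined run uses only about a twentieth of the nominal link capacity.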

The server actually does seem to have pretty low latency, as it takes only 218 ms to get each image, meaning it takes only 18 ms on average for the server to process each request:

0 cli sends syn
50 srv rcvs syn
50 srv sends syn + ack
100 cli conn established / cli sends get
150 srv recv's get
168 srv reads file, sends data , calls close
218 cli recv HTTP headers + complete file in 2 segments MSS == 1448
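The timeline above amounts to two round trips plus the server's processing time. A small sketch of that arithmetic (helper names are illustrative only) also shows that fetching every file one after another at 218 ms each lands very close to the measured single-thread time of 4:48:

```python
RTT_MS = 100      # average round-trip time from ping, per the post
TOTAL_MS = 218    # observed time to fetch one image
N_FILES = 1316    # number of files downloaded

ROUND_TRIPS = 2   # one for SYN / SYN+ACK, one for GET / response


def server_time_ms(total_ms, rtt_ms, round_trips=ROUND_TRIPS):
    """Time left for the server once network round trips are subtracted."""
    return total_ms - round_trips * rtt_ms


def sequential_time_s(n_files, total_ms):
    """Wall-clock time if every file is fetched strictly one at a time."""
    return n_files * total_ms / 1000


def as_min_sec(seconds):
    """Format a duration in seconds as m:ss."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


print(server_time_ms(TOTAL_MS, RTT_MS))                   # 18
print(as_min_sec(sequential_time_s(N_FILES, TOTAL_MS)))   # 4:47
```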

I can see that the per-file average download speed is low because of the small file sizes and the relatively high per-file cost of the connection setup.

What I don't understand is why I see virtually no improvement in performance beyond 2 threads. The server seems to be sufficiently fast, but already starts timing out connections at 5 threads.

The timeouts seem to start after about 900 - 1000 successful connections whether it's 5 or 50 threads, which I assume is probably some kind of throttling threshold on the server, but I would expect 10 threads to still be significantly faster than 2.

Am I missing something here?

EDIT-1

Just for comparison's sake I installed the DownThemAll Firefox extension and downloaded the images using it. I set it to 4 simultaneous connections with a 10 second timeout. DownThemAll took about 3 minutes to download all the files + write them to disk, and it also started experiencing timeouts after about 900 connections.

I'm going to run tcpdump to try and get a better picture of what's going on at the TCP protocol level.

I also cleared Firefox's cache and hit reload. It took 40 seconds to reload the page and all the images. That seemed way too fast - maybe Firefox kept them in a memory cache which wasn't cleared? So I opened Opera and it also took about 40 seconds. I assume they're so much faster because they must be using HTTP/1.1 pipelining?

ANSWER!??

So after a little more testing and writing code to reuse the sockets via pipelining I found out some interesting info.

When running at 5 threads the non-pipelined version retrieves the first 1026 images in 77 seconds but takes a further 65 seconds to retrieve the remaining 290 images. This pretty much confirms MattH's theory about my client getting hit by a SYN FLOOD event, causing the server to stop responding to my connection attempts for a short period of time. However, that is only part of the problem, since 77 seconds is still very slow for 5 threads to get 1026 images; if you remove the SYN FLOOD issue it would still take about 99 seconds to retrieve all the files. So based on a little research and some tcpdump traces, it seems like the other part of the issue is latency and the connection setup overhead.
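The "about 99 seconds" figure is a straight extrapolation of the pre-throttling rate to all 1316 files. A minimal sketch, with an illustrative function name:

```python
def extrapolate_total_s(elapsed_s, files_done, files_total):
    """Scale the time for the first files_done files up to files_total."""
    return elapsed_s / files_done * files_total


# 77 seconds for the first 1026 images, 1316 images in total (from the post)
est = extrapolate_total_s(77, 1026, 1316)
print(f"{est:.1f} s")  # 98.8 s
```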

Here's where we get back to the issue of finding the "Sweet Spot" or the optimal number of threads. I modified the client to implement HTTP/1.1 pipelining and found that the optimal number of threads in this case is between 15 and 20. For example:

1 thread takes 2:37 -- 0 timeouts
2 threads take 1:22 -- 0 timeouts
5 threads take 0:34 -- 0 timeouts
10 threads take 0:20 -- 0 timeouts
11 threads take 0:19 -- 0 timeouts
15 threads take 0:16 -- 0 timeouts
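For readers unfamiliar with pipelining: the idea is to write several GET requests onto one connection back to back instead of opening a new connection per file. The sketch below only builds the request bytes. It is not the original Perl client, and the host and paths in the test values are placeholders.

```python
def build_pipelined_requests(host, paths):
    """Return one byte string holding a GET request per path, back to back.

    The last request asks the server to close the connection so the
    reader knows when the final response has ended.
    """
    requests = []
    for i, path in enumerate(paths):
        last = i == len(paths) - 1
        requests.append(
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Connection: {'close' if last else 'keep-alive'}\r\n"
            "\r\n"
        )
    return "".join(requests).encode("ascii")


# Placeholder host and paths, purely for illustration.
payload = build_pipelined_requests("example.com", ["/a.jpg", "/b.jpg", "/c.jpg"])
```

The payload would then be written with a single `sock.sendall(payload)`, and the responses read back in request order, using each response's `Content-Length` header to find where one ends and the next begins.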

There are four factors which affect this: latency / RTT, maximum end-to-end bandwidth, recv buffer size and the size of the image files being downloaded. See this site for a discussion on how receive buffer size and RTT latency affect available bandwidth.

In addition to the above, average file size affects the maximum per-connection transfer rate. Every time you issue a GET request you create an empty gap in your transfer pipe which is the size of the connection RTT. For example, if your Maximum Possible Transfer Rate (recv buff size / RTT) is 2.5Mbit and your RTT is 100 ms, then every GET request incurs a minimum gap of about 32kB in your pipe. For a large average image size of 320kB that amounts to a 10% overhead per file, effectively reducing your available bandwidth to roughly 2.25Mbit. However, for a small average file size of 3.2kB the overhead jumps to 1000% and available bandwidth is reduced to 232 kbit / second - about 29kB per second.
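The gap model in this paragraph can be written out as a short calculation. This is a sketch under the post's own assumptions (MPTR of 2.5 Mbit/s taken as 2.5e6 bits per second, RTT of 100 ms); the function names are illustrative. Note that the exact gap comes out at 31250 bytes, which the post rounds to 32kB.

```python
def gap_bytes(mptr_bits_per_s, rtt_s):
    """Bytes that could have been transferred during one idle round trip."""
    return mptr_bits_per_s * rtt_s / 8


def effective_rate_bits_per_s(mptr_bits_per_s, rtt_s, avg_file_bytes):
    """Per-connection rate once each file also pays for one RTT-sized gap."""
    gap = gap_bytes(mptr_bits_per_s, rtt_s)
    return mptr_bits_per_s * avg_file_bytes / (gap + avg_file_bytes)


MPTR = 2.5e6   # bits per second
RTT = 0.1      # seconds

print(gap_bytes(MPTR, RTT))                                        # 31250.0
print(round(effective_rate_bits_per_s(MPTR, RTT, 3200) / 1000))    # 232
print(round(effective_rate_bits_per_s(MPTR, RTT, 320000) / 1e6, 2))
```

For the 3.2kB case this reproduces the 232 kbit/s quoted above. For the 320kB case the model gives a value slightly above the post's rounded 2.25Mbit, because a 10% overhead divides the rate by about 1.1 and does not subtract a flat 10%.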

So to find the optimum thread count:

Gap Size = MPTR * RTT
Optimal threads = MPTR / (MPTR / (Gap Size + AVG file size) * AVG file size)

For my above scenario this gives me an optimum thread count of 11 threads, which is extremely close to my real world results.

If the actual connection speed is slower than the theoretical MPTR then it should be used in the calculation instead.
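A rough sketch of the thread-count estimate follows. It reads the formula as the link rate divided by the effective per-connection rate, which simplifies to (Gap Size + AVG file size) / AVG file size. That reading is an assumption, as is the 3.2kB average size, which is borrowed from the small-file example earlier in the answer (the post does not say which size was plugged in to get 11).

```python
def optimal_threads(rate_bits_per_s, rtt_s, avg_file_bytes):
    """Estimate how many connections are needed to fill the pipe.

    rate_bits_per_s should be the theoretical MPTR, or the actual
    connection speed when that is slower.
    """
    gap = rate_bits_per_s * rtt_s / 8    # Gap Size in bytes
    per_connection = rate_bits_per_s / (gap + avg_file_bytes) * avg_file_bytes
    return rate_bits_per_s / per_connection


# Assumed inputs: 2.5 Mbit/s, 100 ms RTT, 3.2kB average transfer size.
estimate = optimal_threads(2.5e6, 0.1, 3200)
print(f"{estimate:.1f}")  # 10.8
```

Rounded up, that is the 11 threads mentioned above.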
