Limiting TCP sends with a "to-be-sent" queue and other design issues

Problem description

This question is the result of two other questions I've asked in the last few days.
I'm creating a new question because I think it's related to the "next step" in my understanding of how to control the flow of my send/receive, something I haven't gotten a full answer to yet.
The other related questions are:
An IOCP documentation interpretation question - buffer ownership ambiguity
Non-blocking TCP buffer issues

In summary, I'm using Windows I/O Completion Ports.
I have several threads that process notifications from the completion port.
I believe the question is platform-independent and would have the same answer if I were doing the same thing on a *nix, *BSD, or Solaris system.

So, I need to have my own flow control system. Fine.
So I send and send and send, a lot. How do I know when to start queueing the sends, given that the receiving side is limited to some amount X?

Let's take an example (the closest thing to my question): the FTP protocol.
I have two servers; one is on a 100Mb link and the other is on a 10Mb link.
I tell the 100Mb one to send a 1GB file to the other one (the one on the 10Mb link). It finishes with an average transfer rate of 1.25MB/s.
How did the sender (the one on the 100Mb link) know when to hold off sending, so that the slower one wouldn't be flooded? (In this case the "to-be-sent" queue is the actual file on the hard disk.)

Another way to ask this:
Can I get a "hold-your-sends" notification from the remote side? Is it built into TCP, or does the so-called "reliable network protocol" require me to handle it myself?

I could of course limit my sends to a fixed number of bytes, but that simply doesn't sound right to me.

Again, I have a loop with many sends to a remote server, and at some point within that loop I'll have to determine whether I should queue that send or whether I can pass it on to the transport layer (TCP).
How do I do that? What would you do? Of course, when I get a completion notification from IOCP that the send was done, I'll issue other pending sends; that much is clear.

Another design question related to this:
Since I'm going to use custom buffers with a send queue, and these buffers are freed for reuse (thus not using the "delete" keyword) when a "send-done" notification arrives, I'll have to use mutual exclusion on that buffer pool.
Using a mutex slows things down, so I've been thinking: why not have each thread keep its own buffer pool? Accessing it, at least when getting the buffers required for a send operation, would then require no mutex, because the pool belongs to that thread only.
The buffer pool would live at the thread-local storage (TLS) level.
No shared pool means no lock needed, which means faster operations, BUT it also means more memory used by the app, because even if one thread has already allocated 1000 buffers, another thread that is sending right now and needs 1000 buffers to send something will have to allocate its own.
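
As a rough illustration of the per-thread idea described above (a sketch of my own, not code from the question: the Buffer struct and the AcquireBuffer/ReleaseBuffer names are invented), each worker thread could keep its own free list in thread-local storage, so the acquire path needs no mutex, at the memory cost the question mentions:

```cpp
#include <vector>

struct Buffer {
    char   data[4096];
    size_t length = 0;
};

thread_local std::vector<Buffer*> t_freeBuffers;   // one free list per worker thread

Buffer* AcquireBuffer() {
    if (t_freeBuffers.empty())
        return new Buffer();          // this thread grows its own pool on demand
    Buffer* b = t_freeBuffers.back();
    t_freeBuffers.pop_back();
    return b;
}

void ReleaseBuffer(Buffer* b) {
    // Cheap only if the thread handling the "send-done" completion is the thread
    // that acquired the buffer; the answer below notes this rarely balances out.
    t_freeBuffers.push_back(b);
}
```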

Another issue:
Say I have buffers A, B, C in the "to-be-sent" queue.
Then I get a completion notification that tells me that the receiver got 10 out of the 15 bytes. Should I re-send from the relative offset within the buffer, or will TCP handle it for me, i.e., complete the send? And if I should, can I be assured that this buffer is the "next-to-be-sent" one in the queue, or could it be buffer B, for example?

This is a long question and I hope no one got hurt (:

I'd loveeee to see someone take the time to answer here. I promise I'll double-vote for him! (:
Thank you all!

Recommended answer

Firstly: I'd ask this as separate questions. You're more likely to get answers that way.

I've spoken about most of this on my blog: http://www.lenholgate.com but then since you've already emailed me to say that you read my blog you know that...

The TCP flow control issue comes from the fact that you are posting asynchronous writes, and these each use resources until they complete (see here). During the time that the write is pending there are various resource usage issues to be aware of, and the use of your data buffer is the least important of them; you'll also use up some non-paged pool, which is a finite resource (though there is much more available in Vista and later than in previous operating systems), and you'll also be locking pages in memory for the duration of the write, and there's a limit to the total number of pages that the OS can lock. Note that neither the non-paged pool usage nor the page locking issues are documented very well anywhere, but you'll start seeing writes fail with ENOBUFS once you hit them.

Due to these issues it's not wise to have an uncontrolled number of writes pending. If you are sending a large amount of data and you have no application-level flow control, then you need to be aware that if you send data faster than it can be processed by the other end of the connection, or faster than the link speed, then you will begin to use up lots and lots of the above resources, because your writes take longer to complete due to TCP flow control and windowing issues. You don't get these problems with blocking socket code, as the write calls simply block when the TCP stack can't write any more due to flow control issues; with async writes, the write call returns immediately and the write is then left pending. With blocking code the blocking deals with your flow control for you; with async writes you could continue to loop and generate more and more data which is all just sitting there waiting to be sent by the TCP stack...
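
As a hedged illustration of the pattern this paragraph warns about (my own sketch, not code from the answer; SendUnthrottled and the chunk size are invented), here is an overlapped send loop with no throttling at all, where every posted write keeps its data buffer, its OVERLAPPED, locked pages, and non-paged pool in use until the completion eventually arrives:

```cpp
#include <winsock2.h>
#include <cstring>

// Every iteration posts another overlapped send and nothing ever waits, so if the
// peer or the link is slower than this loop, pending writes (and the resources
// behind them) simply pile up until sends start failing.
void SendUnthrottled(SOCKET s, const char* data, size_t size) {
    const size_t kChunk = 4096;
    for (size_t offset = 0; offset < size; offset += kChunk) {
        size_t len = (size - offset < kChunk) ? size - offset : kChunk;
        // One heap buffer and one OVERLAPPED per posted write; both stay pinned
        // until the completion eventually arrives, however long TCP takes.
        char* copy = new char[len];
        std::memcpy(copy, data + offset, len);
        WSAOVERLAPPED* ov = new WSAOVERLAPPED{};
        WSABUF buf{ static_cast<ULONG>(len), copy };
        WSASend(s, &buf, 1, nullptr, 0, ov, nullptr);   // returns at once; the write is merely pending
    }
}
```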

Anyway, because of this, with async I/O on Windows you should ALWAYS have some form of explicit flow control. So, you either add application-level flow control to your protocol, using an ACK perhaps, so that you know when the data has reached the other side and only allow a certain amount to be outstanding at any one time, OR, if you can't add to the application-level protocol, you can drive things by using your write completions. The trick is to allow a certain number of outstanding write completions per connection and to queue the data (or just not generate it) once you have reached your limit. Then as each write completes you can generate a new write...
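
A minimal sketch of that completion-driven limit, under my own assumptions (the Connection and SendBuffer structs, the QueueOrSend/OnSendCompleted names, and the kMaxPendingWrites value of 4 are all illustrative, not from the answer): sends go through QueueOrSend, which either posts a WSASend or parks the buffer in the "to-be-sent" queue, and each completion drives the next queued send:

```cpp
#include <winsock2.h>
#include <deque>
#include <mutex>

struct SendBuffer {
    WSAOVERLAPPED overlapped{};    // one OVERLAPPED per outstanding overlapped send
    WSABUF        wsaBuf{};
    char          data[4096];
    DWORD         length = 0;      // number of valid bytes in data
};

struct Connection {
    SOCKET                  socket = INVALID_SOCKET;
    std::mutex              lock;
    std::deque<SendBuffer*> toBeSent;        // the application-level "to-be-sent" queue
    int                     pendingWrites = 0;
    static constexpr int    kMaxPendingWrites = 4;   // pick a limit, then tune it by profiling
};

static void PostSend(Connection& c, SendBuffer* buf) {
    buf->wsaBuf.buf = buf->data;
    buf->wsaBuf.len = buf->length;
    // Error handling omitted: SOCKET_ERROR with WSA_IO_PENDING is the normal
    // "queued, the completion will arrive later" result for an overlapped send.
    WSASend(c.socket, &buf->wsaBuf, 1, nullptr, 0, &buf->overlapped, nullptr);
}

// Called by the producing code instead of calling WSASend directly.
void QueueOrSend(Connection& c, SendBuffer* buf) {
    std::lock_guard<std::mutex> guard(c.lock);   // kept simple; a real design might post outside the lock
    if (c.pendingWrites >= Connection::kMaxPendingWrites) {
        c.toBeSent.push_back(buf);               // over the limit: park it in our own queue
        return;
    }
    ++c.pendingWrites;
    PostSend(c, buf);
}

// Called from an IOCP worker thread when the completion for `buf` arrives.
void OnSendCompleted(Connection& c, SendBuffer* buf) {
    // ... return `buf` to whatever buffer pool is in use (not shown here) ...
    std::lock_guard<std::mutex> guard(c.lock);
    --c.pendingWrites;
    if (!c.toBeSent.empty()) {                   // each completion drives the next queued send
        SendBuffer* next = c.toBeSent.front();
        c.toBeSent.pop_front();
        ++c.pendingWrites;
        PostSend(c, next);
    }
}
```

The limit itself is the tuning knob: high enough to keep the link busy, low enough that a slow receiver can't pin an unbounded amount of memory and non-paged pool on your side.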

Your question about pooling the data buffers is, IMHO, premature optimisation on your part right now. Get to the point where your system is working properly, profile it, and find that contention on your buffer pool is the most important hot spot, and THEN address it. I found that per-thread buffer pools didn't work so well, as the distribution of allocations and frees across threads tends not to be as balanced as you'd need for that to work. I've spoken about this more on my blog: http://www.lenholgate.com/blog/2010/05/performance-comparisons-for-recent-code-changes.html
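
In that spirit, a simple shared pool protected by one mutex (reusing the illustrative SendBuffer struct from the sketch above; the class and method names are mine, not from the answer) is probably the right starting point until profiling says otherwise:

```cpp
#include <mutex>
#include <vector>

class SendBufferPool {
public:
    SendBuffer* Acquire() {
        std::lock_guard<std::mutex> guard(lock_);
        if (free_.empty())
            return new SendBuffer();   // grow on demand
        SendBuffer* b = free_.back();
        free_.pop_back();
        return b;
    }
    void Release(SendBuffer* b) {      // called from the "send-done" completion path
        std::lock_guard<std::mutex> guard(lock_);
        free_.push_back(b);            // reuse instead of delete, as in the question
    }
private:
    std::mutex               lock_;
    std::vector<SendBuffer*> free_;
};
```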

Your question about partial write completions (you send 100 bytes and the completion comes back and says that you have only sent 95) isn't really a problem in practice, IMHO. If you get into this position and have more than one outstanding write then there's nothing you can do: the subsequent writes may well work and you'll have bytes missing from what you expected to send; BUT a) I've never seen this happen unless you have already hit the resource problems that I detail above, and b) there's nothing you can do if you have already posted more writes on that connection, so simply abort the connection. Note that this is why I always profile my networking systems on the hardware that they will run on, and I tend to place limits in MY code to prevent the OS resource limits ever being reached (bad drivers on pre-Vista operating systems often blue-screen the box if they can't get non-paged pool, so you can bring a box down if you don't pay careful attention to these details).
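
A sketch of that check, as a variant of the earlier OnSendCompleted (again using the illustrative Connection/SendBuffer types from the flow-control sketch above; bytesTransferred would come from the completion, e.g. GetQueuedCompletionStatus's lpNumberOfBytesTransferred out-parameter):

```cpp
void OnSendCompletedChecked(Connection& c, SendBuffer* buf, DWORD bytesTransferred) {
    if (bytesTransferred < buf->length) {
        // Other overlapped writes may already be queued behind this one, so
        // re-sending the missing tail would corrupt the byte stream; per the
        // answer, the safe response is simply to abort the connection.
        closesocket(c.socket);
        return;
    }
    // Normal path: release the buffer and drive the next queued send, as in the
    // earlier OnSendCompleted sketch.
}
```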

Next time, please ask these as separate questions.
