Send data to multiple sockets using pipes, tee() and splice()

Question

I'm duplicating a "master" pipe with tee() to write to multiple sockets using splice(). Naturally these pipes will get emptied at different rates depending on how much I can splice() to the destination sockets. So when I next go to add data to the "master" pipe and then tee() it again, I may have a situation where I can write 64KB to the pipe but only tee 4KB to one of the "slave" pipes. I'm guessing then that if I splice() all of the "master" pipe to the socket, I will never be able to tee() the remaining 60KB to that slave pipe. Is that true? I guess I can keep track of a tee_offset (starting at 0) which I set to the start of the "unteed" data and then don't splice() past it. So in this case I would set tee_offset to 4096 and not splice more than that until I'm able to tee it to all the other pipes. Am I on the right track here? Any tips/warnings for me?

Recommended answer

If I understand correctly, you've got some realtime source of data that you want to multiplex to multiple sockets. You've got a single "source" pipe hooked up to whatever's producing your data, and you've got a "destination" pipe for each socket over which you wish to send the data. What you're doing is using tee() to copy data from the source pipe to each of the destination pipes and splice() to copy it from the destination pipes to the sockets themselves.

The fundamental issue you're going to hit here is if one of the sockets simply can't keep up - if you're producing data faster than you can send it, then you're going to have a problem. This isn't related to your use of pipes, it's just a fundamental issue. So, you'll want to pick a strategy to cope in this case - I suggest handling this even if you don't expect it to be common as these things often come up to bite you later. Your basic choices are to either close the offending socket, or to skip data until it's cleared its output buffer - the latter choice might be more suitable for audio/video streaming, for example.

The issue which is related to your use of pipes, however, is that on Linux the size of a pipe's buffer is somewhat inflexible. It has defaulted to 64K since Linux 2.6.11 (the tee() call was added in 2.6.17) - see the pipe manpage. Since 2.6.35 this value can be changed via the F_SETPIPE_SZ option to fcntl() (see the fcntl manpage) up to the limit specified by /proc/sys/fs/pipe-max-size, but the buffering is still more awkward to change on demand than a dynamically allocated scheme in user-space would be. This means that your ability to cope with slow sockets will be somewhat limited - whether this is acceptable depends on the rate at which you expect to receive and be able to send data.
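
As a rough illustration of the F_SETPIPE_SZ option mentioned above, here is a minimal sketch of checking and growing a pipe's capacity. The helper name grow_pipe() is made up for this example, and the sketch assumes a Linux kernel of 2.6.35 or later with glibc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes F_GETPIPE_SZ and F_SETPIPE_SZ */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* grow_pipe() is a hypothetical helper name, not a library call.
 * Ensures the pipe behind pipe_fd can hold at least `wanted` bytes.
 * Returns the resulting capacity in bytes, or -1 with errno set. */
int grow_pipe(int pipe_fd, int wanted)
{
    assert(wanted > 0);

    int current = fcntl(pipe_fd, F_GETPIPE_SZ);
    if (current < 0)
        return -1;              /* not a pipe, or the kernel is too old */
    if (current >= wanted)
        return current;         /* already large enough, nothing to do */

    /* The kernel may round the request up. Unprivileged callers cannot
     * go beyond /proc/sys/fs/pipe-max-size. */
    if (fcntl(pipe_fd, F_SETPIPE_SZ, wanted) < 0)
        return -1;
    return fcntl(pipe_fd, F_GETPIPE_SZ);
}
```

Calling something like grow_pipe(fd, 1 << 20) once per destination pipe at setup time buys some slack for slow sockets, but the capacity is still a fixed ceiling rather than a buffer that grows with demand.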

Assuming this buffering strategy is acceptable, you're correct in your assumption that you'll need to track how much data each destination pipe has consumed from the source, and it's only safe to discard data which all destination pipes have consumed. This is somewhat complicated by the fact that tee() doesn't have the concept of an offset - you can only copy from the start of the pipe. The consequence of this is that you can only copy at the speed of the slowest socket: you can't tee() new data to a destination pipe until the data ahead of it has been consumed from the source, and you can't consume that data until every destination pipe has a copy of it.
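
To make the "no offset" point concrete, here is a small sketch (tee_twice() is a name invented for this example) which calls tee() twice on the same source pipe without consuming anything in between. Both calls copy from the head of the source, so the destination ends up with the same bytes twice.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes tee() and SPLICE_F_NONBLOCK */
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Illustrative only: writes `msg` to a fresh source pipe, tee()s it twice
 * into a fresh destination pipe, then reads the destination into `out`.
 * Returns the number of bytes read back, or -1 on failure. */
ssize_t tee_twice(const char *msg, char *out, size_t out_len)
{
    int src[2], dst[2];
    size_t len = strlen(msg);
    ssize_t got = -1;

    assert(len > 0 && len <= 512);      /* keep the demo far below pipe capacity */
    if (pipe(src) < 0)
        return -1;
    if (pipe(dst) < 0) {
        close(src[0]);
        close(src[1]);
        return -1;
    }

    /* tee() duplicates data but never consumes it from the source, so the
     * second call sees exactly the same bytes as the first. */
    if (write(src[1], msg, len) == (ssize_t)len &&
        tee(src[0], dst[1], len, SPLICE_F_NONBLOCK) == (ssize_t)len &&
        tee(src[0], dst[1], len, SPLICE_F_NONBLOCK) == (ssize_t)len)
        got = read(dst[0], out, out_len);

    close(src[0]);
    close(src[1]);
    close(dst[0]);
    close(dst[1]);
    return got;
}
```

With msg set to "abcd" the destination receives "abcdabcd". This is why a destination that already holds a copy of the head of the source has to be skipped until that data has been discarded from the source.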

How you handle this depends on the importance of your data. If you really need the speed of tee() and splice(), and you're confident that a slow socket will be an extremely rare event, you could do something like this (I've assumed you're using non-blocking IO and a single thread, but something similar would also work with multiple threads):


  1. Make sure all pipes are non-blocking (use fcntl(d, F_SETFL, O_NONBLOCK) to make each file descriptor non-blocking).
  2. Initialise a read_counter variable for each destination pipe to zero.
  3. Use something like epoll() to wait until there's something in the source pipe.
  4. Loop over all destination pipes where read_counter is zero, calling tee() to transfer data to each one. Make sure you pass SPLICE_F_NONBLOCK in the flags.
  5. Increment read_counter for each destination pipe by the amount transferred by tee(). Keep track of the lowest resultant value.
  6. Find the lowest resultant value of read_counter - if this is non-zero, then discard that amount of data from the source pipe (using a splice() call with a destination opened on /dev/null, for example). After discarding data, subtract the amount discarded from read_counter on all the pipes (since this was the lowest value then this cannot result in any of them becoming negative).
  7. Repeat from step 3.
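
Steps 4 to 6 could be sketched roughly as follows. The struct dest type and the fan_out_once() function are hypothetical names for this illustration, null_fd is assumed to be /dev/null opened for writing, and a real loop would also need the epoll() wait from step 3 and the splice() calls that drain each destination pipe to its socket.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes tee(), splice(), SPLICE_F_NONBLOCK */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-destination state. */
struct dest {
    int    pipe_wr;        /* write end of this destination's pipe */
    size_t read_counter;   /* bytes at the head of the source already copied here */
};

/* One pass over steps 4-6. Returns the number of bytes discarded from the
 * source pipe (0 if some destination is still behind), or -1 on a hard
 * tee() error. */
ssize_t fan_out_once(int src_rd, int null_fd,
                     struct dest *dests, size_t n, size_t chunk)
{
    size_t lowest = SIZE_MAX;

    for (size_t i = 0; i < n; i++) {
        if (dests[i].read_counter == 0) {                       /* step 4 */
            ssize_t copied = tee(src_rd, dests[i].pipe_wr, chunk,
                                 SPLICE_F_NONBLOCK);
            if (copied < 0 && errno != EAGAIN)
                return -1;
            if (copied > 0)
                dests[i].read_counter += (size_t)copied;        /* step 5 */
        }
        if (dests[i].read_counter < lowest)
            lowest = dests[i].read_counter;
    }
    if (n == 0 || lowest == 0)
        return 0;          /* at least one destination has nothing yet */

    /* Step 6: every destination holds the first `lowest` bytes, so they
     * can be dropped from the source by splicing them to /dev/null. */
    size_t discarded = 0;
    while (discarded < lowest) {
        ssize_t r = splice(src_rd, NULL, null_fd, NULL,
                           lowest - discarded, SPLICE_F_NONBLOCK);
        if (r <= 0)
            break;
        discarded += (size_t)r;
    }
    for (size_t i = 0; i < n; i++) {
        assert(dests[i].read_counter >= discarded);   /* lowest was the minimum */
        dests[i].read_counter -= discarded;
    }
    return (ssize_t)discarded;
}
```

A destination whose counter is still non-zero is deliberately skipped in the loop: it already has the head of the source, and teeing again would duplicate that data.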

Note: one thing that's tripped me up in the past is that SPLICE_F_NONBLOCK affects whether the tee() and splice() operations on the pipes are non-blocking, and the O_NONBLOCK you set with fcntl() affects whether the interactions with other calls (e.g. read() and write()) are non-blocking. If you want everything to be non-blocking, set both. Also remember to make your sockets non-blocking or the splice() calls to transfer data to them might block (unless that's what you want, if you're using a threaded approach).
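
For the O_NONBLOCK half of that, a small helper along these lines may be convenient (set_nonblocking() is just a name chosen for this sketch). Unlike a bare fcntl(d, F_SETFL, O_NONBLOCK), it reads the current status flags first so that it doesn't clear any that are already set.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: turn on O_NONBLOCK for a pipe end or socket while
 * preserving its other file status flags. Returns 0 on success, -1 on
 * error with errno set. This only affects calls such as read() and
 * write(); tee() and splice() on pipes still need SPLICE_F_NONBLOCK. */
int set_nonblocking(int fd)
{
    assert(fd >= 0);

    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```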

As you can see, this strategy has a major problem - as soon as one socket blocks up, everything halts - the destination pipe for that socket will fill up, and then the source pipe will become stagnant. So, if you reach the stage where tee() returns EAGAIN in step 4 then you'll want to either close that socket, or at least "disconnect" it (i.e. take it out of your loop) such that you don't write anything else to it until its output buffer is empty. Which you choose depends on whether your data stream can recover from having bits of it skipped.
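
One possible shape for the "disconnect" option is sketched below. All names here (struct dest_state, tee_or_park(), try_unpark()) are invented for the example, and it makes two assumptions that are not in the scheme above: it uses the FIONREAD ioctl to tell "source empty" apart from "destination full" when tee() reports EAGAIN, and it treats an empty destination pipe as the signal that the slow socket has caught up.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes tee() and SPLICE_F_NONBLOCK */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-destination state for this sketch. */
struct dest_state {
    int  pipe_rd;          /* read end, drained to the socket elsewhere */
    int  pipe_wr;          /* write end, target of tee() */
    bool parked;           /* true while taken out of the fan-out loop */
};

/* Bytes currently queued in a pipe, or -1 on error. */
static int pipe_bytes(int fd)
{
    int n = 0;
    return ioctl(fd, FIONREAD, &n) < 0 ? -1 : n;
}

/* tee() into one destination. Returns the bytes copied (possibly 0), or -1
 * on a hard error. Parks the destination if the source had data but the
 * destination pipe could not take any. */
ssize_t tee_or_park(int src_rd, struct dest_state *d, size_t chunk)
{
    assert(!d->parked);

    ssize_t copied = tee(src_rd, d->pipe_wr, chunk, SPLICE_F_NONBLOCK);
    if (copied >= 0)
        return copied;
    if (errno != EAGAIN)
        return -1;
    if (pipe_bytes(src_rd) > 0)
        d->parked = true;              /* destination is the bottleneck */
    return 0;
}

/* Re-admit a parked destination once its pipe has fully drained.
 * Returns true if the destination is usable again. */
bool try_unpark(struct dest_state *d)
{
    if (d->parked && pipe_bytes(d->pipe_rd) == 0)
        d->parked = false;
    return !d->parked;
}
```

A parked destination would be left out when computing the lowest read_counter, so it misses whatever is discarded in the meantime. That only makes sense if the stream can tolerate gaps.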

If you want to cope with network latency more gracefully then you're going to need to do more buffering, and this is going to involve either user-space buffers (which rather negates the advantages of tee() and splice()) or perhaps disk-based buffering. The disk-based buffering will almost certainly be significantly slower than user-space buffering, and hence not appropriate given that presumably you want a lot of speed since you've chosen tee() and splice() in the first place, but I mention it for completeness.

One thing that's worth noting if you end up inserting data from user-space at any point is the vmsplice() call which can perform "gather output" from user-space into a pipe, in a similar way to the writev() call. This might be useful if you're doing enough buffering that you've split your data among multiple different allocated buffers (for example if you're using a pool allocator approach).
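
As a minimal sketch of that "gather output", the following pushes two separately allocated buffers into a pipe with a single vmsplice() call. The function name gather_into_pipe() and the two-buffer layout are assumptions made for the example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes vmsplice() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: queue two user-space buffers into a pipe in one
 * call. Returns the number of bytes accepted (which may be fewer than
 * a_len + b_len), or -1 with errno set. */
ssize_t gather_into_pipe(int pipe_wr,
                         void *a, size_t a_len,
                         void *b, size_t b_len)
{
    assert(a != NULL && b != NULL);

    struct iovec iov[2] = {
        { .iov_base = a, .iov_len = a_len },
        { .iov_base = b, .iov_len = b_len },
    };
    /* No flags: this blocks if the pipe is full, unless SPLICE_F_NONBLOCK
     * is passed instead of 0. */
    return vmsplice(pipe_wr, iov, 2, 0);
}
```

As far as I understand it, the pipe may reference the caller's pages rather than copy them, so the buffers should be left unmodified until the data has been consumed from the pipe.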

Finally, you could imagine swapping sockets between the "fast" scheme of using tee() and splice() and, if they fail to keep up, moving them on to a slower user-space buffering. This is going to complicate your implementation, but if you're handling large numbers of connections and only a very small proportion of them are slow then you're still reducing the amount of copying to user-space that's involved somewhat. However, this would only ever be a short-term measure to cope with transient network issues - as I said originally, you've got a fundamental problem if your sockets are slower than your source. You'd eventually hit some buffering limit and need to skip data or close connections.

Overall, I would carefully consider why you need the speed of tee() and splice() and whether, for your use-case, simple user-space buffering in memory or on disk would be more appropriate. If you're confident that the speeds will always be high, however, and limited buffering is acceptable then the approach I outlined above should work.

Also, one thing I should mention is that this will make your code extremely Linux-specific - I'm not aware of these calls being supported in other Unix variants. The sendfile() call is more restricted than splice(), but might be rather more portable. If you really want things to be portable, stick to user-space buffering.

Let me know if there's anything I've covered which you'd like more detail on.
