Minimizing copies when writing large data to a socket


Problem description

I am writing an application server that processes images (large data). I am trying to minimize copies when sending image data back to clients. The processed images I need to send to clients are in buffers obtained from jemalloc. The ways I have thought of to send the data back to the client are:

1) A simple write call.

// Allocate buffer buf.
// Store image data in this buffer.
write(socket, buf, len);

2) I obtain the buffer through mmap instead of jemalloc, though I presume jemalloc already creates the buffer using mmap. I then make a simple call to write.

buf = mmap(file, len);  // Imagine proper options.
// Store image data in this buffer.
write(socket, buf, len);

3) I obtain a buffer through mmap like before. I then use sendfile to send the data:

buf = mmap(in_fd, len);  // Imagine proper options.
// Store image data in this buffer.
int rc;
rc = sendfile(out_fd, in_fd, &offset, count);  // in_fd is the file mapped above.
// Deal with rc.
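
Option 3 can be sketched end to end roughly as follows. This is a minimal sketch, assuming Linux and a writable staging file. The helper name `send_via_mapped_file`, its `path` parameter, and the `memcpy` that stands in for the image-processing step are hypothetical and not part of the original question.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: stage len bytes in a file-backed shared mapping at
 * path, then hand the file to out_fd with sendfile().
 * Returns 0 on success, -1 on error. */
int send_via_mapped_file(int out_fd, const char *path,
                         const void *data, size_t len)
{
    int rc = -1;
    int in_fd;
    void *buf;
    off_t offset = 0;

    assert(path != NULL && (data != NULL || len == 0));

    in_fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (in_fd < 0)
        return -1;
    if (len == 0) {              /* mmap() rejects a zero length */
        close(in_fd);
        return 0;
    }
    if (ftruncate(in_fd, (off_t)len) != 0)
        goto out_close;

    buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, in_fd, 0);
    if (buf == MAP_FAILED)
        goto out_close;

    /* Stand-in for "store image data in this buffer". */
    memcpy(buf, data, len);

    /* sendfile() may transfer fewer bytes than requested, so loop.
     * It advances offset by the number of bytes it transferred. */
    while ((size_t)offset < len) {
        ssize_t n = sendfile(out_fd, in_fd, &offset, len - (size_t)offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto out_unmap;
        }
        if (n == 0)              /* unexpected end of file */
            goto out_unmap;
    }
    rc = 0;

out_unmap:
    munmap(buf, len);
out_close:
    close(in_fd);
    return rc;
}
```

In a real server the processing step would presumably write into `buf` directly instead of copying into it, otherwise the `memcpy` reintroduces the copy that sendfile is meant to avoid.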

It seems like (1) and (2) will probably do the same thing, given that jemalloc probably allocates memory through mmap in the first place. I am not sure about (3) though. Will this really lead to any benefits? Figure 4 of this article on Linux zero-copy methods suggests that a further copy can be prevented using sendfile:

no data is copied into the socket buffer. Instead, only descriptors with information about the whereabouts and length of the data are appended to the socket buffer. The DMA engine passes data directly from the kernel buffer to the protocol engine, thus eliminating the remaining final copy.

This seems like a win if everything works out. I don't know if my mmapped buffer counts as a kernel buffer though. I also don't know when it is safe to re-use this buffer. Since the fd and length are the only things appended to the socket buffer, I assume that the kernel actually writes this data to the socket asynchronously. If it does, what does the return from sendfile signify? How would I know when to re-use this buffer?

So my questions are:

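  1. What is the fastest way to write a large buffer (an image, in my case) to a socket? The image is held in memory.
  2. Is it a good idea to call sendfile on a mmapped file? If so, what are the gotchas? Does this even lead to any wins?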

Recommended answer

It seems like my suspicions were correct. I got my information from this article. Quoting from it:

Also these network write system calls, including sendfile, might and in many cases do return before the data sent over TCP by the method call has been acknowledged. These methods return as soon as all data is written into the socket buffers (sk buff) and is pushed to the TCP write queue, the TCP engine can manage alone from that point on. In other words at the time sendfile returns the last TCP send window is not actually sent to the remote host but queued. In cases where scatter-gather DMA is supported there is no separate buffer which holds these bytes, rather the buffers (sk buffs) just hold pointers to the pages of OS buffer cache, where the contents of file is located. This might lead to a race condition if we modify the content of the file corresponding to the data in the last TCP send window as soon as sendfile is returned. As a result TCP engine may send newly written data to the remote host instead of what we originally intended to send.

Provided the buffer from a mmapped file is even considered "DMA-able", it seems there is no way to know when it is safe to re-use it without an explicit acknowledgement (over the network) from the actual client. I might have to stick to simple write calls and incur the extra copy. There is a paper (also from the article) with more details.
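
The plain write fallback could look roughly like the sketch below. The helper name `write_all` is hypothetical. It assumes a blocking socket and only handles short writes and `EINTR`. Because write copies the data into the kernel's socket buffer, the user buffer can be reused as soon as the call returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes from buf to fd.
 * Returns 0 on success, -1 on error (errno is left as set by write). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted before writing: retry */
            return -1;
        }
        p += n;                  /* write may accept fewer bytes than asked */
        len -= (size_t)n;
    }
    return 0;
}
```

On a non-blocking socket this helper would return -1 with `errno` set to `EAGAIN` when the send buffer is full, so a caller would need to wait for writability (for example with poll) and resume from where it stopped.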

Edit: This article on the splice call also shows the problems. Quoting it:

Be aware, when splicing data from a mmap'ed buffer to a network socket, it is not possible to say when all data has been sent. Even if splice() returns, the network stack may not have sent all data yet. So reusing the buffer may overwrite unsent data.
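
For reference, the splice path that this quote warns about might look like the following sketch. It assumes Linux, and the helper name `splice_buffer_to_fd` is hypothetical. The user pages are attached to a pipe with vmsplice and then moved to the destination with splice. A return value of 0 only means the data was handed to the kernel, not that it has left the machine, so the hazard described in the quote still applies to `buf`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* splice() and vmsplice() are Linux-specific */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send len bytes at buf to out_fd through a pipe using
 * vmsplice() + splice(). Returns 0 on success, -1 on error. */
int splice_buffer_to_fd(int out_fd, void *buf, size_t len)
{
    int pipefd[2];
    char *p = buf;
    int rc = -1;

    assert(buf != NULL || len == 0);

    if (pipe(pipefd) != 0)
        return -1;

    while (len > 0) {
        struct iovec iov = { .iov_base = p, .iov_len = len };

        /* Attach the user pages to the pipe. The pipe has limited capacity,
         * so this may queue fewer than len bytes. */
        ssize_t queued = vmsplice(pipefd[1], &iov, 1, 0);
        if (queued < 0) {
            if (errno == EINTR)
                continue;
            goto out;
        }
        p += queued;
        len -= (size_t)queued;

        /* Drain what was just queued from the pipe into out_fd. */
        while (queued > 0) {
            ssize_t n = splice(pipefd[0], NULL, out_fd, NULL,
                               (size_t)queued, SPLICE_F_MOVE);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                goto out;
            }
            if (n == 0)
                goto out;
            queued -= n;
        }
    }
    rc = 0;

out:
    close(pipefd[0]);
    close(pipefd[1]);
    return rc;
}
```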
