Linux sockets: Zero-copy local, TCP/IP remote

Question

Networking is my worst area in operating systems, so forgive me for asking perhaps an incomplete question. I've been reading about this for a few hours, but it's kinda swimming in my head. (To me, I feel like chip design is easy compared to figuring out networking protocols.)

I have some networked services that communicate with each other via sockets. Specifically, the sockets are created with fd = socket(PF_INET, SOCK_STREAM, 0);, which automatically gets TCP/IP. I need this as the base case, because these services may be running on separate machines.
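To check that protocol `0` really resolves to TCP for a `PF_INET`/`SOCK_STREAM` socket, you can ask the kernel through the Linux-specific `SO_PROTOCOL` socket option, which has been available since Linux 2.6.32. This is a minimal sketch, and the helper name `default_stream_protocol` is hypothetical:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create socket(PF_INET, SOCK_STREAM, 0) and return the protocol number the
 * kernel chose for the 0 argument (IPPROTO_TCP is expected), or -1. */
int default_stream_protocol(void)
{
    int proto = -1;
    socklen_t len = sizeof proto;
    int fd = socket(PF_INET, SOCK_STREAM, 0);  /* PF_INET == AF_INET on Linux */

    if (fd < 0)
        return -1;
    /* SO_PROTOCOL is read-only: it reports the protocol of an existing socket */
    if (getsockopt(fd, SOL_SOCKET, SO_PROTOCOL, &proto, &len) < 0)
        proto = -1;
    close(fd);
    return proto;
}
```

On Linux the call should return `IPPROTO_TCP` (6).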

But for one project, we're trying to squeeze all of them into an underpowered embedded 'appliance', based on an Atom Z530P, so it seems to me that the memory copy overhead is something we could optimize out. I've been reading about that here: data-link-access-and-zero-copy, Linux_packet_mmap, and packet_mmap.

For this case, one would create the socket something like this: fd = socket(PF_PACKET, SOCK_RAW, 0);. And there's a bunch of other stuff to do, like allocating ring buffers, mmapping them, associating them with the socket, etc. It looks like you're restricted to using sendto() and recvfrom() to transmit data. As I understand it, since the socket is local, you don't need a reliable "stream" type socket, so raw sockets are the appropriate interface. I'm guessing that the ring buffer is used at page granularity, where each packet (or datagram) starts at a page boundary.
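For reference, the ring-buffer setup described above looks roughly like the following minimal sketch. It covers TPACKET_V1 on the receive side only, and the helper name and ring sizes are illustrative assumptions. A PF_PACKET socket deals in raw link-layer frames on a network interface, below TCP/IP, and it needs CAP_NET_RAW. The ring's block size must be a multiple of the page size, but frames are packed inside each block (here, two 2048-byte frames per 4096-byte block), so packets do not each start on a page boundary:

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a PF_PACKET socket with a memory-mapped TPACKET_V1 receive ring.
 * On success, returns the fd and stores the mapping in *ring / *ring_len.
 * On failure (e.g. no CAP_NET_RAW), returns -1 with errno set. */
int packet_rx_ring_open(void **ring, size_t *ring_len)
{
    struct tpacket_req req = {
        .tp_block_size = 4096, /* must be a multiple of the page size (4 KiB assumed) */
        .tp_block_nr   = 64,
        .tp_frame_size = 2048, /* frames are packed inside a block */
        .tp_frame_nr   = 128,  /* must equal block_nr * (block_size / frame_size) */
    };
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)); /* all protocols */
    if (fd < 0)
        return -1;
    /* Ask the kernel to allocate the ring... */
    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof req) < 0)
        goto fail;
    /* ...then map it into this process so frames can be read in place */
    *ring_len = (size_t)req.tp_block_size * req.tp_block_nr;
    *ring = mmap(NULL, *ring_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (*ring == MAP_FAILED)
        goto fail;
    return fd;
fail:
    { int saved = errno; close(fd); errno = saved; }
    return -1;
}
```

Even this receive-only setup is a fair amount of machinery, and it still does not give you TCP's reliable stream.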

Before I spend a huge amount of time trying to investigate this further, I was hoping some helpful individuals might help me with some questions:

  • How much performance benefit should I expect from zero-copy sockets here? I think the last time I checked, we were moving a maximum of about 40 MB/sec from one process to another and finally to the disk. In the most basic scenario, data moves from the capture process, to the one-to-many process (others can listen in on the stream), to the archiver process that writes to disk. That's two hops, not counting the disk and internal stuff.
  • Does Linux do any of this automatically, optimizing for processes running on the same machine?
  • In any case, I would have listening sockets in TCP ports. Can I use those to make connections between processes yet still be able to use zero-copy? In other words, can I use AF_INET with PF_PACKET?
  • Is PF_PACKET with SOCK_RAW the only valid configuration for zero-copy sockets?
  • Is there any good sample code out there that will use zero-copy with TCP/IP as a fallback?
  • What's the simplest or best way to detect that the two processes are on the same machine? They know each other's IP addresses, so I could just compare and use different code paths for each. Is there a simpler way to do this?
  • Can I use write() and read() on a packet-based socket, or are those only valid for streams? (Rewriting how connections are made would be simpler than rewriting ALL of the socket code.)
  • Am I over-complicating things and/or optimizing the wrong thing? OProfile tells me that most CPU time is spent in two places: (1) zlib and (2) the kernel, which I can't profile since I'm using CentOS 6.2, which doesn't provide a vmlinux. I assume the kernel time is a combination of idle time and data copying and not much else.

Thanks in advance for your help!

Recommended answer

Am I over-complicating things and/or optimizing the wrong thing?

Possibly. PF_PACKET sockets are only meant for specialized uses. You probably want to look into:

  • sendfile(2)
  • splice(2)
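Here is a minimal sketch of the splice(2) route for the archiver hop. It assumes `in_fd` is the connected TCP socket and `out_fd` is the archive file. splice(2) needs a pipe on one side of each call, so the bytes are staged in an anonymous pipe and never copied into a user-space buffer. sendfile(2) covers the opposite direction (file to socket). The name `splice_copy` is hypothetical. Whether this measurably beats plain read()/write() at about 40 MB/sec on the Atom is something to measure rather than assume:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Move up to len bytes from in_fd to out_fd with splice(2), staging them
 * in a pipe so they never pass through a user-space buffer.
 * Returns bytes moved (fewer than len if in_fd hits EOF), or -1 on error. */
ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    size_t total = 0;
    int failed = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    if (pipe(p) < 0)
        return -1;

    while (total < len && !failed) {
        /* in_fd -> pipe: moves at most one pipe's worth per call */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len - total, SPLICE_F_MOVE);
        if (n == 0)
            break;                      /* EOF on the input */
        if (n < 0) {
            if (errno != EINTR)
                failed = 1;             /* EINTR: just retry */
            continue;
        }
        /* pipe -> out_fd: drain everything that was just queued */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n, SPLICE_F_MOVE);
            if (m < 0) {
                if (errno == EINTR)
                    continue;
                failed = 1;
                break;
            }
            n -= m;
            total += (size_t)m;
        }
    }
    close(p[0]);
    close(p[1]);
    return failed ? -1 : (ssize_t)total;
}
```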

What's the simplest or best way to detect that the two processes are on the same machine?

Simply not "forgetting" this information.
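Sometimes the code that opens the connection can't easily record whether the peer is local. In that case, one heuristic (an assumption, not part of the answer) is to compare the two ends of an already-connected socket. On Linux, a connection to one of the host's own addresses normally gets that same address as its source, and loopback peers are local by definition. The sketch below uses a hypothetical `peer_is_local` helper. It can miss same-host peers in setups such as containers on a bridge network, so treat its result as a hint:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Heuristic for a connected socket: 1 if the peer looks like it is on this
 * host (Unix-domain, loopback, or same address as our end), 0 if not,
 * -1 if the addresses can't be read. */
int peer_is_local(int fd)
{
    struct sockaddr_storage self, peer;
    socklen_t slen = sizeof self, plen = sizeof peer;

    assert(fd >= 0);
    if (getsockname(fd, (struct sockaddr *)&self, &slen) < 0 ||
        getpeername(fd, (struct sockaddr *)&peer, &plen) < 0)
        return -1;

    if (peer.ss_family == AF_UNIX)
        return 1;
    if (peer.ss_family == AF_INET) {
        const struct sockaddr_in *s = (const struct sockaddr_in *)&self;
        const struct sockaddr_in *p = (const struct sockaddr_in *)&peer;
        if ((ntohl(p->sin_addr.s_addr) >> 24) == 127)  /* 127.0.0.0/8 */
            return 1;
        return s->sin_addr.s_addr == p->sin_addr.s_addr;
    }
    if (peer.ss_family == AF_INET6) {
        const struct sockaddr_in6 *s = (const struct sockaddr_in6 *)&self;
        const struct sockaddr_in6 *p = (const struct sockaddr_in6 *)&peer;
        if (IN6_IS_ADDR_LOOPBACK(&p->sin6_addr))
            return 1;
        return memcmp(&s->sin6_addr, &p->sin6_addr, sizeof p->sin6_addr) == 0;
    }
    return 0;
}
```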

Does Linux do any of this automatically, optimizing for processes running on the same machine?

No, you have to do it yourself.
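Doing it yourself could look like the following sketch. This is one option, not something the answer prescribes. When the caller already knows the peer is local, it connects over a Unix-domain stream socket at an agreed path and keeps the existing TCP code for remote peers. The name `unix_connect` and any socket path are hypothetical. This is not zero-copy, since data is still copied into and out of the kernel, but it skips the TCP/IP stack. Because it is still a `SOCK_STREAM` socket, the existing read()/write() code works unchanged:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to a Unix-domain stream socket listening at `path`
 * (a hypothetical rendezvous path agreed on by the services).
 * Returns the connected fd, or -1 on failure. */
int unix_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    assert(path != NULL);
    if (strlen(path) >= sizeof addr.sun_path)
        return -1;                      /* too long for sun_path */

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```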