What happens after a packet is captured?

Question

I've been reading about what happens after packets are captured by NICs, and the more I read, the more I'm confused.

Firstly, I've read that traditionally, after a packet is captured by the NIC, it gets copied to a block of memory in kernel space, then to user space for whatever application then works on the packet data. Then I read about DMA, where the NIC copies the packet directly into memory, bypassing the CPU. So is the NIC -> kernel memory -> user-space memory flow still valid? Also, do most NICs (e.g. Myricom) use DMA to improve packet capture rates?

Secondly, does RSS (Receive Side Scaling) work similarly in both Windows and Linux systems? I can only find detailed explanations of how RSS works in MSDN articles, where they talk about how RSS (and MSI-X) works on Windows Server 2008. But the same concepts of RSS and MSI-X should still apply to Linux systems, right?

Thanks.

Regards, Rayne

Answer

How this process plays out is mostly up to the driver author and the hardware, but for the drivers I've looked at or written and the hardware I've worked with, this is usually the way it works:

  1. At driver initialization, it allocates some number of buffers and gives them to the NIC.
  2. When the NIC receives a packet, it pulls the next address off its list of buffers, DMAs the data directly into it, and notifies the driver via an interrupt.
  3. The driver gets the interrupt and can either hand that buffer over to the kernel, or it allocates a new kernel buffer and copies the data. The former is "zero-copy networking" and obviously requires support from the operating system (more on this below).
  4. The driver either needs to allocate a new buffer (in the zero-copy case) or it reuses the existing buffer. In either case, a buffer is given back to the NIC for future packets. (A simplified sketch of this loop follows the list.)
Zero-copy networking within the kernel isn't so bad. Zero-copy all the way down to userland is much harder. Userland gets data, but network packets are made up of both header and data. At the least, true zero-copy all the way to userland requires support from your NIC so that it can DMA packets into separate header/data buffers. The headers are recycled once the kernel routes the packet to its destination and verifies the checksum (for TCP, either in hardware if the NIC supports it or in software if not; note that if the kernel has to compute the checksum itself, it may as well copy the data too: looking at the data incurs cache misses, and copying it elsewhere can be nearly free with tuned code).
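
To make that parenthetical concrete: the Internet checksum (RFC 1071) already has to read every byte of the data, so folding a copy into the same loop adds very little work. The sketch below illustrates the idea only; it is not the kernel's actual combined checksum-and-copy routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a buffer into the 16-bit ones'-complement Internet checksum (RFC 1071)
 * while copying it to dst. The checksum already touches every byte, so the
 * extra store per byte is cheap once the data is in cache. */
static uint16_t checksum_and_copy(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

    for (; i + 1 < len; i += 2) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        sum += (uint32_t)((src[i] << 8) | src[i + 1]);
    }
    if (i < len) {                     /* odd trailing byte, padded with zero */
        dst[i] = src[i];
        sum += (uint32_t)(src[i] << 8);
    }
    while (sum >> 16)                  /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```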

Even assuming all the stars align, the data isn't actually in your user buffer when it is received by the system. Until an application asks for the data, the kernel doesn't know where it will end up. Consider the case of a multi-process daemon like Apache. There are many child processes, all listening on the same socket. You can also establish a connection, fork(), and both processes are able to recv() incoming data.
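
A minimal illustration of that last point, assuming something is listening on the example address and port used below (error handling omitted): whichever process calls recv() first consumes the bytes, so the kernel could not have placed them into "the" user buffer ahead of time.

```c
/* After fork(), parent and child share the same connected socket and either
 * one can recv() the next incoming data. Address and port are arbitrary
 * examples; error handling is omitted to keep the sketch short. */
#include <arpa/inet.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port   = htons(12345),            /* example port */
    };
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    connect(fd, (struct sockaddr *)&addr, sizeof(addr));

    pid_t pid = fork();
    char buf[2048];

    /* Whichever process gets here first consumes the incoming data. */
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    printf("%s (pid %d) read %zd bytes\n",
           pid == 0 ? "child" : "parent", (int)getpid(), n);

    if (pid > 0)
        wait(NULL);
    close(fd);
    return 0;
}
```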

TCP packets on the Internet usually carry 1460 bytes of payload (MTU of 1500 = 20-byte IP header + 20-byte TCP header + 1460 bytes of data). 1460 is not a power of 2 and won't match the page size on any system you'll find. This presents problems for reassembly of the data stream. Remember that TCP is stream-oriented: there is no distinction between sender writes, and two 1000-byte writes waiting at the receiver will be consumed entirely by a single 2000-byte read.
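
A small sketch of what stream orientation means at the socket API, assuming fd is an already connected TCP socket on each end (error handling omitted):

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

void sender(int fd)
{
    char chunk[1000];
    memset(chunk, 'x', sizeof(chunk));
    send(fd, chunk, sizeof(chunk), 0);   /* write #1: 1000 bytes */
    send(fd, chunk, sizeof(chunk), 0);   /* write #2: 1000 bytes */
}

void receiver(int fd)
{
    char buf[2000];
    /* TCP preserves byte order, not write boundaries: this single call may
     * legally return anywhere from 1 to 2000 bytes, including both writes
     * at once. */
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    printf("one recv() returned %zd bytes\n", n);
}
```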

Taking this further, consider the user buffers. These are allocated by the application. In order to be used for zero-copy all the way down, the buffer needs to be page-aligned and not share that memory page with anything else. At recv() time, the kernel could theoretically remap the old page with the one containing the data and "flip" it into place, but this is complicated by the reassembly issue above since successive packets will be on separate pages. The kernel could limit the data it hands back to each packet's payload, but this will mean a lot of additional system calls, page remapping and likely lower throughput overall.
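
For illustration, this is roughly how a page-aligned user buffer could be allocated with posix_memalign(); it only demonstrates the alignment requirement, not an actual page-flipping receive path.

```c
/* Sketch of the alignment requirement above: to even be a candidate for the
 * page-flipping scheme, the user buffer has to start on a page boundary and
 * cover whole pages on its own. Plain malloc() makes no such promise. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);          /* typically 4096 */
    void *buf = NULL;

    /* page-aligned and a whole multiple of the page size */
    if (posix_memalign(&buf, (size_t)page, (size_t)page * 4) != 0) {
        perror("posix_memalign");
        return 1;
    }
    printf("buffer at %p, page size %ld bytes\n", buf, page);

    /* Note that a 1460-byte TCP payload still does not tile these pages
     * evenly, which is the reassembly problem described earlier. */
    free(buf);
    return 0;
}
```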

I'm really only scratching the surface on this topic. I worked at a couple of companies in the early 2000s trying to extend the zero-copy concepts down into userland. We even implemented a TCP stack in userland and circumvented the kernel entirely for applications using the stack, but that brought its own set of problems and was never production quality. It's a very hard problem to solve.
