UDP packet drops by linux kernel


Problem description

I have a server which sends UDP packets via multicast and a number of clients which are listening to those multicast packets. Each packet has a fixed size of 1040 bytes; the total amount of data sent by the server is 3 GByte.

My environment is the following:

1 Gbit Ethernet Network

40 nodes: 1 sender node and 39 receiver nodes. All nodes have the same hardware configuration: 2 AMD CPUs, each with 2 cores @ 2.6 GHz.

On the client side, one thread reads the socket and puts the data into a queue. An additional thread pops the data from the queue and does some lightweight processing.

During the multicast transmission I observe a packet drop rate of 30% on the node side. Looking at the netstat -su statistics, I can say that the packets missed by the client application correspond to the RcvbufErrors value in the netstat output.

That means all of the missing packets are dropped by the OS because the socket buffer was full, but I do not understand why the capturing thread is not able to read the buffer in time. During the transmission, 2 of the 4 cores are at 75% utilization and the rest are idle. I'm the only one using these nodes, and I would assume that machines of this kind have no problem handling 1 Gbit of bandwidth. I have already done some optimization by adding g++ compiler flags for AMD CPUs, which decreased the packet drop rate to 10%, but in my opinion it is still too high.

Of course I know that UDP is not reliable; I have my own correction protocol.

I do not have any administration permissions, so it’s not possible for me to change the system parameters.

Any hints on how I can increase the performance?

Update: I solved this issue by using 2 threads that read the socket. The recv socket buffer still becomes full sometimes, but the average drop rate is under 1%, so it isn't a problem to handle.

Answer

Tracking down network drops on Linux can be a bit difficult as there are many components where packet drops can happen. They can occur at the hardware level, in the network device subsystem, or in the protocol layers.

I wrote a very detailed blog post explaining how to monitor and tune each component. It's a bit hard to summarize as a succinct answer here since there are so many different components that need to be monitored and tuned.
