Python UDP socket semi-randomly failing to receive

Problem description

I have a problem with something and I'm guessing it's the code.

The application is used to 'ping' some custom-made network devices to check if they're alive. It pings them every 20 seconds with a special UDP packet and expects a response. If they fail to answer 3 consecutive pings, the application sends a warning message to the staff.

The application is running 24/7, and a random number of times a day (mostly 2-5) it fails to receive UDP packets for exactly 10 minutes, after which everything goes back to normal. During those 10 minutes only 1 device seems to be replying; the others seem dead. That much I've been able to deduce from the logs.

I've used Wireshark to sniff the packets and I've verified that ping packets are going both out AND in, so the network part seems to be working okay, all the way to the OS. The computers are running WinXPPro and some have no configured firewall whatsoever. I'm having this issue on different computers, different Windows installs and different networks.

I'm really at a loss as to what might be the problem here.

I'm attaching the relevant part of the code, which does all the networking. It runs in a separate thread from the rest of the application.

I thank you in advance for whatever insight you might provide.

import select
from time import time

def monitor(self):
    checkTimer = time()
    while self.running:
        # poll the socket for readability and writability without waiting (timeout 0)
        read, write, error = select.select([self.commSocket],[self.commSocket],[],0)
        if self.commSocket in read:
            try:
                data, addr = self.commSocket.recvfrom(1024)
                self.processInput(data, addr)
            except:
                pass

        if time() - checkTimer > 20: # every 20 seconds
            checkTimer = time()
            if self.commSocket in write:
                # ping every device in one burst
                for rtc in self.rtcList:
                    try:
                        addr = (rtc, 7) # port 7 is the echo port
                        self.commSocket.sendto('ping',addr)
                        if not self.rtcCheckins[rtc][0]: # if last check was a failure
                            self.rtcCheckins[rtc][1] += 1 # incr failure count
                        self.rtcCheckins[rtc][0] = False # setting last check to failure
                    except:
                        pass

        for rtc in self.rtcList:
            if self.rtcCheckins[rtc][1] > 2: # didn't answer for a whole minute
                self.rtcCheckins[rtc][1] = 0
                self.sendError(rtc)

Recommended answer

You don't mention it, so I have to remind you that since you are using select(), the socket had better be non-blocking. Otherwise your recvfrom() can block. That should not really happen when things are handled properly, but it's hard to tell from the short code snippet.

Also, you don't have to check a UDP socket for writability - it is always writable.
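The two points above could look roughly like this. It is only a sketch, assuming Python 3; the helper names `make_nonblocking_udp_socket` and `wait_for_datagram` are made up for illustration and are not part of the asker's code. The socket is switched to non-blocking mode, and select() is asked about readability only, with a non-zero timeout so the loop does not spin.

```python
import select
import socket


def make_nonblocking_udp_socket(bind_addr=("127.0.0.1", 0)):
    """Create a bound UDP socket whose recvfrom() never blocks."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(bind_addr)
    sock.setblocking(False)
    return sock


def wait_for_datagram(sock, timeout=0.5, bufsize=1024):
    """Return (data, addr) if a datagram arrives within `timeout`, else None."""
    # Only the read list is populated: a UDP socket is always writable,
    # so asking select() about writability tells us nothing.
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    try:
        return sock.recvfrom(bufsize)
    except BlockingIOError:
        # Reported readable, but nothing could be read after all.
        return None
```

With a timeout like this the monitoring thread sleeps inside select() instead of busy-looping, and sendto() can be called directly whenever a ping is due.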

Now for the real problem - you are saying that packets are entering the system, but your code does not receive them. This is most probably due to the overflow of the socket receive buffer. Did the number of ping targets increase over those last 15 years? You are setting yourself up for a ping-response storm, and probably not reading those responses fast enough, so they pile up in the receive buffer and eventually get dropped.

My suggestions in order of ROI:

  • Spread out ping requests, don't set yourself up for a DDoS. Query, say, one system per iteration and keep the last check time per target. This will allow you to even out the number of packets going out and coming in.
  • Increase SO_RCVBUF to a large value. This will allow your network stack to better deal with packet bursts.
  • Read packets in a loop, i.e. once your UDP socket is readable (assuming it's non-blocking), read until you get EWOULDBLOCK. This would save you a bunch of select() calls.
  • See if you can use some advanced Windows API along the lines of Linux recvmmsg(2), if such a thing exists, to dequeue multiple packets per syscall.
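The first three suggestions could be sketched like this. It assumes Python 3, and `make_socket`, `drain` and `StaggeredPinger` are hypothetical names chosen for illustration, not anything from the asker's code. The 1 MiB buffer size is an arbitrary example value, and the OS may clamp or adjust whatever is requested. The sketch stores a next-due time per target, which is equivalent to keeping the last check time.

```python
import socket
import time


def make_socket(rcvbuf_bytes=1 << 20):
    """Non-blocking UDP socket with a larger receive buffer requested."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Request a bigger receive buffer; the OS may clamp or adjust the value.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.setblocking(False)
    return sock


def drain(sock, bufsize=1024):
    """Read every datagram already queued, stopping at EWOULDBLOCK."""
    packets = []
    while True:
        try:
            packets.append(sock.recvfrom(bufsize))
        except BlockingIOError:  # EWOULDBLOCK / EAGAIN: the queue is empty
            return packets


class StaggeredPinger:
    """Hands out at most one due target per call, each once per `interval`."""

    def __init__(self, targets, interval=20.0, now=time.monotonic):
        self.interval = interval
        self.now = now
        # Offset the first due times so targets are spread across the interval.
        start = now()
        step = interval / max(len(targets), 1)
        self.next_due = {t: start + i * step for i, t in enumerate(targets)}

    def due_target(self):
        """Return one target whose ping is due (and reschedule it), else None."""
        current = self.now()
        for target, due in self.next_due.items():
            if due <= current:
                self.next_due[target] = current + self.interval
                return target
        return None
```

In the monitoring loop, each iteration would call `due_target()` once and send a single ping if it returns a target, then call `drain()` whenever select() reports the socket readable. That way replies arrive spread over the 20-second window instead of in one burst.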

Hope this helps.
