Why does my long-running python script crash with "invalid pointer" after running for about 3 days?


Question

I wrote a python 3 script which tests an SPI link to an FPGA. It runs on a Raspberry Pi 3. The test works like this: after the FPGA is put into test mode (with a push switch), the first byte is sent, which can be any value. Further bytes are then sent indefinitely. Each one is the previous value incremented by the first value sent, truncated to 8 bits. Thus, if the first value is 37, the FPGA expects the following sequence:

37, 74, 111, 148, 185, 222, 3, 40 ...
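
The rule described above can be sketched in a few lines. This is only an illustration and assumes that "truncated to 8 bits" means keeping the low 8 bits (modulo 256), so that 222 + 37 = 259 wraps around to 3. The name `expected_sequence` is hypothetical and is not taken from the original script.

```python
from itertools import islice


def expected_sequence(seed):
    """Yield the byte values the FPGA expects, starting with the seed itself."""
    value = seed & 0xFF
    while True:
        yield value
        # Add the seed again and keep only the low 8 bits.
        value = (value + seed) & 0xFF


# First eight values for a seed of 37.
print(list(islice(expected_sequence(37), 8)))
```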

Some additional IO pins are used to signal between the devices: RUN (RPi output) starts the test (necessary because the FPGA times out after about 15 ms if it expects a byte), and ERR (FPGA output) signals an error. Errors can thus be counted at both ends.

In addition, the RPi script writes a one-line summary of the bytes sent and the number of errors every million bytes.

All of this works just fine. But after running for about 3 days, I get the following error on the RPi:

free(): invalid pointer: 0x00405340

I get exactly the same error on two identical test setups, down to the same memory address. The last report says "4294M bytes sent, 0 errors".

The SPI link itself seems to be proven, but I am concerned that this long-running program crashes for no apparent reason.

Here is the important part of my test code:

    def _report(self, msg):
        now = datetime.datetime.now()
        os.system("echo \"{} : {}\" > spitest_last.log".format(now, msg))

    def spi_test(self):
        global end_loop
        input("Put the FPGA board into SPI test mode (SW1) and press any key")
        self._set_run(True)
        self.END_LOOP = False
        print("SPI test is running, CTRL-C to end.")
        # first byte is sent without LOAD, this is the seed
        self._send_byte(self._val)
        self._next_val()
        end_loop = False
        err_flag = False
        err_cnt = 0
        byte_count = 1
        while not end_loop:
            mb = byte_count % 1000000 
            if mb == 0:
                msg = "{}M bytes sent, {} errors".format(int(byte_count/1000000), err_cnt)
                print("\n" + msg, end="")
                self._report(msg)
                err_flag = True
            else:
                err_flag = False
            #print("sending: {}".format(self._val))
            self._set_load(True)
            if self._errors and err_flag:
                self._send_byte(self._val + 1)
            else:
                self._send_byte(self._val)
            if self.is_error():
                err_cnt += 1
                msg = "{}M bytes sent, {} errors".format(int(byte_count/1000000), err_cnt)
                print("\n{}".format(msg), end="")
                self._report(msg)
            self._set_load(False)
            # increase the value by the seed and truncate to 8 bits
            self._next_val()
            byte_count += 1

        # test is done
        input("\nSPI test ended ({} bytes sent, {} errors). Press ENTER to end.".format(byte_count, err_cnt))
        self._set_run(False)

(Note for clarification: there is a command line option to artificially create an error every million bytes. Hence the "err_flag" variable.)
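
As a side note on the `_report` method, which shells out to `echo` for every summary line: the same one-line log can be written directly from Python, which avoids starting a shell and any quoting issues in the message. The sketch below is a standalone, hypothetical variant (the function name `report` and the `path` parameter are illustrative), and nothing here suggests that it is related to the crash.

```python
import datetime


def report(msg, path="spitest_last.log"):
    """Overwrite the log file with a single timestamped summary line."""
    now = datetime.datetime.now()
    # Mode "w" truncates the file, mirroring the "> file" redirection of echo.
    with open(path, "w") as log:
        log.write("{} : {}\n".format(now, msg))
```

Calling `report("4294M bytes sent, 0 errors")` would leave exactly one line in `spitest_last.log`, as the shell redirection does.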

I've tried using python3 in console mode, and there seems to be no issue with the size of the byte_count variable (there shouldn't be, according to what I have read about python integer size limits).

Does anyone have an idea as to what might cause this?

Recommended answer

This issue affects only spidev versions older than 3.5. The comments below were written under the assumption that I was using the upgraded version of spidev.
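
Since the outcome depends on which spidev version is actually installed, it may be worth checking this from the same interpreter that runs the test. The following is a minimal sketch, assuming Python 3.8 or newer for `importlib.metadata` and assuming the package is installed under the distribution name `spidev`. The helper names are hypothetical and the version parsing is deliberately crude.

```python
import re


def parse_version(text):
    """Turn a version string such as '3.4' or '3.5.1' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def is_affected(version_text, fixed_in=(3, 5)):
    """Return True if the version is older than the one reported as fixed."""
    return parse_version(version_text) < fixed_in


def installed_spidev_version():
    """Return the installed spidev version string, or None if it is unknown."""
    try:
        from importlib import metadata  # available from Python 3.8
    except ImportError:
        return None
    try:
        return metadata.version("spidev")
    except metadata.PackageNotFoundError:
        return None


installed = installed_spidev_version()
if installed is None:
    print("spidev version could not be determined")
else:
    print("spidev", installed, "(affected)" if is_affected(installed) else "(3.5 or newer)")
```

On older interpreters such as Python 3.7, `pip3 show spidev` gives the same information from the command line.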

########################################################################

I can confirm this problem. It occurs consistently on both the RPi3B and the RPi4B, using python 3.7.3 on both. The spidev versions I tried were 3.3, 3.4 and the latest, 3.5. I was able to reproduce the error several times by simply looping over this single line:

spidevice2.xfer2([0x00, 0x00, 0x00, 0x00])

It takes up to 11 hours, depending on the RPi version. After 1073014000 calls (rounded to the nearest 1000), the script crashes with "invalid pointer". The total number of bytes sent is the same as in danmcb's case. It seems as if 2^32 bytes represents a limit.
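
The numbers from both reports can be checked with a little arithmetic. The sketch below only restates figures quoted in this thread (4294M bytes in the question, roughly 1073014000 four-byte calls here) and compares them with 2^32. It illustrates the coincidence and does not by itself prove the cause.

```python
LIMIT = 2**32  # 4,294,967,296

# Question: the last summary line was printed at 4294 million bytes,
# with one byte sent per transfer.
print(LIMIT // 1_000_000)  # whole millions of bytes that fit below 2**32

# Answer: about 1,073,014,000 calls of xfer2() with 4 bytes each.
calls = 1_073_014_000
bytes_sent = calls * 4
print(bytes_sent, "bytes sent,", LIMIT - bytes_sent, "bytes short of 2**32")

# Number of 4-byte calls that would add up to exactly 2**32 bytes.
print(2**30)
```

Both runs end within a fraction of a percent of 2^32 bytes, which fits the observation that the crash depends on the total byte count rather than on elapsed time.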

I tried different approaches. For example, I called close() from time to time, followed by open(). This did not help.

Then I tried to create the SpiDev object locally, so that it would be re-created for every batch of data.

def spiLoop():
    spidevice2 = spidev.SpiDev()
    spidevice2.open(0, 1)
    spidevice2.max_speed_hz = 15000000
    spidevice2.mode = 1 # Data is clocked in on falling edge
    
    for j in range(100000):
        spidevice2.xfer2([0x00, 0x00, 0x00, 0x00])
        
    spidevice2.close()

It still crashed after approx. 2^30 calls of xfer2([0x00, 0x00, 0x00, 0x00]), which corresponds to approx. 2^32 bytes.

EDIT1

To speed up the process, I sent blocks of 4096 bytes using the code below, again repeatedly creating the SpiDev object locally. It took 2 hours to reach a count of 2^32 bytes.

def spiLoop():
    spidevice2 = spidev.SpiDev()
    spidevice2.open(0, 1)
    spidevice2.max_speed_hz = 25000000
    spidevice2.mode = 1 # Data is clocked in on falling edge
    
    to_send = [0x00] * 2**12 # 4096 bytes
    for j in range(100):
        spidevice2.xfer2(to_send)
        
    spidevice2.close()
    del spidevice2

def runSPI():
    for i in range(2**31 - 1):
        spiLoop()            
        print((2**12 * 100 * (i + 1)) / 2**20, 'Mbytes')
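
For orientation, the block size and loop count used above translate into the following figures. This is a rough sketch that treats the quoted "2 hours" as exact, which it is probably not.

```python
import math

BLOCK = 2**12          # bytes per xfer2() call
XFERS_PER_LOOP = 100   # xfer2() calls per spiLoop()

bytes_per_loop = BLOCK * XFERS_PER_LOOP
# Number of spiLoop() calls needed to reach 2**32 bytes.
loops_to_limit = math.ceil(2**32 / bytes_per_loop)
print(bytes_per_loop, "bytes per spiLoop(),", loops_to_limit, "loops to reach 2**32 bytes")

# Average throughput if 2**32 bytes take two hours.
seconds = 2 * 60 * 60
print(round(2**32 / seconds / 1e6, 2), "MB/s on average")
```

The upper bound of `range(2**31 - 1)` in `runSPI()` is therefore far larger than the number of iterations that are ever reached before the crash.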

EDIT2

Reloading the spidev module on the fly does not help either. I tried this code on both the RPi3 and the RPi4, with the same result:

import importlib
import spidev
def spiLoop():
    importlib.reload(spidev)
    spidevice2 = spidev.SpiDev()
    spidevice2.open(0, 1)
    spidevice2.max_speed_hz = 25000000
    spidevice2.mode = 1 # Data is clocked in on falling edge
    
    to_send = [0x00] * 2**12 # 4096 bytes
    for j in range(100):
        spidevice2.xfer2(to_send)
        
    spidevice2.close()
    del spidevice2

def runSPI():
    for i in range(2**31 - 1):
        spiLoop()            
        print((2**12 * 100 * (i + 1)) / 2**20, 'Mbytes')

EDIT3

Executing the code snippet via exec() did not isolate the problem either. It crashed after the 4th chunk of 1 Gbyte of data was sent.

program = '''
import spidev
spidevice = None

def configSPI():
    global spidevice
    
    # We only have SPI bus 0 available to us on the Pi
    bus = 0
    #Device is the chip select pin. Set to 0 or 1, depending on the connections
    device = 1

    spidevice = spidev.SpiDev()
    spidevice.open(bus, device)
    spidevice.max_speed_hz = 250000000
    
    spidevice.mode = 1 # Data is clocked in on falling edge

def spiLoop():
    to_send = [0xAA] * 2**12
    loops = 1024
    for j in range(loops):
        spidevice.xfer2(to_send)
    
    return len(to_send) * loops    

configSPI()
bytes_total = 0

while True:
    bytes_sent = spiLoop()
    bytes_total += bytes_sent            
    print(int(bytes_total / 2**20), "Mbytes", int(1000 * (bytes_total / 2**30)) / 10, "% finished")
    if bytes_total > 2**30:
        break

'''
for i in range(100):
    exec(program)
    print("program executed", i + 1, "times, bytes sent > ", (i + 1) * 2**30)
