Linux on arm64: sendto causes "Unhandled fault: alignment fault (0x96000021)" when sending data from mmapped coherent DMA buffer


Problem description

I'm building a data acquisition system based on an UltraScale+ FPGA equipped with an arm64 CPU. The data are transferred to RAM via DMA. The DMA buffers in the driver are reserved as follows:

virt_buf[i] = dma_zalloc_coherent(&pdev->dev, BUF_SIZE, &phys_buf[i], GFP_KERNEL);

In the driver's mmap function, the mapping to user space is done in the following way:

#ifdef ARCH_HAS_DMA_MMAP_COHERENT
    printk(KERN_INFO "Mapping with dma_mmap_coherent DMA buffer at phys: %p virt %p\n",
           phys_buf[off], virt_buf[off]);
    res = dma_mmap_coherent(&my_pdev->dev, vma, virt_buf[off], phys_buf[off], vsize);
#else
    physical = phys_buf[off];
    res = remap_pfn_range(vma, vma->vm_start, physical >> PAGE_SHIFT, vsize,
                          pgprot_noncached(vma->vm_page_prot));
    printk(KERN_INFO "Mapping with remap_pfn_range DMA buffer at phys: %p virt %p\n",
           physical, virt_buf[off]);
#endif

On my UltraScale+ CPU, remap_pfn_range is used. In the user-space application the data are read from the buffer and currently sent immediately in UDP packets, with the length limited to MAX_DGRAM (originally equal to 572).

int i = 0;
int bleft = nbytes;
while (i < nbytes) {
    int bts = bleft < MAX_DGRAM ? bleft : MAX_DGRAM;
    if (sendto(fd, &buf[nbuf][i], bts, 0, res2->ai_addr, res2->ai_addrlen) == -1) {
        printf("%s", strerror(errno));
        exit(1);
    }
    bleft -= bts;
    i += bts;
}

Everything worked perfectly on the 32-bit Zynq FPGA. However, after I moved it to the 64-bit UltraScale+ FPGA, I started to receive random errors after a few hundred transfers.

[  852.703491] Unhandled fault: alignment fault (0x96000021) at 0x0000007f82635584
[  852.710739] Internal error: : 96000021 [#4] SMP
[  852.715235] Modules linked in: axi4s2dmov(O) ksgpio(O)
[  852.720358] CPU: 0 PID: 1870 Comm: a4s2dmov_send Tainted: G      D    O    4.4.0 #3
[  852.728001] Hardware name: ZynqMP ZCU102 RevB (DT)
[  852.732769] task: ffffffc0718ac180 ti: ffffffc0718b8000 task.ti: ffffffc0718b8000
[  852.740248] PC is at __copy_from_user+0x8c/0x180
[  852.744836] LR is at copy_from_iter+0x70/0x24c
[  852.749261] pc : [<ffffffc00039210c>] lr : [<ffffffc0003a36a8>] pstate: 80000145
[  852.756644] sp : ffffffc0718bba40
[  852.759935] x29: ffffffc0718bba40 x28: ffffffc06a4bae00 
[  852.765228] x27: ffffffc0718ac820 x26: 000000000000000c 
[  852.770523] x25: 0000000000000014 x24: 0000000000000000 
[  852.775818] x23: ffffffc0718bbe08 x22: ffffffc0710eba38 
[  852.781112] x21: ffffffc0718bbde8 x20: 000000000000000c 
[  852.786407] x19: 000000000000000c x18: ffffffc000823020 
[  852.791702] x17: 0000000000000000 x16: 0000000000000000 
[  852.796997] x15: 0000000000000000 x14: 00000000c0a85f32 
[  852.802292] x13: 0000000000000000 x12: 0000000000000032 
[  852.807586] x11: 0000000000000014 x10: 0000000000000014 
[  852.812881] x9 : ffffffc0718bbcf8 x8 : 000000000000000c 
[  852.818176] x7 : ffffffc0718bbdf8 x6 : ffffffc0710eba2c 
[  852.823471] x5 : ffffffc0710eba38 x4 : 0000000000000000 
[  852.828766] x3 : 000000000000000c x2 : 000000000000000c 
[  852.834061] x1 : 0000007f82635584 x0 : ffffffc0710eba2c 
[  852.839355] 
[  852.840833] Process a4s2dmov_send (pid: 1870, stack limit = 0xffffffc0718b8020)
[  852.848134] Stack: (0xffffffc0718bba40 to 0xffffffc0718bc000)
[  852.853858] ba40: ffffffc0718bba90 ffffffc0006a1b2c 000000000000000c ffffffc06a9bdb00
[  852.861676] ba60: 00000000000005dc ffffffc071a0d200 0000000000000000 ffffffc0718bbdf8
[  852.869488] ba80: 0000000000000014 ffffffc06a959000 ffffffc0718bbad0 ffffffc0006a2358
[...]
[  853.213212] Call trace:
[  853.215639] [<ffffffc00039210c>] __copy_from_user+0x8c/0x180
[  853.221284] [<ffffffc0006a1b2c>] ip_generic_getfrag+0xa4/0xc4
[  853.227011] [<ffffffc0006a2358>] __ip_append_data.isra.43+0x80c/0xa70
[  853.233434] [<ffffffc0006a3d50>] ip_make_skb+0xc4/0x148
[  853.238642] [<ffffffc0006c9d04>] udp_sendmsg+0x280/0x740
[  853.243937] [<ffffffc0006d38e4>] inet_sendmsg+0x7c/0xbc
[  853.249145] [<ffffffc000651f5c>] sock_sendmsg+0x18/0x2c
[  853.254352] [<ffffffc000654b14>] SyS_sendto+0xb0/0xf0
[  853.259388] [<ffffffc000084470>] el0_svc_naked+0x24/0x28
[  853.264682] Code: a88120c7 a8c12027 a88120c7 36180062 (f8408423) 
[  853.270791] ---[ end trace 30e1cd8e2ccd56c5 ]---
Segmentation fault
root@Xilinx-ZCU102-2016_2:~#

The strange thing is that simply reading words from the buffer does not cause any alignment errors.

It seems that the send path uses the __copy_from_user function in a way that causes unaligned memory accesses. The question is: is this a kernel bug, or have I done something incorrectly?

However, sending a data block that does not start at an 8-byte boundary usually does not trigger the alignment fault. The problem occurs with relatively low probability, and I was not able to isolate the conditions that lead to the error.

I have worked around the problem by adjusting MAX_DGRAM so that it is a multiple of 8. However, I'm afraid that the problem may reappear if the data in the mmapped buffer are subjected to more complex processing. Some people have reported similar memcpy-related problems on the arm64 architecture (e.g. https://bugs.launchpad.net/linux-linaro/+bug/1271649).

What is the correct method for mapping coherent DMA buffers to user space so as to avoid memory alignment errors?

Recommended answer

That driver needs updating. ARCH_HAS_DMA_MMAP_COHERENT hasn't been defined by anything other than PowerPC for a long time, and even that looks like a forgotten leftover.

There has been a generic dma_mmap_coherent() implementation since 3.6, so that can, and should, be used unconditionally. The result of the current code is that, thanks to the #ifdef, you always take the other path, then thanks to pgprot_noncached() you end up making the userspace mapping of the buffer Strongly-ordered (Device nGnRnE in AArch64 terms). That's generally a bad idea, as userspace code will assume it's always operating on Normal memory (unless explicitly crafted not to), and can safely do things like unaligned or exclusive accesses, both of which are liable to go badly wrong on Device-type memory. I'm not even going to ask what kind of craziness ends up with the kernel copying data back out of a userspace mapping of a kernel buffer*, but suffice to say the kernel - via copy_{to,from,in}_user() - also assumes userspace addresses are mapped as Normal memory and thus safe for unaligned accesses. Frankly I'm a little surprised this doesn't blow up similarly on 32-bit ARM, so I guess your data happens to always be at least 4-byte aligned - that would also explain why reading words (with 32-bit accesses) is fine, if only 64-bit doubleword accesses can potentially be misaligned.

In short, just use dma_mmap_coherent(), and get rid of the open-coded poor equivalent. That will give userspace a Normal non-cacheable mapping (or a cacheable one for a hardware-coherent device) which will work as expected. It's also not broken in terms of assuming a dma_addr_t is a physical address, as your driver code seems to do - that's another thing that's liable to come around and bite you in the bum sooner or later (ZynqMP has a System MMU, so you can presumably update to a 4.9 kernel, wire up some Stream IDs, add them to the DT, and watch that assumption go bang in new and exciting ways).
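A minimal sketch of what that cleanup could look like as a kernel-module fragment (not standalone runnable code; `my_mmap`, `NUM_BUFS`, `BUF_SIZE`, `my_pdev`, `virt_buf`, and `phys_buf` are assumptions based on the driver code quoted in the question, not names the answer prescribes):

```c
/* mmap handler using the generic dma_mmap_coherent() unconditionally,
 * instead of the #ifdef/remap_pfn_range open-coding in the question. */
#include <linux/dma-mapping.h>
#include <linux/mm.h>

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long vsize = vma->vm_end - vma->vm_start;
    unsigned long off = vma->vm_pgoff;  /* buffer index, per the driver's scheme */

    if (off >= NUM_BUFS || vsize > BUF_SIZE)
        return -EINVAL;

    /* dma_mmap_coherent() gives userspace a Normal non-cacheable
     * mapping (or a cacheable one for a hardware-coherent device)
     * and performs the dma_addr_t-to-mappable-address translation
     * itself, instead of assuming dma_addr_t is a physical address. */
    vma->vm_pgoff = 0;  /* pgoff was consumed as a buffer index above */
    return dma_mmap_coherent(&my_pdev->dev, vma, virt_buf[off],
                             phys_buf[off], vsize);
}
```

Zeroing vm_pgoff before the call is the usual pattern when the offset is repurposed for buffer selection, since dma_mmap_coherent() otherwise interprets it as an offset into the buffer.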

* Although it does occur to me that there was some circumstance under which copying from the very end of a page may sometimes over-read into the next page, which could trigger this unwittingly if the following page happened to be a Device/Strongly-ordered mapping, which led to this patch in 4.5. Linus' response to such memory layouts was "...and nobody sane actually does that..."
