Linux on arm64: sendto causes "Unhandled fault: alignment fault (0x96000021)" when sending data from mmapped coherent DMA buffer

Problem description

I'm building a data acquisition system based on the UltraScale+ FPGA equipped with arm64 CPU. The data are transmitted to RAM via DMA. The DMA buffers in the driver are reserved as below:

virt_buf[i] = dma_zalloc_coherent(&pdev->dev, BUF_SIZE, &phys_buf[i],GFP_KERNEL);

In the driver's mmap function, the mapping to the user space is done in the following way:

#ifdef ARCH_HAS_DMA_MMAP_COHERENT
   printk(KERN_INFO "Mapping with dma_map_coherent DMA buffer at phys: %p virt %p\n",phys_buf[off],virt_buf[off]);
   res = dma_mmap_coherent(&my_pdev->dev, vma, virt_buf[off], phys_buf[off],  vsize);
#else
   physical = phys_buf[off];
   res=remap_pfn_range(vma,vma->vm_start, physical >> PAGE_SHIFT , vsize, pgprot_noncached(vma->vm_page_prot));
   printk(KERN_INFO "Mapping with remap_pfn_range DMA buffer at phys: %p virt %p\n",physical,virt_buf[off]);
#endif

On my UltraScale+ CPU, remap_pfn_range is used. In the user space application the data are read from the buffer and, for now, immediately sent in UDP packets whose length is limited to MAX_DGRAM (originally equal to 572).

 int i = 0;
 int bleft = nbytes;
 while(i<nbytes) {
    int bts = bleft < MAX_DGRAM ? bleft : MAX_DGRAM;
    if (sendto(fd,&buf[nbuf][i],bts,0, res2->ai_addr,res2->ai_addrlen)==-1) {
       printf("%s",strerror(errno));
       exit(1);
    }
    bleft -= bts;
    i += bts;
 }

Everything worked perfectly on the 32-bit Zynq FPGA. However, after I moved it to the 64-bit UltraScale+ FPGA, I started to receive random errors after a few hundred transfers.

[  852.703491] Unhandled fault: alignment fault (0x96000021) at 0x0000007f82635584
[  852.710739] Internal error: : 96000021 [#4] SMP
[  852.715235] Modules linked in: axi4s2dmov(O) ksgpio(O)
[  852.720358] CPU: 0 PID: 1870 Comm: a4s2dmov_send Tainted: G      D    O    4.4.0 #3
[  852.728001] Hardware name: ZynqMP ZCU102 RevB (DT)
[  852.732769] task: ffffffc0718ac180 ti: ffffffc0718b8000 task.ti: ffffffc0718b8000
[  852.740248] PC is at __copy_from_user+0x8c/0x180
[  852.744836] LR is at copy_from_iter+0x70/0x24c
[  852.749261] pc : [<ffffffc00039210c>] lr : [<ffffffc0003a36a8>] pstate: 80000145
[  852.756644] sp : ffffffc0718bba40
[  852.759935] x29: ffffffc0718bba40 x28: ffffffc06a4bae00 
[  852.765228] x27: ffffffc0718ac820 x26: 000000000000000c 
[  852.770523] x25: 0000000000000014 x24: 0000000000000000 
[  852.775818] x23: ffffffc0718bbe08 x22: ffffffc0710eba38 
[  852.781112] x21: ffffffc0718bbde8 x20: 000000000000000c 
[  852.786407] x19: 000000000000000c x18: ffffffc000823020 
[  852.791702] x17: 0000000000000000 x16: 0000000000000000 
[  852.796997] x15: 0000000000000000 x14: 00000000c0a85f32 
[  852.802292] x13: 0000000000000000 x12: 0000000000000032 
[  852.807586] x11: 0000000000000014 x10: 0000000000000014 
[  852.812881] x9 : ffffffc0718bbcf8 x8 : 000000000000000c 
[  852.818176] x7 : ffffffc0718bbdf8 x6 : ffffffc0710eba2c 
[  852.823471] x5 : ffffffc0710eba38 x4 : 0000000000000000 
[  852.828766] x3 : 000000000000000c x2 : 000000000000000c 
[  852.834061] x1 : 0000007f82635584 x0 : ffffffc0710eba2c 
[  852.839355] 
[  852.840833] Process a4s2dmov_send (pid: 1870, stack limit = 0xffffffc0718b8020)
[  852.848134] Stack: (0xffffffc0718bba40 to 0xffffffc0718bc000)
[  852.853858] ba40: ffffffc0718bba90 ffffffc0006a1b2c 000000000000000c ffffffc06a9bdb00
[  852.861676] ba60: 00000000000005dc ffffffc071a0d200 0000000000000000 ffffffc0718bbdf8
[  852.869488] ba80: 0000000000000014 ffffffc06a959000 ffffffc0718bbad0 ffffffc0006a2358
[...]
[  853.213212] Call trace:
[  853.215639] [<ffffffc00039210c>] __copy_from_user+0x8c/0x180
[  853.221284] [<ffffffc0006a1b2c>] ip_generic_getfrag+0xa4/0xc4
[  853.227011] [<ffffffc0006a2358>] __ip_append_data.isra.43+0x80c/0xa70
[  853.233434] [<ffffffc0006a3d50>] ip_make_skb+0xc4/0x148
[  853.238642] [<ffffffc0006c9d04>] udp_sendmsg+0x280/0x740
[  853.243937] [<ffffffc0006d38e4>] inet_sendmsg+0x7c/0xbc
[  853.249145] [<ffffffc000651f5c>] sock_sendmsg+0x18/0x2c
[  853.254352] [<ffffffc000654b14>] SyS_sendto+0xb0/0xf0
[  853.259388] [<ffffffc000084470>] el0_svc_naked+0x24/0x28
[  853.264682] Code: a88120c7 a8c12027 a88120c7 36180062 (f8408423) 
[  853.270791] ---[ end trace 30e1cd8e2ccd56c5 ]---
Segmentation fault
root@Xilinx-ZCU102-2016_2:~#
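
The fault code in the log above can be unpacked by hand. The following is a minimal sketch, assuming the standard AArch64 ESR_EL1 layout for data aborts (EC in bits 31:26, IL in bit 25, WnR in bit 6, DFSC in bits 5:0). The names bits, esr_fields, decode_data_abort_esr and is_aligned are made up for illustration. Under that layout, 0x96000021 reads as a data abort taken at the current exception level (EC 0x25) caused by a read (WnR 0) with fault status 0x21, which is the alignment fault code. The faulting address in x1 is 4-byte aligned but not 8-byte aligned, and the bracketed word on the Code: line, f8408423, appears to decode as a 64-bit load (ldr x3, [x1], #8).

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits hi..lo (inclusive) from a 32-bit value. */
uint32_t bits(uint32_t value, unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    unsigned width = hi - lo + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> lo) & mask;
}

/* Illustrative subset of the ESR_EL1 fields for a data abort. */
struct esr_fields {
    uint32_t ec;   /* exception class, bits 31:26 */
    uint32_t il;   /* instruction length bit, bit 25 */
    uint32_t wnr;  /* write-not-read, bit 6 */
    uint32_t dfsc; /* data fault status code, bits 5:0 */
};

struct esr_fields decode_data_abort_esr(uint32_t esr)
{
    struct esr_fields f;
    f.ec = bits(esr, 31, 26);
    f.il = bits(esr, 25, 25);
    f.wnr = bits(esr, 6, 6);
    f.dfsc = bits(esr, 5, 0);
    return f;
}

/* Return nonzero if addr is a multiple of size (size must be a power of two). */
int is_aligned(uint64_t addr, uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}
```

Calling decode_data_abort_esr(0x96000021u) and is_aligned(0x0000007f82635584ULL, 8) reproduces the reading above: an 8-byte load from an address that is only 4-byte aligned.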

The strange thing is that when I simply read words from the buffer, it does not cause any alignment errors.

It seems that the send function uses __copy_from_user improperly, causing an unaligned memory access. The question is: is this a kernel bug, or have I done something incorrectly?

However, sending a data block that does not start at an 8-byte boundary usually does not trigger the alignment error. The problem occurs with relatively low probability, and I was not able to isolate the conditions that lead to it.

I have worked around the problem by adjusting MAX_DGRAM so that it is a multiple of 8. However, I'm afraid that the problem may reappear if the data in the mmapped buffer are submitted to more complex processing. Some people have reported similar problems on the arm64 architecture related to the memcpy function (e.g. [https://bugs.launchpad.net/linux-linaro/+bug/1271649]).
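
The workaround described above can be sketched as follows. This is only an illustration, assuming the mmapped buffer itself starts on an 8-byte boundary (a page-aligned mapping does). The helper names round_down and next_chunk_len are hypothetical. The datagram limit is rounded down to a multiple of 8 (572 becomes 568), so every chunk except possibly the last starts at an 8-byte aligned offset. It reduces the chance of hitting the fault but does not change the memory type of the mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Round value down to a multiple of align (align must be a power of two). */
size_t round_down(size_t value, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return value & ~(align - 1);
}

/*
 * Length of the next datagram: at most max_dgram rounded down to a
 * multiple of 8, so that the following chunk starts 8-byte aligned.
 */
size_t next_chunk_len(size_t bytes_left, size_t max_dgram)
{
    size_t limit = round_down(max_dgram, 8);
    assert(limit > 0);
    return bytes_left < limit ? bytes_left : limit;
}
```

In the send loop from the question, bts would then be computed as next_chunk_len(bleft, MAX_DGRAM) instead of comparing against MAX_DGRAM directly.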

What is the correct method for mapping coherent DMA buffers to user space to avoid memory alignment errors?

Recommended answer

That driver needs updating. ARCH_HAS_DMA_MMAP_COHERENT hasn't been defined by anything other than PowerPC for a long time, and even that looks like a forgotten leftover.

There has been a generic dma_mmap_coherent() implementation since 3.6, so that can, and should, be used unconditionally. The result of the current code is that, thanks to the #ifdef, you always take the other path, then thanks to pgprot_noncached() you end up making the userspace mapping of the buffer Strongly-ordered (Device nGnRnE in AArch64 terms). That's generally a bad idea, as userspace code will assume it's always operating on Normal memory (unless explicitly crafted not to), and can safely do things like unaligned or exclusive accesses, both of which are liable to go badly wrong on Device-type memory. I'm not even going to ask what kind of craziness ends up with the kernel copying data back out of a userspace mapping of a kernel buffer*, but suffice to say the kernel - via copy_{to,from,in}_user() - also assumes userspace addresses are mapped as Normal memory and thus safe for unaligned accesses. Frankly I'm a little surprised this doesn't blow up similarly on 32-bit ARM, so I guess your data happens to always be at least 4-byte aligned - that would also explain why reading words (with 32-bit accesses) is fine, if only 64-bit doubleword accesses can potentially be misaligned.
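
Until the driver is fixed, user space code that has to touch a Device-type mapping can limit itself to naturally aligned accesses, which is what the word reads mentioned in the question amount to. The sketch below is an assumption-laden illustration, not a replacement for the fix described in this answer: copy_words is a hypothetical helper that copies through aligned 32-bit reads into an ordinary buffer, and the volatile qualifier is there to discourage the compiler from merging or widening the loads. The ordinary buffer can then be passed to sendto, so the kernel copies from Normal memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy nbytes from src to dst using only aligned 32-bit reads.
 * src must be 4-byte aligned and nbytes a multiple of 4.
 * Returns the number of bytes copied.
 */
size_t copy_words(uint32_t *dst, const volatile uint32_t *src, size_t nbytes)
{
    assert(((uintptr_t)src & 3u) == 0);
    assert((nbytes & 3u) == 0);

    size_t nwords = nbytes / sizeof(uint32_t);
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i]; /* one 32-bit load per iteration */
    return nbytes;
}
```

The extra copy costs bandwidth, which is one more reason to prefer a Normal memory mapping in the driver over this kind of workaround.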

In short, just use dma_mmap_coherent(), and get rid of the open-coded poor equivalent. That will give userspace a Normal non-cacheable mapping (or a cacheable one for a hardware-coherent device) which will work as expected. It's also not broken in terms of assuming a dma_addr_t is a physical address, as your driver code seems to do - that's another thing that's liable to come around and bite you in the bum sooner or later (ZynqMP has a System MMU, so you can presumably update to a 4.9 kernel, wire up some Stream IDs, add them to the DT, and watch that assumption go bang in new and exciting ways).

* Although it does occur to me that there was some circumstance under which copying from the very end of a page may sometimes over-read into the next page, which could trigger this unwittingly if the following page happened to be a Device/Strongly-ordered mapping, which led to this patch in 4.5. Linus' response to such memory layouts was "...and nobody sane actually does that..."
