'Bus Error' on ARMv6 when working with doubles

Question

I'm creating a C++ program for ARMv6 which crashes with a bus error. Using GDB, I have traced the problem to the following code:

double d = *(double*)pData; pData += sizeof(int64_t);  // char *pData

The program walks through a received message and has to extract some double values using the code above. The message has several fields, some of them doubles and some not.

On x86 architectures this works fine, but on ARM I get the 'bus error'. So I suspect my problem is data alignment: the double fields have to be aligned to word boundaries in memory on the ARM architecture.

I tried the following as a fix, which did not work (I still got the error):

int64_t i = *(int64_t*)pData;
double d = *((double*)&i);

The following has worked (so far):

double d = 0;
memcpy(&d, pData, sizeof(double));

Is using 'memcpy' the best approach? Or is there a better way?

In my case, I have no control over how the data is packed in the buffer or over the order of the fields in the message.

Related question: std::atomic on ARMv7 (RPi2) and alignment/bus errors

Recommended answer

Is using 'memcpy' the best approach?

In general it's the only correct approach, unless you're targeting a single ABI in which no type requires greater than 1-byte alignment.
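For the C++ side of the question, the memcpy idiom can be wrapped once and reused. The following is only a minimal sketch: load_unaligned is a hypothetical helper name, not something from the original code, and it assumes the bytes in the buffer already use the host's byte order and floating-point format.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical helper (not from the original post): copy sizeof(T) bytes
// out of a possibly unaligned buffer into a properly aligned local object.
// Assumes the buffer holds the value in the host's own representation.
template <typename T>
T load_unaligned(const char* p) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    T value;
    std::memcpy(&value, p, sizeof value);  // no alignment requirement on p
    return value;
}
```

With that in place, the crashing line from the question could be written as double d = load_unaligned<double>(pData); pData += sizeof(double);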

The C++ standard is rather verbose, so I'll quote the C standard, which expresses the same thing much more succinctly:

A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.
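To make "correctly aligned" concrete, here is a small sketch of a check you could drop into a debug build. The name is_aligned_for is made up for illustration, and the check assumes that converting a pointer to std::uintptr_t yields the numeric address, which holds on mainstream ARM and x86 toolchains but is implementation-defined as far as the standard is concerned.

```cpp
#include <cstdint>

// Hypothetical diagnostic helper: reports whether p satisfies the
// alignment requirement of T. Assumes the pointer-to-integer conversion
// yields the plain address (true on common ARM and x86 ABIs).
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

If is_aligned_for<double>(pData) is false, the cast-and-dereference in the question is exactly the case the quoted paragraph rules out.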

There it is: that ever-present spectre of undefined behaviour. Even an x86 compiler is perfectly well allowed to break into your house and rub jam into your hair while you sleep instead of loading that data the way you expect, if its ABI says so.

One thing to note, though, is that modern compilers tend to be clever enough that correctness doesn't necessarily come at the cost of performance. Let's flesh out that example code:

#include <string.h>

double func(char *data) {
    double d;
    memcpy(&d, data, sizeof d);
    return d;
}

...and throw it at a compiler:

$ clang -target arm -march=armv6 -mfpu=vfpv3 -mfloat-abi=hard -O1 -S test.c
...
func:                                   @ @func
        .fnstart
@ BB#0:
        push    {r4, r5, r11, lr}
        sub     sp, sp, #8
        mov     r2, r0
        ldrb    r1, [r0, #3]
        ldrb    r3, [r0, #2]
        ldrb    r12, [r0]
        ldrb    lr, [r0, #1]
        ldrb    r4, [r2, #4]!
        orr     r5, r3, r1, lsl #8
        ldrb    r3, [r2, #2]
        ldrb    r2, [r2, #3]
        ldrb    r0, [r0, #5]
        orr     r1, r12, lr, lsl #8
        orr     r2, r3, r2, lsl #8
        orr     r0, r4, r0, lsl #8
        orr     r1, r1, r5, lsl #16
        orr     r0, r0, r2, lsl #16
        str     r1, [sp]
        str     r0, [sp, #4]
        vpop    {d0}
        pop     {r4, r5, r11, pc}

OK, so it's playing things safe with a bytewise memcpy; at least it's inlined. But hey, ARMv6 does at least support unaligned word and halfword accesses if the CPU is configured appropriately, so let's tell the compiler we're cool with that:

$ clang -target arm -march=armv6 -mfpu=vfpv3 -mfloat-abi=hard -O1 -S -munaligned-access test.c
...
func:                                   @ @func
        .fnstart
@ BB#0:
        sub     sp, sp, #8
        ldr     r1, [r0]
        ldr     r0, [r0, #4]
        str     r0, [sp, #4]
        str     r1, [sp]
        vpop    {d0}
        bx      lr

There we go, that's about the best you can do with just integer word loads. Now, what if we compile it for something a bit newer?

$ clang -target arm -march=armv7 -mfpu=neon-vfpv4 -mfloat-abi=hard -O1 -S test.c
...
func:                                   @ @func
        .fnstart
@ BB#0:
        vld1.8  {d0}, [r0]
        bx      lr

I can guarantee that, even on a machine where it would "work", no undefined-behaviour hackery would correctly load that unaligned double in fewer than one instruction. Note that NEON is the key player here: vld1 only requires the base address to be aligned to the element size, so for 8-bit elements it can never be unaligned. In the more general case (say, if it were a long long instead of a double) you might still need -munaligned-access to convince the compiler, as before.

For comparison, let's see how everyone's favourite mutant-grandchild-of-a-1970s-calculator-chip fares as well:

$ clang -O1 -S test.c
...
func:                                   # @func
# BB#0:
        movl    4(%esp), %eax
        fldl    (%eax)
        retl

Yup, the correct code still also looks like the best code.
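Since the question walks a message with mixed fields, the same idiom extends naturally to a cursor that advances as it reads. This is a sketch under stated assumptions only: the Sample layout (a 1-byte tag, a double, then a 32-bit integer, tightly packed) is invented for illustration, take and parse_sample are hypothetical names, the sender is assumed to use the host's byte order, and there is no bounds checking.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical message layout, for illustration only:
// a 1-byte tag, then a double, then a 32-bit integer, tightly packed.
struct Sample {
    std::uint8_t tag;
    double value;
    std::int32_t count;
};

// Copy one field out of the buffer and advance the cursor past it.
// The cursor may point at any byte offset; memcpy does not care.
template <typename T>
T take(const char*& cursor) {
    T field;
    std::memcpy(&field, cursor, sizeof field);
    cursor += sizeof field;
    return field;
}

// Reads the fields in wire order. The caller must guarantee that at
// least 13 bytes are available; this sketch does no bounds checking.
Sample parse_sample(const char* pData) {
    Sample s;
    s.tag = take<std::uint8_t>(pData);
    s.value = take<double>(pData);   // starts at offset 1, so unaligned
    s.count = take<std::int32_t>(pData);
    return s;
}
```

Each take call should compile down to the same loads shown in the listings above, so the cursor costs nothing extra over the hand-written version.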
