'Bus Error' on ARMv6 when working with doubles
Question
I'm creating a C++ program for ARMv6 which crashes with a bus error. Using GDB, I have traced the problem to the following code:
double d = *(double*)pData; pData += sizeof(int64_t); // char *pData
The program walks through a received message and has to extract some double values using the above code. The message has several fields; some are doubles and some are not.
On x86 architectures this works fine, but on ARM I get the 'bus error'. So I suspect my problem is data alignment: the double fields have to be aligned to word boundaries in memory on the ARM architecture.
I tried the following as a fix, which did not work (I still got the error):
int64_t i = *(int64_t*)pData;
double d = *((double*)&i);
The following has worked (so far):
double d = 0;
memcpy(&d, pData, sizeof(double));
Is using 'memcpy' the best approach? Or is there a better way?
In my case, I have no control over how the data is packed in the buffer or over the order of the fields in the message.
Recommended answer
Is using 'memcpy' the best approach?
In general it's the only correct approach, unless you're targeting a single ABI in which no type requires greater than 1-byte alignment.
The C++ standard is rather verbose, so I'll quote the C standard, which expresses the same thing much more succinctly:
A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.
There it is: that ever-present spectre of undefined behaviour. Even an x86 compiler is perfectly well allowed to break into your house and rub jam into your hair while you sleep instead of loading that data the way you expect, if its ABI says so.
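To make the alignment rule quoted above concrete, here is a minimal sketch of a check for whether an address satisfies a type's alignment requirement. The name `is_aligned_for` is hypothetical (it is not from the question or from any library), and the sketch assumes that converting a pointer to `std::uintptr_t` yields its plain numeric address, which is true on mainstream platforms but not guaranteed by the standard.

```cpp
#include <cstdint>

// Hypothetical helper: true if p satisfies T's alignment requirement.
// Assumes the uintptr_t value of a pointer is its numeric address.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

A check like this is only a debugging aid. A double sitting at an odd offset into a message buffer will fail it, which is the situation in the question, but passing or failing the check does not change what the cast is allowed to do; memcpy is still the fix.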
One thing to note, though, is that modern compilers tend to be clever enough that correctness doesn't necessarily come at the cost of performance. Let's flesh out that example code:
#include <string.h>
double func(char *data) {
    double d;
    memcpy(&d, data, sizeof d);
    return d;
}
...and throw it at a compiler:
$ clang -target arm -march=armv6 -mfpu=vfpv3 -mfloat-abi=hard -O1 -S test.c
...
func: @ @func
.fnstart
@ BB#0:
push {r4, r5, r11, lr}
sub sp, sp, #8
mov r2, r0
ldrb r1, [r0, #3]
ldrb r3, [r0, #2]
ldrb r12, [r0]
ldrb lr, [r0, #1]
ldrb r4, [r2, #4]!
orr r5, r3, r1, lsl #8
ldrb r3, [r2, #2]
ldrb r2, [r2, #3]
ldrb r0, [r0, #5]
orr r1, r12, lr, lsl #8
orr r2, r3, r2, lsl #8
orr r0, r4, r0, lsl #8
orr r1, r1, r5, lsl #16
orr r0, r0, r2, lsl #16
str r1, [sp]
str r0, [sp, #4]
vpop {d0}
pop {r4, r5, r11, pc}
OK, so it's playing things safe with a bytewise memcpy; at least it's inlined. But hey, ARMv6 does at least support unaligned word and halfword accesses if the CPU is configured appropriately, so let's tell the compiler we're cool with that:
$ clang -target arm -march=armv6 -mfpu=vfpv3 -mfloat-abi=hard -O1 -S -munaligned-access test.c
...
func: @ @func
.fnstart
@ BB#0:
sub sp, sp, #8
ldr r1, [r0]
ldr r0, [r0, #4]
str r0, [sp, #4]
str r1, [sp]
vpop {d0}
bx lr
There we go, that's about the best you can do with just integer word loads. Now, what if we compile it for something a bit newer?
$ clang -target arm -march=armv7 -mfpu=neon-vfpv4 -mfloat-abi=hard -O1 -S test.c
...
func: @ @func
.fnstart
@ BB#0:
vld1.8 {d0}, [r0]
bx lr
I can guarantee that, even on a machine where it would "work", no undefined-behaviour hackery would correctly load that unaligned double in fewer than one instruction. Note that NEON is the key player here: vld1 only requires the base address to be aligned to the element size, so for 8-bit elements it can never be unaligned. In the more general case (say, if it were a long long instead of a double) you might still need -munaligned-access to convince the compiler, as before.
For comparison, let's just see how everyone's favourite mutant-grandchild-of-a-1970s-calculator-chip fares as well:
$ clang -O1 -S test.c
...
func: # @func
# BB#0:
movl 4(%esp), %eax
fldl (%eax)
retl
Yup, the correct code still also looks like the best code.
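For the message-walking code in the question, the memcpy approach could be wrapped up along these lines. This is only a sketch: `read_unaligned` is a hypothetical name, and it assumes the fields in the buffer are already in the host's byte order and floating-point representation, which the question does not state.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies sizeof(T) bytes from the cursor into a
// properly aligned local object, then advances the cursor past the field.
// Assumes the bytes are in host byte order and representation.
template <typename T>
T read_unaligned(const char*& cursor) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable to be read with memcpy");
    T value;
    std::memcpy(&value, cursor, sizeof(T));
    cursor += sizeof(T);
    return value;
}
```

With a helper like that, the original line would become `double d = read_unaligned<double>(pData);` (with `pData` declared as `const char*`), and the pointer advance can no longer get out of step with the size of the field being read.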