GCC generated assembly for unaligned float access on ARM
Hello, I am currently working on a program where I need to process a data blob containing a series of floats that may be unaligned (and sometimes are). I am compiling with gcc 4.6.2 for an ARM Cortex-A8. I have a question about the generated assembly code:
As an example, I wrote a minimal test case. For the following test code
float aligned[2];
float *unaligned = (float*)(((char*)aligned)+2);
int main(int argc, char **argv)
{
    float f = unaligned[0];
    return (int)f;
}
the compiler (gcc 4.6.2, with -O3 optimization) produces
00008634 <main>:
8634: e30038ec movw r3, #2284 ; 0x8ec
8638: e3403001 movt r3, #1
863c: e5933000 ldr r3, [r3]
8640: edd37a00 vldr s15, [r3]
8644: eefd7ae7 vcvt.s32.f32 s15, s15
8648: ee170a90 vmov r0, s15
864c: e12fff1e bx lr
The compiler cannot know here whether the data is aligned, but it nevertheless uses VLDR, which requires aligned data; otherwise the program crashes with a bus error.
Now here is my actual question: is this correct behavior from the compiler, meaning I need to take care of alignment in my C++ code, or is this a bug in the compiler?
I should also add my current workaround, which works and makes gcc copy the value before accessing it. The trick is to define a struct that contains only a float, marked with the gcc packed attribute, and to access the data through a pointer to that struct. Code snippet:
struct FloatWrapper { float f; } __attribute__((packed));
const FloatWrapper *x = reinterpret_cast<const FloatWrapper *>(rawX.data());
const FloatWrapper *y = reinterpret_cast<const FloatWrapper *>(rawY.data());
for (size_t i = 0; i < vertexCount; ++i) {
    vertices[i].x = x[i].f;
    vertices[i].y = y[i].f;
}
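A commonly used alternative to the packed-struct trick is to copy the bytes with memcpy, which has no alignment requirement on its arguments. The sketch below is only an illustration: the helper names `load_float_unaligned` and `copy_floats_unaligned` are hypothetical and do not appear in the original post, and whether the compiler turns the copy into byte loads or a single unaligned-capable load depends on the target and options.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the original post): read one float from an
// address that may not be 4-byte aligned. The bytes are copied into a
// properly aligned local variable instead of dereferencing a float pointer.
inline float load_float_unaligned(const void *p)
{
    float f;
    std::memcpy(&f, p, sizeof f);
    return f;
}

// Hypothetical helper: copy `count` floats from a possibly unaligned blob
// into a properly aligned destination array.
inline void copy_floats_unaligned(float *dst, const void *src, std::size_t count)
{
    std::memcpy(dst, src, count * sizeof(float));
}
```

With a helper like this, the loop body above could become something like `vertices[i].x = load_float_unaligned(rawX.data() + i * sizeof(float));`, assuming `rawX.data()` yields a pointer to bytes, which the post does not state.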
As you pointed out, ARM ARM A3.2.1 states that VLDR generates an alignment fault regardless of the SCTLR.A value.
I've tested your example on a Cortex-A9 and I got
# float_align
[1] + Stopped (signal) float_align
However, I'm also confused by the ARM Cortex-A8 TRM 4.2.1, which states:
If an alignment qualifier is not specified, and A=1, the alignment fault is taken if it is not aligned to element size.
If an alignment qualifier is not specified, and A=0, it is treated as unaligned access.
This is probably a half-baked explanation, since the ARM ARM gives more information, with a detailed table of instructions.
So I think the answer is that you need to take care of alignment yourself, since the compiler cannot determine in all scenarios which addresses you are loading from; for example, an address may only become known after linking.
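Taking care of alignment yourself can start with simply checking an address before treating it as a float pointer. The following is a minimal sketch under the assumption that the required alignment is a power of two; `is_aligned` is a hypothetical helper name, not something from the post or from any library.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if p is a multiple of `alignment` bytes.
// Assumes `alignment` is a power of two (for example sizeof(float) == 4).
inline bool is_aligned(const void *p, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

For the test program in the question, `is_aligned(unaligned, sizeof(float))` would be expected to report false, because the pointer is offset by 2 bytes from a float array; in that case the data has to be copied to an aligned location before it is read as a float.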