GCC generated assembly for unaligned float access on ARM

Question

Hello, I am currently working on a program that processes a data blob containing a series of floats which may be unaligned (and sometimes are). I am compiling with gcc 4.6.2 for an ARM Cortex-A8, and I have a question about the generated assembly code.

As a minimal example, for the following test code

float aligned[2];
float *unaligned = (float*)(((char*)aligned)+2);

int main(int argc, char **argv) 
{
    float f = unaligned[0];  
    return (int)f;
}

the compiler (gcc 4.6.2 with -O3 optimization) produces

00008634 <main>:
    8634: e30038ec            movw         r3, #2284      ; 0x8ec
    8638: e3403001            movt         r3, #1
    863c: e5933000            ldr          r3, [r3]
    8640: edd37a00            vldr         s15, [r3]
    8644: eefd7ae7            vcvt.s32.f32 s15, s15
    8648: ee170a90            vmov         r0, s15
    864c: e12fff1e            bx           lr

The compiler cannot know here whether the data is aligned, but it nevertheless uses VLDR, which requires aligned data; otherwise the program crashes with a bus error.

Now here is my actual question: is this correct behavior from the compiler, meaning I need to take care of alignment in my C++ code, or is this a bug in the compiler?

I might also add my current workaround, which works and gets gcc to make a copy before accessing the value. The trick is to define a struct that contains only a float, mark it with the gcc packed attribute, and access the data via a pointer to that struct. Code snippet:

struct FloatWrapper { float f; } __attribute__((packed));
const FloatWrapper *x = reinterpret_cast<const FloatWrapper *>(rawX.data());
const FloatWrapper *y = reinterpret_cast<const FloatWrapper *>(rawY.data());

for (size_t i = 0; i < vertexCount; ++i) {
    vertices[i].x = x[i].f;
    vertices[i].y = y[i].f;
}
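
For anyone who wants to try the workaround in isolation, here is a minimal self-contained sketch of it. The Vertex type, the load_vertices function and the raw byte-pointer parameters are made up for illustration (the snippet above does not show how rawX, rawY or vertices are declared), and the packed attribute is a GCC/Clang extension, so this is not portable C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// GCC/Clang extension: a packed struct has alignment 1, so the compiler
// may not assume that the float inside it is 4-byte aligned.
struct FloatWrapper { float f; } __attribute__((packed));

// Hypothetical vertex type, assumed for this sketch only.
struct Vertex { float x; float y; };

// Reads `count` floats from each raw byte buffer. Neither buffer has to be
// aligned to sizeof(float).
std::vector<Vertex> load_vertices(const unsigned char *rawX,
                                  const unsigned char *rawY,
                                  std::size_t count)
{
    assert(rawX != nullptr && rawY != nullptr);
    const FloatWrapper *x = reinterpret_cast<const FloatWrapper *>(rawX);
    const FloatWrapper *y = reinterpret_cast<const FloatWrapper *>(rawY);

    std::vector<Vertex> vertices(count);
    for (std::size_t i = 0; i < count; ++i) {
        vertices[i].x = x[i].f;   // read through the packed member
        vertices[i].y = y[i].f;
    }
    return vertices;
}
```

Passing pointers that sit at an odd offset into a byte buffer exercises the unaligned path.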

Answer

As you pointed out, the ARM ARM (section A3.2.1) states that VLDR generates an Alignment fault on an unaligned address regardless of the SCTLR.A value.

I've tested your example on a Cortex-A9 and I got

# float_align                                                   
[1] + Stopped (signal)     float_align 

However, I'm also confused by section 4.2.1 of the ARM Cortex-A8 TRM, which states:

If an alignment qualifier is not specified, and A=1, the alignment fault is taken if it is not aligned to element size.

If an alignment qualifier is not specified, and A=0, it is treated as unaligned access.

This is probably a half-baked explanation, since the ARM ARM gives more information, with a detailed table covering the individual instructions.

So I think the answer is that you need to take care of alignment yourself, since the compiler can't determine the addresses you are loading from in all scenarios; for example, an address might only be known after linking.
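
One common way to take care of it yourself, sketched here as a suggestion rather than something tested on the Cortex-A8 from the question, is to copy the bytes into a properly aligned float with std::memcpy instead of dereferencing a float pointer. The helper names is_float_aligned and load_float are hypothetical. Compilers often turn a small fixed-size memcpy like this into ordinary load instructions, but what gcc 4.6.2 actually emits for it was not checked here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true if p satisfies the alignment requirement of float.
inline bool is_float_aligned(const void *p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(float) == 0;
}

// Reads the index-th float from a byte buffer that may start at any address.
// memcpy has no alignment requirement on its source, and the destination
// `value` is an ordinary, correctly aligned local float.
inline float load_float(const unsigned char *bytes, std::size_t index)
{
    assert(bytes != nullptr);
    float value;
    std::memcpy(&value, bytes + index * sizeof(float), sizeof(float));
    return value;
}
```

With this, the loop from the question would read something like vertices[i].x = load_float(rawBytesX, i), where rawBytesX stands for a pointer to the first byte of the x data. The alignment check is only there for diagnostics or for choosing a fast path when the data happens to be aligned.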
