ARM NEON vectorization failure


Problem description

I would like to enable NEON vectorization on my ARM Cortex-A9, but I get this output at compile time:

not vectorized: relevant stmt not supported: D.14140_82 = D.14143_77 * D.14141_81

Here is my loop:

void my_mul(float32_t * __restrict data1, float32_t * __restrict data2, float32_t * __restrict out){    
    for(int i=0; i<SIZE*4; i+=1){
        out[i] = data1[i]*data2[i];
    }
}

And the options used at compile time:

-march=armv7-a -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -mvectorize-with-neon-quad -ftree-vectorizer-verbose=2

I am using the arm-linux-gnueabi (v4.6) compiler.

It is important to note that the problem only appears with float32 vectors. If I switch to int32, the loop is vectorized. Maybe vectorization for float32 is not yet available…

Does anyone have an idea? Did I forget something on the command line or in my implementation?

Thanks in advance for your help.

吉克斯

Recommended answer

From GCC's ARM options page:

-mfpu=name

...

If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=neon), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.
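To make the "denormal values are treated as zero" remark concrete, here is a minimal sketch in portable C. It only models flush-to-zero behaviour in software. The helper names (is_subnormal, flush_to_zero, mul_ftz) are hypothetical and are not part of GCC or of any NEON API.

```c
#include <float.h>
#include <math.h>

/* Returns 1 if x is a subnormal (denormal) float, 0 otherwise. */
int is_subnormal(float x) {
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Hypothetical model of flush-to-zero: subnormal values become 0.
 * The sign of zero is ignored here for brevity. */
float flush_to_zero(float x) {
    return is_subnormal(x) ? 0.0f : x;
}

/* Multiplies a and b, flushing subnormal inputs and a subnormal
 * result to zero. This is an assumption-based illustration of the
 * behaviour the GCC note describes, not a hardware-exact emulation. */
float mul_ftz(float a, float b) {
    float product = flush_to_zero(a) * flush_to_zero(b);
    return flush_to_zero(product);
}
```

With IEEE 754 scalar arithmetic, FLT_MIN * 0.5f is a small nonzero subnormal value, while mul_ftz(FLT_MIN, 0.5f) returns 0 in this model. That difference is the kind of precision loss the note warns about.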

If you specify -funsafe-math-optimizations it should work, but reread the note above if you need high precision.
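As a sketch of how this fits together, the loop from the question can be kept as it is and built with the extra flag. The value of SIZE, the float32_t fallback for non-ARM hosts, and the check_my_mul helper are assumptions added for illustration. None of them appear in the original question.

```c
/* Build with the options from the question plus the extra flag:
 *   -march=armv7-a -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp
 *   -ftree-vectorize -mvectorize-with-neon-quad
 *   -ftree-vectorizer-verbose=2 -funsafe-math-optimizations
 */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>       /* provides float32_t on ARM targets */
#else
typedef float float32_t;    /* fallback so the sketch builds on other hosts */
#endif

#define SIZE 256            /* assumed value; the question does not give it */

/* The loop from the question, unchanged in behaviour. */
void my_mul(float32_t * __restrict data1,
            float32_t * __restrict data2,
            float32_t * __restrict out) {
    for (int i = 0; i < SIZE * 4; i += 1) {
        out[i] = data1[i] * data2[i];
    }
}

/* Hypothetical self-check: returns the number of elements that differ
 * from a scalar reference. The inputs are chosen so that every product
 * is exactly representable and no denormals are involved. */
int check_my_mul(void) {
    static float32_t a[SIZE * 4], b[SIZE * 4], out[SIZE * 4];
    int mismatches = 0;

    for (int i = 0; i < SIZE * 4; i++) {
        a[i] = (float32_t)i;
        b[i] = 0.5f;
    }
    my_mul(a, b, out);
    for (int i = 0; i < SIZE * 4; i++) {
        if (out[i] != (float32_t)i * 0.5f) {
            mismatches++;
        }
    }
    return mismatches;
}
```

A check like this only covers ordinary values. If the data may contain denormals, results can differ from scalar code for the reason given in the GCC note.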
