ARM NEON vectorization failure
Question
I would like to enable NEON vectorization on my ARM Cortex-A9, but I get this output at compile time:
not vectorized: relevant stmt not supported: D.14140_82 = D.14143_77 * D.14141_81
Here is my loop:
void my_mul(float32_t * __restrict data1, float32_t * __restrict data2, float32_t * __restrict out){
    for(int i=0; i<SIZE*4; i+=1){
        out[i] = data1[i]*data2[i];
    }
}
And the options used at compile time:
-march=armv7-a -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -mvectorize-with-neon-quad -ftree-vectorizer-verbose=2
I am using the arm-linux-gnueabi (v4.6) compiler.
It is important to note that the problem only appears with float32 vectors. If I switch to int32, the loop is vectorized. Maybe vectorization for float32 is not yet available…
Does anyone have an idea? Did I forget something on the command line or in my implementation?
Thanks in advance for your help.
吉克斯
Recommended answer
From GCC's ARM options page:
-mfpu=name
...
If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=`neon'), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.
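To make the "denormal values are treated as zero" remark concrete, here is a minimal sketch in portable C that emulates flush-to-zero in software. The helper names flush_to_zero, mul_ieee and mul_flushed are hypothetical and exist only for this illustration; the emulation is an assumption about how the behaviour described in the note shows up in a result, not a model of the actual NEON hardware.

```c
#include <float.h>  /* FLT_MIN: smallest positive normal float */

/* Hypothetical helper: replace any value whose magnitude is below the
 * smallest normal float (i.e. a denormal, or zero) with 0.0f. This only
 * emulates the "denormals are treated as zero" behaviour from the note. */
float flush_to_zero(float x)
{
    float mag = (x < 0.0f) ? -x : x;
    return (mag < FLT_MIN) ? 0.0f : x;
}

/* Plain product: on an IEEE 754 host a denormal result is preserved. */
float mul_ieee(float a, float b)
{
    return a * b;
}

/* Emulated flush-to-zero product: inputs and result are both flushed. */
float mul_flushed(float a, float b)
{
    return flush_to_zero(flush_to_zero(a) * flush_to_zero(b));
}
```

With a = b = 1e-20f the exact product is about 1e-40, which is below FLT_MIN (about 1.18e-38). On an IEEE 754 host mul_ieee should return a tiny nonzero denormal, while mul_flushed returns 0. For ordinary magnitudes the two functions agree, which is why the difference only matters for code that depends on values very close to zero.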
If you specify -funsafe-math-optimizations it should work, but reread the note above if you are going to use this with high precision.
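As a rough sketch of what that looks like in practice, the file below repeats the loop from the question with the extra flag shown in a comment. The compiler binary name arm-linux-gnueabi-gcc, the file name my_mul.c and the value of SIZE are assumptions (the question gives none of them), and float stands in for float32_t so the file also builds on a non-ARM host.

```c
/* Assumed build command: the flags from the question plus
 * -funsafe-math-optimizations. The binary name and file name
 * are hypothetical.
 *
 *   arm-linux-gnueabi-gcc -march=armv7-a -mcpu=cortex-a9 -mfpu=neon \
 *       -mfloat-abi=softfp -ftree-vectorize -mvectorize-with-neon-quad \
 *       -funsafe-math-optimizations -ftree-vectorizer-verbose=2 \
 *       -c my_mul.c
 */

#define SIZE 256  /* assumed value; the question does not state it */

/* Element-wise product of two float arrays of SIZE*4 elements.
 * __restrict tells the compiler the three arrays do not overlap. */
void my_mul(const float *__restrict data1,
            const float *__restrict data2,
            float *__restrict out)
{
    for (int i = 0; i < SIZE * 4; i += 1) {
        out[i] = data1[i] * data2[i];
    }
}
```

The loop itself is unchanged; only the command line differs. Whether the vectorizer then accepts the loop should be confirmed from the -ftree-vectorizer-verbose=2 output for the compiler version in use.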