ARM NEON vectorization failure

Views: 270 Posted: 2016/5/29 14:31:14 Tags: compiler-construction arm vectorization neon


Problem description

I would like to enable NEON vectorization on my ARM Cortex-A9, but I get this output at compile time:

"not vectorized: relevant stmt not supported: D.14140_82 = D.14143_77 * D.14141_81"

Here is my loop:
void my_mul(float32_t * __restrict data1, float32_t * __restrict data2, float32_t * __restrict out){    
    for(int i=0; i<SIZE*4; i+=1){
        out[i] = data1[i]*data2[i];
    }
}
And here are the options used at compile time:
-march=armv7-a -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -mvectorize-with-neon-quad -ftree-vectorizer-verbose=2
I am using the arm-linux-gnueabi (v4.6) compiler.

It is important to note that the problem only appears with float32 vectors. If I switch to int32, the loop is vectorized. Maybe vectorization for float32 is not available yet...

Does anyone have an idea? Did I forget something on the command line or in my implementation?

Thanks in advance for your help.

Guix

Solution

From GCC's ARM options page:

-mfpu=name

...

If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=neon), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.
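
To make the "denormal values are treated as zero" remark concrete, here is a small sketch in plain C. It is not from the GCC documentation, and the function name keeps_subnormals is made up for illustration. It halves the smallest normal float: with IEEE 754 gradual underflow the result is a tiny nonzero subnormal, whereas a unit that flushes denormals to zero, as the quote describes for NEON, would produce 0 instead.

```c
#include <float.h>

/* Hypothetical helper, for illustration only.
   Returns 1 if halving FLT_MIN yields a nonzero subnormal
   (IEEE 754 gradual underflow), 0 if the result was flushed to zero. */
int keeps_subnormals(void)
{
    /* volatile keeps the compiler from folding the multiply at compile time */
    volatile float smallest_normal = FLT_MIN;
    volatile float half = smallest_normal * 0.5f;

    return half > 0.0f && half < FLT_MIN;
}
```

Scalar code that follows IEEE 754 should return 1 here. The precision loss the quote warns about is the case where the same multiply, done in flush-to-zero arithmetic, gives exactly 0.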

If you specify -funsafe-math-optimizations, it should work, but reread the note above if you need high precision.
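
As a rough sketch of what that looks like for the loop in the question: the options below are copied from the question with -funsafe-math-optimizations added. The driver name arm-linux-gnueabi-gcc, the file name my_mul.c and the value of SIZE are assumptions, and plain float stands in for float32_t so the file builds without arm_neon.h.

```c
/* Assumed build command (driver and file name are guesses, options are
   the ones from the question plus -funsafe-math-optimizations):

   arm-linux-gnueabi-gcc -march=armv7-a -mcpu=cortex-a9 -mfpu=neon \
       -mfloat-abi=softfp -ftree-vectorize -mvectorize-with-neon-quad \
       -funsafe-math-optimizations -ftree-vectorizer-verbose=2 -c my_mul.c
*/

/* Placeholder value: the question does not show how SIZE is defined. */
#define SIZE 256

/* Element-wise product of two float arrays of SIZE * 4 elements.
   __restrict tells the compiler the three arrays do not overlap. */
void my_mul(const float * __restrict data1,
            const float * __restrict data2,
            float * __restrict out)
{
    int i;

    for (i = 0; i < SIZE * 4; i += 1) {
        out[i] = data1[i] * data2[i];
    }
}
```

With the extra flag, the vectorizer report should no longer list the multiply as an unsupported statement. Keep in mind that the flag relaxes floating-point semantics for the whole translation unit, not just this loop.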
