Arm Neon Intrinsics vs hand assembly

Question

http://hilbert-space.de/?p=22

This site, which is quite dated, shows that hand-written asm gives a much greater improvement than the intrinsics. I am wondering whether this is still true, even now in 2012.

So has the compiler's optimization of intrinsics improved when using the GNU cross compiler?

Recommended answer

My experience is that the intrinsics haven't really been worth the trouble. It's too easy for the compiler to inject extra register unload/load steps (spills and reloads) between your intrinsics. Getting it to stop doing that takes more effort than just writing the code in raw NEON assembly. I've seen this kind of thing in pretty recent compilers (including clang 3.1).

At this level, I find you really need to control exactly what's happening. You can get all kinds of stalls if you do things in just barely the wrong order. Doing it in intrinsics feels like surgery with welder's gloves on. If the code is so performance-critical that I need intrinsics at all, then intrinsics aren't good enough. Maybe others have different experiences here.
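
One way to check for the extra register load/store steps described above is to write a small kernel with intrinsics and read the assembly the compiler generates for it. The sketch below is only an illustration, not code from the question or the answer: the function name `rgb_to_gray` and the fixed-point weights 77/150/29 are assumptions chosen for the example. It uses intrinsics from `arm_neon.h` when the compiler defines `__ARM_NEON`, and falls back to a scalar loop that computes the same result elsewhere.

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical example kernel: convert interleaved 8-bit RGB pixels to
 * 8-bit gray as (77*R + 150*G + 29*B) >> 8. The weights sum to 256, so
 * the 16-bit accumulator cannot overflow. */
void rgb_to_gray(const uint8_t *rgb, uint8_t *gray, int n_pixels)
{
    int i = 0;

#if defined(__ARM_NEON)
    const uint8x8_t wr = vdup_n_u8(77);
    const uint8x8_t wg = vdup_n_u8(150);
    const uint8x8_t wb = vdup_n_u8(29);

    /* Process 8 pixels per iteration. */
    for (; i + 8 <= n_pixels; i += 8) {
        /* De-interleaving load: val[0] = R, val[1] = G, val[2] = B. */
        uint8x8x3_t px = vld3_u8(rgb + 3 * i);

        uint16x8_t acc = vmull_u8(px.val[0], wr);   /* acc  = R * 77  */
        acc = vmlal_u8(acc, px.val[1], wg);         /* acc += G * 150 */
        acc = vmlal_u8(acc, px.val[2], wb);         /* acc += B * 29  */

        /* Shift right by 8 and narrow back to 8 bits per lane. */
        vst1_u8(gray + i, vshrn_n_u16(acc, 8));
    }
#endif

    /* Scalar tail (and full fallback when NEON is unavailable). */
    for (; i < n_pixels; ++i) {
        const uint8_t *p = rgb + 3 * i;
        gray[i] = (uint8_t)((77 * p[0] + 150 * p[1] + 29 * p[2]) >> 8);
    }
}
```

Compiling this with optimization enabled and `-S` (on 32-bit ARM targets GCC typically also needs `-mfpu=neon`) produces an assembly listing. If the NEON loop contains stores and reloads of vector registers beyond the one `vld3` and one `vst1` per iteration, that is the kind of compiler-injected traffic the answer refers to. Whether a given compiler version does this has to be checked case by case.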
