Which is better: intrinsics or assembly coding?

Question

I am confused about which is better. I know how to write code with both, but I can't tell which is better in general for any processor. Please also explain the reason.

Recommended answer

As Paul said in the comments, the answer depends on your needs and expectations:

"In general start with intrinsics and then only go to asm if you need further optimisation. For x86, PowerPC, et al, this is rarely necessary, but compilers for ARM/Neon are not so good, and you may well have to resort to assembly if your code is sufficiently performance-critical."

Intrinsics are a part of most compilers, and you can use them to meet your performance requirements. Intrinsics are simpler than inline assembly or pure assembly. If you are going to use a high-level language such as C or C++, I suggest not using inline assembly. In my experience, ICC, GCC and Clang could not optimize inline assembly, or if they did, the gain was tiny. Intrinsics are a good fit when you want to write code for a specific architecture such as x86 and recompile it for different microarchitectures. As Peter said in the comments, this lets you:

"re-compile your code with different -mtune=haswell or -mtune=znver1 options"
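As a rough illustration of the point above, here is a minimal sketch in C of a loop written with SSE intrinsics. The function name add_f32 is a hypothetical helper, not something from the original answer, and the scalar fallback is an assumption added so the same file still builds on non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>

/* Use SSE intrinsics only when the target supports them (assumption:
   GCC/Clang define __SSE__, MSVC defines _M_X64 on 64-bit x86). */
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n). */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if HAVE_SSE
    /* Process four floats per iteration with unaligned loads/stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail (and full fallback when SSE is unavailable). */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

The same source can then be rebuilt for different microarchitectures, for example with gcc -O2 -mtune=haswell or gcc -O2 -mtune=znver1, and the compiler remains free to choose registers and schedule the instructions for each target. A hand-written assembly version would have to be retuned manually.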

Intrinsics are also a challenge for the optimizer, but not to the same degree as inline assembly. For example, if you compile code written in C with intrinsics, the performance might not change much even when you enable compiler optimizations. In my tests, O3, auto-vectorization disabled, and O2 mostly gave the same performance for intrinsics code, while the same approach applied to scalar code showed completely different performance (not because of auto-vectorization, but because of other optimizations). In this paper you can see an evaluation of inline assembly and intrinsic functions for matrix-matrix multiplication. In addition, intrinsic functions are not portable, need future maintenance, etc. I have found a new interface that claims it doesn't lose performance compared to intrinsics.
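To make the contrast with inline assembly concrete, the sketch below writes the same packed add twice: once with an intrinsic and once with GCC-style extended inline assembly. The names add4_intrin, add4_asm and add4_sum are hypothetical, and the asm path assumes a GCC-compatible compiler on x86 with SSE using the default AT&T syntax. Other targets fall back to plain C.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GNU-style extended asm plus SSE are available. */
#if (defined(__GNUC__) || defined(__clang__)) && defined(__SSE__)
#include <xmmintrin.h>
#define HAVE_SSE_GNU 1
#else
#define HAVE_SSE_GNU 0
#endif

#if HAVE_SSE_GNU
/* Intrinsic version: the compiler knows this is a packed float add,
   so it can schedule it, combine it, or fold it with constants. */
static __m128 add4_intrin(__m128 a, __m128 b)
{
    return _mm_add_ps(a, b);
}

/* Inline-assembly version: the compiler only sees the operand
   constraints and treats the instruction text as a black box. */
static __m128 add4_asm(__m128 a, __m128 b)
{
    __asm__("addps %1, %0" : "+x"(a) : "x"(b));
    return a;
}
#endif

/* Hypothetical helper: returns the sum of (a[i] + b[i]) for i in 0..3,
   using either the asm or the intrinsic variant. */
float add4_sum(const float *a, const float *b, int use_asm)
{
    assert(a != NULL && b != NULL);
    float out[4];
#if HAVE_SSE_GNU
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 r = use_asm ? add4_asm(va, vb) : add4_intrin(va, vb);
    _mm_storeu_ps(out, r);
#else
    (void)use_asm;  /* plain C fallback on other targets */
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
    return out[0] + out[1] + out[2] + out[3];
}
```

Both variants compute the same result. The difference shows up in the generated code: comparing the compiler output at different optimization levels (for example with gcc -O2 -S) is one way to see how much freedom the optimizer has with each form.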
