Fast inverse square root on the iPhone

Question

The fast inverse square root function used by SGI/3dfx, and most notably in Quake, is often cited as being faster than the equivalent assembly instruction, but the posts claiming that seem quite dated. I was curious about its performance on more modern hardware, particularly on mobile devices like the iPhone. I wouldn't be surprised if the Quake inverse sqrt is no longer a worthwhile optimization on desktop systems, but how about for an iPhone project involving a lot of 3D math? Is it something that would be worthwhile to include?

Answer

No.

The NEON instruction set (like every other vector ISA*) has a hardware approximate reciprocal square root instruction that is much faster than that oft-cited "trick". Use it instead if reciprocal square root is actually a performance bottleneck in your code (as always, benchmark first; don't spend time optimizing something if you have no hard evidence that its performance matters).

You can get at it by writing your own assembly (inline or otherwise) with the vrsqrte.f32 instruction, or from C, Objective-C, or C++ by including the <arm_neon.h> header and using the vrsqrte_f32() intrinsic.

[*] On SSE it's rsqrtss/rsqrtps; on PowerPC it's frsqrte (scalar) and, with AltiVec, vrsqrtefp.
