Is it possible to write Quake's fast InvSqrt() function in Rust?
Question
This is just to satisfy my own curiosity.
Is there an implementation of this:
float InvSqrt (float x)
{
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i>>1);
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;
}
in Rust? If it exists, post the code.
I tried it and failed. I don't know how to encode the float number using integer format. Here is my attempt:
fn main() {
    println!("Hello, world!");
    println!("sqrt1: {}, ",sqrt2(100f64));
}
fn sqrt1(x: f64) -> f64 {
    x.sqrt()
}
fn sqrt2(x: f64) -> f64 {
    let mut x = x;
    let xhalf = 0.5*x;
    let mut i = x as i64;
    println!("sqrt1: {}, ", i);
    i = 0x5f375a86 as i64 - (i>>1);
    x = i as f64;
    x = x*(1.5f64 - xhalf*x*x);
    1.0/x
}
References:
1. Origin of Quake3's Fast InvSqrt() - Page 1
2. Understanding Quake’s Fast Inverse Square Root
3. FAST INVERSE SQUARE ROOT.pdf
4. source code: q_math.c#L552-L572
Recommended answer
I don't know how to encode the float number using integer format.
There is a function for that: f32::to_bits, which returns a u32. There is also a function for the other direction: f32::from_bits, which takes a u32 as its argument. These functions are preferred over mem::transmute, as the latter is unsafe and tricky to use.
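As a minimal sketch of the difference between reinterpreting the bits and converting the value (the helper names `bits_round_trip` and `numeric_cast` are made up for illustration): `to_bits` copies the IEEE 754 bit pattern unchanged, whereas an `as` cast, which is what the attempt in the question uses, converts the number itself.

```rust
/// Returns the raw IEEE 754 bit pattern of `x` together with the float
/// rebuilt from that pattern. (Hypothetical helper, for illustration only.)
fn bits_round_trip(x: f32) -> (u32, f32) {
    let bits: u32 = x.to_bits(); // reinterpret the bits, no numeric conversion
    let back: f32 = f32::from_bits(bits); // reinterpret in the other direction
    (bits, back)
}

/// Numeric conversion with `as`: the value is truncated toward zero,
/// the bit pattern is NOT preserved. (Hypothetical helper.)
fn numeric_cast(x: f32) -> u32 {
    x as u32
}
```

For example, `bits_round_trip(1.0)` yields the pattern `0x3f800000` and gives back exactly `1.0`, while `numeric_cast(100.0)` is simply `100`.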
With that, here is an implementation of InvSqrt:
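fn inv_sqrt(x: f32) -> f32 {
    // Reinterpret the float's bits as an integer.
    let i = x.to_bits();
    // Magic constant minus half the bit pattern gives a first guess.
    let i = 0x5f3759df - (i >> 1);
    let y = f32::from_bits(i);
    // One Newton-Raphson step refines the guess.
    y * (1.5 - 0.5 * x * y * y)
}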
(Playground)
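One caveat with the function above, as far as I can tell: the subtraction is done on `u32`, so in a debug build it panics on overflow for inputs whose sign bit is set (for example `-1.0`). The following sketch is a variant, not part of the original answer, that uses `wrapping_sub` to match the C behaviour, plus a small helper to measure the error against `1.0 / x.sqrt()`. The names `inv_sqrt_wrapping` and `relative_error` are made up for illustration.

```rust
/// Same algorithm as above, but with `wrapping_sub` so that debug builds
/// do not panic when the subtraction overflows (e.g. for negative inputs).
/// The result for negative inputs is still meaningless.
fn inv_sqrt_wrapping(x: f32) -> f32 {
    let i = 0x5f37_59df_u32.wrapping_sub(x.to_bits() >> 1);
    let y = f32::from_bits(i);
    // One Newton-Raphson step refines the initial guess.
    y * (1.5 - 0.5 * x * y * y)
}

/// Relative error of the approximation compared to the exact computation.
fn relative_error(x: f32) -> f32 {
    let exact = 1.0 / x.sqrt();
    ((inv_sqrt_wrapping(x) - exact) / exact).abs()
}
```

With a single Newton step, the relative error for positive normal inputs should stay at roughly 0.2% or below.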
This function compiles to the following assembly on x86-64:
.LCPI0_0:
        .long   3204448256        ; f32 -0.5
.LCPI0_1:
        .long   1069547520        ; f32  1.5
example::inv_sqrt:
        movd    eax, xmm0
        shr     eax                   ; i >> 1
        mov     ecx, 1597463007       ; 0x5f3759df
        sub     ecx, eax              ; 0x5f3759df - ...
        movd    xmm1, ecx
        mulss   xmm0, dword ptr [rip + .LCPI0_0]    ; x *= -0.5
        mulss   xmm0, xmm1                          ; x *= y
        mulss   xmm0, xmm1                          ; x *= y
        addss   xmm0, dword ptr [rip + .LCPI0_1]    ; x += 1.5
        mulss   xmm0, xmm1                          ; x *= y
        ret
I have not found any reference assembly (if you have, please tell me!), but it seems fairly good to me. I am just not sure why the float was moved into eax just to do the shift and integer subtraction. Maybe SSE registers do not support those operations?
clang 9.0 with -O3 compiles the C code to basically the same assembly. So that's a good sign.
It is worth pointing out that if you actually want to use this in practice: please don't. As benrg pointed out in the comments, modern x86 CPUs have a specialized instruction for this function which is faster and more accurate than this hack. Unfortunately, 1.0 / x.sqrt() does not seem to optimize to that instruction. So if you really need the speed, using the _mm_rsqrt_ps intrinsic is probably the way to go. This, however, again requires unsafe code. I won't go into much detail in this answer, as only a minority of programmers will actually need it.
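For the curious, here is a rough sketch of what the intrinsic route could look like. It uses the scalar sibling `_mm_rsqrt_ss` from `std::arch::x86_64` rather than the packed `_mm_rsqrt_ps` mentioned above, simply because it handles a single `f32`. The wrapper name `rsqrt_hw` and the fallback for other architectures are my own assumptions, not something from the comments.

```rust
/// Approximate 1/sqrt(x) with the SSE `rsqrtss` instruction.
/// (Hypothetical wrapper, for illustration only.)
#[cfg(target_arch = "x86_64")]
fn rsqrt_hw(x: f32) -> f32 {
    use std::arch::x86_64::{_mm_cvtss_f32, _mm_rsqrt_ss, _mm_set_ss};
    // SAFETY: SSE is part of the x86-64 baseline, so these intrinsics
    // are available on every CPU this target can run on.
    unsafe {
        let v = _mm_set_ss(x); // put x into the lowest lane
        let r = _mm_rsqrt_ss(v); // hardware reciprocal square root estimate
        _mm_cvtss_f32(r) // extract the lowest lane again
    }
}

/// Fallback for other architectures: just compute the exact value.
#[cfg(not(target_arch = "x86_64"))]
fn rsqrt_hw(x: f32) -> f32 {
    1.0 / x.sqrt()
}
```

The hardware result is still only an estimate, so compare it against `1.0 / x.sqrt()` with a tolerance rather than for exact equality.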