Is it possible to write Quake's fast InvSqrt() function in Rust?

Question

This is just to satisfy my own curiosity.

Is there an implementation of this:

float InvSqrt (float x)
{
   float xhalf = 0.5f*x;
   int i = *(int*)&x;
   i = 0x5f3759df - (i>>1);
   x = *(float*)&i;
   x = x*(1.5f - xhalf*x*x);
   return x;
}

in Rust? If it exists, post the code.

I tried it and failed. I don't know how to encode the float number using integer format. Here is my attempt:

fn main() {
    println!("Hello, world!");
    println!("sqrt1: {}, ",sqrt2(100f64));
}

fn sqrt1(x: f64) -> f64 {
    x.sqrt()
}

fn sqrt2(x: f64) -> f64 {
    let mut x = x;
    let xhalf = 0.5*x;
    let mut i = x as i64;
    println!("sqrt1: {}, ", i);

    i = 0x5f375a86 as i64 - (i>>1);

    x = i as f64;
    x = x*(1.5f64 - xhalf*x*x);
    1.0/x
}

Reference:
1. Origin of Quake3's Fast InvSqrt() - Page 1
2. Understanding Quake’s Fast Inverse Square Root
3. FAST INVERSE SQUARE ROOT.pdf
4. source code: q_math.c#L552-L572

Recommended answer

I don't know how to encode the float number using integer format.

There is a function for that: f32::to_bits, which returns a u32 holding the float's raw bit pattern. There is also a function for the other direction: f32::from_bits, which takes a u32 as its argument. These functions are preferred over mem::transmute, as the latter is unsafe and tricky to use.
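To illustrate the difference between a numeric cast and a bit reinterpretation, here is a small sketch. The helper name bits_roundtrip is made up for this example; only f32::to_bits and f32::from_bits come from the standard library. The attempt in the question used `as`, which converts the value (100.0 becomes 100) instead of exposing the IEEE 754 bit pattern.

```rust
/// Hypothetical helper: returns the raw IEEE 754 bits of `x`
/// and the float rebuilt from those bits.
fn bits_roundtrip(x: f32) -> (u32, f32) {
    let bits = x.to_bits(); // reinterpret, no numeric conversion
    (bits, f32::from_bits(bits)) // rebuild the same float
}

fn main() {
    let x = 100.0_f32;

    // `as` converts the numeric value: 100.0 -> 100
    println!("x as u32    = {}", x as u32);

    // `to_bits` exposes the underlying bit pattern instead
    let (bits, back) = bits_roundtrip(x);
    println!("x.to_bits() = {:#010x}", bits);
    println!("from_bits   = {}", back);
}
```

The round trip is lossless, which is what the InvSqrt trick relies on.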

From that, here is an implementation of InvSqrt:

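fn inv_sqrt(x: f32) -> f32 {
    // Reinterpret the float's bits as an integer (no numeric conversion).
    let i = x.to_bits();
    // Magic constant minus half the bit pattern gives a first guess for 1/sqrt(x).
    // wrapping_sub avoids an overflow panic in debug builds for negative inputs.
    let i = 0x5f3759df_u32.wrapping_sub(i >> 1);
    let y = f32::from_bits(i);

    // One Newton-Raphson step refines the guess.
    y * (1.5 - 0.5 * x * y * y)
}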

(Playground)
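As a quick sanity check, the following sketch compares the approximation against 1.0 / x.sqrt() for a few inputs. The relative_error helper and the chosen sample values are assumptions made for this illustration, not part of the original answer. inv_sqrt is repeated here (using wrapping_sub) so that the snippet compiles on its own.

```rust
/// Fast inverse square root, as in the answer above.
fn inv_sqrt(x: f32) -> f32 {
    let i = 0x5f3759df_u32.wrapping_sub(x.to_bits() >> 1);
    let y = f32::from_bits(i);
    // One Newton-Raphson step.
    y * (1.5 - 0.5 * x * y * y)
}

/// Hypothetical helper: relative error of `inv_sqrt` compared to
/// the straightforward `1.0 / x.sqrt()`.
fn relative_error(x: f32) -> f32 {
    let exact = 1.0 / x.sqrt();
    ((inv_sqrt(x) - exact) / exact).abs()
}

fn main() {
    for &x in &[0.01_f32, 1.0, 2.0, 100.0, 12345.0] {
        println!(
            "x = {:>8}: approx = {:.6}, exact = {:.6}, rel. error = {:.5}",
            x,
            inv_sqrt(x),
            1.0 / x.sqrt(),
            relative_error(x)
        );
    }
}
```

With a single Newton-Raphson step, the relative error should stay below roughly 0.2% for positive, normal inputs.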

This function compiles to the following assembly on x86-64:

.LCPI0_0:
        .long   3204448256        ; f32 -0.5
.LCPI0_1:
        .long   1069547520        ; f32  1.5
example::inv_sqrt:
        movd    eax, xmm0
        shr     eax                   ; i >> 1
        mov     ecx, 1597463007       ; 0x5f3759df
        sub     ecx, eax              ; 0x5f3759df - ...
        movd    xmm1, ecx
        mulss   xmm0, dword ptr [rip + .LCPI0_0]    ; x *= -0.5
        mulss   xmm0, xmm1                          ; x *= y
        mulss   xmm0, xmm1                          ; x *= y
        addss   xmm0, dword ptr [rip + .LCPI0_1]    ; x += 1.5
        mulss   xmm0, xmm1                          ; x *= y
        ret

I have not found any reference assembly (if you have, please tell me!), but it seems fairly good to me. I am just not sure why the float was moved into eax just to do the shift and integer subtraction. Maybe SSE registers do not support those operations?

clang 9.0 with -O3 compiles the C code to basically the same assembly. So that's a good sign.

It is worth pointing out that if you actually want to use this in practice: please don't. As benrg pointed out in the comments, modern x86 CPUs have a specialized instruction for this function which is faster and more accurate than this hack. Unfortunately, 1.0 / x.sqrt() does not seem to optimize to that instruction. So if you really need the speed, using the _mm_rsqrt_ps intrinsic is probably the way to go. This, however, again requires unsafe code. I won't go into much detail in this answer, as only a minority of programmers will actually need it.
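For the curious, here is a minimal sketch of what using the hardware instruction could look like. It is an assumption-laden illustration, not a tuned implementation: it uses the scalar sibling _mm_rsqrt_ss rather than the packed _mm_rsqrt_ps mentioned above, the wrapper name rsqrt_hw is made up, and on targets other than x86-64 it simply falls back to 1.0 / x.sqrt().

```rust
/// Hypothetical wrapper around the SSE `rsqrtss` instruction.
#[cfg(target_arch = "x86_64")]
fn rsqrt_hw(x: f32) -> f32 {
    use std::arch::x86_64::{_mm_cvtss_f32, _mm_rsqrt_ss, _mm_set_ss};
    // SAFETY: SSE is part of the x86-64 baseline, so the instruction
    // is available on every CPU this code is compiled for.
    unsafe { _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x))) }
}

/// Fallback for other architectures (assumption: plain division is acceptable there).
#[cfg(not(target_arch = "x86_64"))]
fn rsqrt_hw(x: f32) -> f32 {
    1.0 / x.sqrt()
}

fn main() {
    for &x in &[1.0_f32, 4.0, 100.0] {
        println!("rsqrt_hw({}) = {:.6}", x, rsqrt_hw(x));
    }
}
```

The hardware result is itself an approximation, so the printed values may differ slightly from the exact ones.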
