Newton-Raphson with SSE2 - can someone explain these 3 lines to me?


Problem description

I'm reading this document: http://software.intel.com/en-us/articles/interactive-ray-tracing

and I stumbled upon these three lines of code:

The SIMD version is already quite a bit faster, but we can do better. Intel has added a fast 1/sqrt(x) function to the SSE2 instruction set. The only drawback is that its precision is limited. We need the precision, so we refine it using Newton-Raphson:

 __m128 nr = _mm_rsqrt_ps( x ); 
 __m128 muls = _mm_mul_ps( _mm_mul_ps( x, nr ), nr ); 
 result = _mm_mul_ps( _mm_mul_ps( half, nr ), _mm_sub_ps( three, muls ) ); 

This code assumes the existence of a __m128 variable named 'half' (four times 0.5f) and a variable 'three' (four times 3.0f).

I know how to use Newton Raphson to calculate a function's zero and I know how to use it to calculate the square root of a number but I just can't see how this code performs it.

Can someone explain it to me please?

Solution

Given the Newton iteration $y_{n+1} = \frac{y_n}{2}\left(3 - x\,y_n^2\right)$, it should be quite straightforward to see it in the source code.

 __m128 nr   = _mm_rsqrt_ps( x );                      // initial approximation y_0 (relative error <= 1.5*2^-12)
 __m128 muls = _mm_mul_ps( _mm_mul_ps( x, nr ), nr );  // muls = x*nr*nr == x*(y_0)^2
 __m128 result = _mm_mul_ps(
               _mm_sub_ps( three, muls ),      // 3.0 - muls
   /* multiplied by */ _mm_mul_ps( half, nr )  // y_0 / 2, i.e. y_0 * 0.5
 );
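For reference, here is a minimal self-contained sketch of the same three lines packaged as a C function. The function name `rsqrt_nr_ps` is hypothetical. `half` and `three` are built with `_mm_set1_ps`, matching the question's description (four copies of 0.5f and 3.0f). `<xmmintrin.h>` is the standard header for these SSE intrinsics.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rsqrt_ps, _mm_mul_ps, _mm_sub_ps, ... */

/* Hypothetical helper: approximate 1/sqrt(x) in all four lanes using the
   hardware estimate refined by one Newton-Raphson step,
   y1 = 0.5 * y0 * (3 - x * y0 * y0). */
static __m128 rsqrt_nr_ps(__m128 x)
{
    const __m128 half  = _mm_set1_ps(0.5f);  /* four times 0.5f */
    const __m128 three = _mm_set1_ps(3.0f);  /* four times 3.0f */

    __m128 nr   = _mm_rsqrt_ps(x);                    /* y0: hardware estimate */
    __m128 muls = _mm_mul_ps(_mm_mul_ps(x, nr), nr);  /* x * y0^2 */
    return _mm_mul_ps(_mm_mul_ps(half, nr),           /* y0 / 2 */
                      _mm_sub_ps(three, muls));       /* 3 - x * y0^2 */
}
```

As a usage example, `_mm_storeu_ps(out, rsqrt_nr_ps(_mm_loadu_ps(in)))` with `in = {1, 4, 16, 100}` should give values close to {1, 0.5, 0.25, 0.1}. The sketch creates the constants inside the function to stay self-contained. Compilers typically turn them into constant loads, but in a hot loop you may prefer to create them once outside.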

And to be precise, this algorithm computes the inverse square root, 1/sqrt(x), not sqrt(x) itself.
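One standard way to arrive at this update (the article does not spell it out) is to apply Newton's method to $f(y) = y^{-2} - x$, whose positive root is $y = 1/\sqrt{x}$. This choice of $f$ matters: unlike $f(y) = y^2 - 1/x$, it yields an update that needs only multiplications and a subtraction, with no division.

```latex
f(y) = \frac{1}{y^2} - x, \qquad f'(y) = -\frac{2}{y^3}

y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)}
        = y_n + \frac{y_n^3}{2}\left(\frac{1}{y_n^2} - x\right)
        = y_n + \frac{y_n}{2} - \frac{x\,y_n^3}{2}
        = \frac{y_n}{2}\left(3 - x\,y_n^2\right)

y_n = \frac{1+\varepsilon}{\sqrt{x}}
\;\Longrightarrow\;
y_{n+1} = \frac{1}{\sqrt{x}}\left(1 - \tfrac{3}{2}\varepsilon^2 - \tfrac{1}{2}\varepsilon^3\right)
```

In the code, `nr` is $y_n$, `muls` is $x\,y_n^2$, `_mm_mul_ps(half, nr)` is $y_n/2$ and `_mm_sub_ps(three, muls)` is $3 - x\,y_n^2$. The last line shows that one step roughly squares the relative error $\varepsilon$. That is why a single iteration takes the hardware estimate (relative error at most $1.5 \cdot 2^{-12}$) close to full single precision.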

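Note that this still doesn't give a fully accurate result (http://stackoverflow.com/questions/1528727/why-is-sse-scalar-sqrtx-slower-than-rsqrtx-x/1528751#1528751). rsqrtps followed by one NR iteration gives almost 23 bits of accuracy, vs. sqrtps's 24 bits with correct rounding of the last bit.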
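The accuracy claim can be checked empirically with a rough sketch like the one below. Everything in it is an assumption for illustration: the helper name `max_rel_error`, the arbitrary test range [1, 4), and the error estimate. To avoid needing libm, it uses the residual d = x*y*y - 1 = 2e + e^2 and reports |d|/2. This is a first-order estimate of the relative error e of y ~= 1/sqrt(x), computed in double precision.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: worst relative error of rsqrtps alone (refine == 0) or
   rsqrtps plus one Newton-Raphson step (refine != 0) over n evenly spaced
   inputs in [lo, hi). The error of y ~= 1/sqrt(x) is estimated from the
   residual d = x*y*y - 1 = 2e + e^2 as e ~= |d|/2 (first order). */
static double max_rel_error(float lo, float hi, int n, int refine)
{
    const __m128 half  = _mm_set1_ps(0.5f);
    const __m128 three = _mm_set1_ps(3.0f);
    double worst = 0.0;

    for (int i = 0; i < n; ++i) {
        float xf  = lo + (hi - lo) * ((float)i / (float)n);
        __m128 x  = _mm_set1_ps(xf);
        __m128 nr = _mm_rsqrt_ps(x);                 /* hardware estimate */
        if (refine) {                                /* one NR step */
            __m128 muls = _mm_mul_ps(_mm_mul_ps(x, nr), nr);
            nr = _mm_mul_ps(_mm_mul_ps(half, nr), _mm_sub_ps(three, muls));
        }
        double y = (double)_mm_cvtss_f32(nr);       /* lane 0 as a scalar */
        double e = ((double)xf * y * y - 1.0) / 2.0; /* first-order rel. error */
        if (e < 0.0)
            e = -e;
        if (e > worst)
            worst = e;
    }
    return worst;
}
```

Calling `max_rel_error(1.0f, 4.0f, 100000, 0)` and then `max_rel_error(1.0f, 4.0f, 100000, 1)` should show two things. The raw estimate should stay within Intel's documented bound of 1.5 * 2^-12 (about 3.7e-4). The refined value should land on the order of 1e-7. The exact numbers depend on the CPU's rsqrtps implementation.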

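The limited accuracy is an issue if you want to truncate the result to an integer (http://stackoverflow.com/questions/35885170/handling-zeroes-in-mm256-rsqrt-ps/35893242#35893242): (int)4.99999 is 4, not 5. Also, watch out for the x == 0.0 case if you compute sqrt(x) ~= x * rsqrt(x), because rsqrt(0) is +Inf and 0 * +Inf = NaN.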
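One way to handle the zero case is to mask out the lanes where x == 0 after computing x * rsqrt(x). The sketch below is an illustration under that assumption, not the only fix. The name `sqrt_via_rsqrt_ps` is hypothetical, and the sketch does not try to handle denormal inputs. Note that for x == 0 the NaN already appears inside the Newton step (x * nr = 0 * +Inf), so the mask is applied to the final result.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: approximate sqrt(x) as x * (1/sqrt(x) refined by one
   Newton-Raphson step). Lanes where x == 0 would produce NaN (0 * +Inf), so
   they are forced to 0 with a compare mask. */
static __m128 sqrt_via_rsqrt_ps(__m128 x)
{
    const __m128 half  = _mm_set1_ps(0.5f);
    const __m128 three = _mm_set1_ps(3.0f);

    __m128 nr   = _mm_rsqrt_ps(x);                      /* +Inf where x == 0 */
    __m128 muls = _mm_mul_ps(_mm_mul_ps(x, nr), nr);    /* NaN where x == 0 */
    __m128 rs   = _mm_mul_ps(_mm_mul_ps(half, nr), _mm_sub_ps(three, muls));
    __m128 s    = _mm_mul_ps(x, rs);                    /* ~sqrt(x) */
    __m128 nonzero = _mm_cmpneq_ps(x, _mm_setzero_ps()); /* all-ones where x != 0 */
    return _mm_and_ps(s, nonzero);                      /* x == 0 lanes -> 0.0f */
}
```

Some results are meant to be integers and are converted back. For those, rounding to nearest avoids the 4.99999 -> 4 problem that truncation has. One option is SSE2's `_mm_cvtps_epi32`, which uses the current MXCSR rounding mode (round-to-nearest by default).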
