Divide by floating-point number using NEON intrinsics
Question
I'm processing an image four pixels at a time, on ARMv7, for an Android application.
I want to divide a float32x4_t vector by another vector, but the numbers in it vary from roughly 0.7 to 3.85. It seems to me that the only way to divide is with a right shift, but that only works when the divisor is a power of two (2^n).
Also, I'm new to this, so any constructive help or comments are welcome.
Example:
How can I perform these operations with NEON intrinsics?
float32x4_t a = {25.3, 34.1, 11.0, 25.1};
float32x4_t b = {1.2, 3.5, 2.5, 2.0};
// something like this
float32x4_t resultado = a / b; // {21.08, 9.74, 4.4, 12.55}
Recommended answer
The ARMv7 NEON instruction set does not have a floating-point divide.
If you know a priori that your values are not poorly scaled, and you do not require correct rounding (this is almost certainly the case if you're doing image processing), then you can use a reciprocal estimate, refinement step, and multiply instead of a divide:
// get an initial estimate of 1/b.
float32x4_t reciprocal = vrecpeq_f32(b);
// use a couple Newton-Raphson steps to refine the estimate. Depending on your
// application's accuracy requirements, you may be able to get away with only
// one refinement (instead of the two used here). Be sure to test!
reciprocal = vmulq_f32(vrecpsq_f32(b, reciprocal), reciprocal);
reciprocal = vmulq_f32(vrecpsq_f32(b, reciprocal), reciprocal);
// and finally, compute a/b = a*(1/b)
float32x4_t result = vmulq_f32(a,reciprocal);
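To make the snippet above easier to drop into a project, here is a minimal sketch that wraps the same steps in a function. The names div_f32x4_approx and div4 are hypothetical, chosen only for illustration. The scalar fallback is an assumption added so the file also compiles on machines without NEON. On ARMv7 the build has to enable NEON (for example with -mfpu=neon on GCC or Clang).

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: approximate a / b in each of the four lanes.
 * vrecpeq_f32 gives a low-precision estimate r of 1/b.
 * vrecpsq_f32(b, r) computes (2 - b*r), so r * (2 - b*r) is one
 * Newton-Raphson step toward 1/b. */
static inline float32x4_t div_f32x4_approx(float32x4_t a, float32x4_t b)
{
    float32x4_t r = vrecpeq_f32(b);          /* initial estimate of 1/b */
    r = vmulq_f32(vrecpsq_f32(b, r), r);     /* first refinement        */
    r = vmulq_f32(vrecpsq_f32(b, r), r);     /* second refinement       */
    return vmulq_f32(a, r);                  /* a * (1/b)               */
}
#endif

/* Hypothetical wrapper: out[i] = a[i] / b[i] for four floats.
 * Uses the NEON path when available, plain division otherwise.
 * The NEON result is an approximation and is not correctly rounded. */
static void div4(const float a[4], const float b[4], float out[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1q_f32(out, div_f32x4_approx(vld1q_f32(a), vld1q_f32(b)));
#else
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] / b[i];
    }
#endif
}
```

Calling div4 with the arrays {25.3, 34.1, 11.0, 25.1} and {1.2, 3.5, 2.5, 2.0} from the question should give values close to the expected quotients, but the low bits may differ from a true division, so compare with a tolerance rather than for exact equality. If the code targets AArch64 instead of ARMv7, the vdivq_f32 intrinsic is available there and this workaround is not needed.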