Efficient floating point comparison (Cortex-A8)


Problem description

There is a large array (~100 000 elements) of floating point values, and there is a threshold (also floating point).

The problem is that I have to compare each element of the array with the threshold, but the NEON flags transfer takes a really long time (~20 cycles according to a profiler).

Is there any efficient way to compare these values?

NOTE: As rounding error doesn't matter, I tried the following:

float arr[10000];
float threshold; 
....

int a = arr[20]; // e.g.
int t = threshold;
if (t > a) {....}

But in this case I get the following instruction sequence:

vldr.32        s0, [r0]
vcvt.s32.f32   s0, s0
vmov           r0, s0    <--- takes 20 cycles as `vmrs APSR_nzcv, fpscr` in case of 
cmp            r0, r1         floating point comparison

Since the conversion happens on the NEON unit, it makes no difference whether I compare integers in the way described or compare floats directly.

Solution

If floats are 32-bit IEEE-754, ints are 32-bit too, and there are no +infinity, -infinity or NaN values, we can compare floats as ints with a little trick:

#include <stdio.h>
#include <limits.h>
#include <assert.h>

#define C_ASSERT(expr) extern char CAssertExtern[(expr)?1:-1]
C_ASSERT(sizeof(int) == sizeof(float));
C_ASSERT(sizeof(int) * CHAR_BIT == 32);

int isGreater(float* f1, float* f2)
{
  int i1, i2, t1, t2;

  i1 = *(int*)f1;
  i2 = *(int*)f2;

  t1 = i1 >> 31;
  i1 = (i1 ^ t1) + (t1 & 0x80000001);

  t2 = i2 >> 31;
  i2 = (i2 ^ t2) + (t2 & 0x80000001);

  return i1 > i2;
}

int main(void)
{
  float arr[9] = { -3, -2, -1.5, -1, 0, 1, 1.5, 2, 3 };
  float thr;
  int i;

  // Make sure floats are 32-bit IEEE-754 and are
  // reinterpreted as integers the way we expect
  {
    static const float testf = 8873283.0f;
    unsigned testi = *(unsigned*)&testf;
    assert(testi == 0x4B076543);
  }

  thr = -1.5;
  for (i = 0; i < 9; i++)
  {
    printf("%f %s %f\n", arr[i], "<=\0> " + 3*isGreater(&arr[i], &thr), thr);
  }

  thr = 1.5;
  for (i = 0; i < 9; i++)
  {
    printf("%f %s %f\n", arr[i], "<=\0> " + 3*isGreater(&arr[i], &thr), thr);
  }

  return 0;
}

Output:

-3.000000 <= -1.500000
-2.000000 <= -1.500000
-1.500000 <= -1.500000
-1.000000 >  -1.500000
0.000000 >  -1.500000
1.000000 >  -1.500000
1.500000 >  -1.500000
2.000000 >  -1.500000
3.000000 >  -1.500000
-3.000000 <= 1.500000
-2.000000 <= 1.500000
-1.500000 <= 1.500000
-1.000000 <= 1.500000
0.000000 <= 1.500000
1.000000 <= 1.500000
1.500000 <= 1.500000
2.000000 >  1.500000
3.000000 >  1.500000

Of course, if your threshold doesn't change, it makes sense to precalculate the final integer value that isGreater() derives from it and uses in the comparison.

If you are worried about undefined behavior in C/C++ in the above code (the pointer casts and the shifts of negative values), you can rewrite the code in assembly.
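As a rough sketch of the two remarks above, the conversion can be pulled into a helper so that the threshold is converted only once, and the pointer cast can be replaced with memcpy, which is a well-defined way to read a float's bits in C. The names ordered_key and count_above are hypothetical and not part of the original answer. The sketch assumes 32-bit IEEE-754 floats and no NaN inputs, and it uses a conditional negate instead of the shift/xor arithmetic, which should produce the same mapping. Whether the compiler turns this into plain integer loads on a Cortex-A8 has to be checked in the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: map a float to a signed integer key whose
   ordering matches the float ordering (NaNs are not handled).
   Non-negative floats map to their magnitude bits, negative floats
   to minus their magnitude bits, so -0.0f and +0.0f both map to 0. */
int32_t ordered_key(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);                      /* read the bit pattern */
    int32_t mag = (int32_t)(u & 0x7FFFFFFFu);      /* exponent + mantissa */
    return (u >> 31) ? -mag : mag;                 /* apply the sign bit */
}

/* Hypothetical helper: count the elements strictly greater than the
   threshold. The threshold key is computed once, outside the loop. */
size_t count_above(const float *arr, size_t n, float threshold)
{
    const int32_t tkey = ordered_key(threshold);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(ordered_key(arr[i]) > tkey);
    return count;
}
```

For the nine-element array from the example above, count_above(arr, 9, -1.5f) gives 6 and count_above(arr, 9, 1.5f) gives 2, which matches the number of ">" lines in the output.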
