SSE rms calculation

Question

I want to calculate the RMS with Intel SSE intrinsics, like this:

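float rms( float *a, float *b , int l)
{
    int n=0;
    float r=0.0f;
    for(int i=0;i<l;i++)
    {
        /* finitef() is a nonstandard BSD/glibc function that rejects NaN and
           +/-infinity; C99's isfinite() from <math.h> is the portable equivalent */
        if(finitef(a[i]) && finitef(b[i]))
        {
            n++;
            float tmp = a[i] - b[i];   /* tmp was undeclared in the original post */
            r += tmp*tmp;
        }
    }
    r /= n;
    return r;
}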

But how do I check which elements are NaN? And how do I count n?

Recommended answer

You can test a value for NaN by comparing the value with itself: x == x is false if and only if x is a NaN. So for an SSE vector of 4 float values, vx:

    vmask = _mm_cmpeq_ps(vx, vx);

will give you a mask vector with all 0s for NaN elements in vx and all 1s for non-NaN elements. You can use the mask to zero out the NaNs. You can also use the mask to count the number of valid data points by treating it as a vector of 32-bit ints and accumulating: an all-1s element reads as -1, so subtracting the mask from a counter vector adds 1 for each valid element.
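The masking and counting steps described above can be sketched on a single group of four floats. This is only an illustration, not code from the original answer: the helper name `zero_nans4` is hypothetical, and it restricts itself to SSE/SSE2 intrinsics so that it builds on any x86-64 compiler without extra flags.

```c
#include <emmintrin.h>  /* SSE2 (also provides the SSE float intrinsics) */

/* Hypothetical helper for illustration: replaces NaN elements of the four
   floats at p with 0.0f (in place) and returns how many were not NaN. */
static int zero_nans4(float *p)
{
    __m128 vx = _mm_loadu_ps(p);             /* unaligned load of 4 floats */
    __m128 vmask = _mm_cmpeq_ps(vx, vx);     /* all 1s where the lane is not NaN */
    __m128i vcount;
    int lanes[4];

    /* ANDing with an all-0s lane clears every bit, which is +0.0f. */
    _mm_storeu_ps(p, _mm_and_ps(vx, vmask));

    /* An all-1s lane is -1 as a signed 32-bit int, so 0 - mask is 1 per valid lane. */
    vcount = _mm_sub_epi32(_mm_setzero_si128(), _mm_castps_si128(vmask));
    _mm_storeu_si128((__m128i *)lanes, vcount);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Calling it on an array holding 1.0f, NaN, 3.0f, NaN should return 2 and leave 1.0f, 0.0f, 3.0f, 0.0f in the array. Note that `_mm_castps_si128` only reinterprets the bits; it does not convert float values to integers.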

Here is a working, tested example. Note that it assumes n is a multiple of 4, that it does not require a and b to be 16-byte aligned (it uses unaligned loads), and that it requires SSE4.1 (for _mm_extract_epi32).

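#include <assert.h>
#include <smmintrin.h>  /* SSE4.1 (_mm_extract_epi32); also brings in the earlier SSE headers */

/* Like the question's code, this returns the mean of the squared differences;
   apply sqrtf() to the result to get the root mean square. Only NaNs are
   filtered out here; infinities are not. */
float rms(const float *a, const float *b , int n)
{
    int count;
    float sum;
    __m128i vcount = _mm_set1_epi32(0);
    __m128 vsum = _mm_set1_ps(0.0f);
    assert((n & 3) == 0);                            /* n must be a multiple of 4 */
    for (int i = 0; i < n; i += 4)
    {
        __m128 va = _mm_loadu_ps(&a[i]);             /* unaligned loads */
        __m128 vb = _mm_loadu_ps(&b[i]);
        __m128 vmaska = _mm_cmpeq_ps(va, va);        /* all 1s where a[i] is not NaN */
        __m128 vmaskb = _mm_cmpeq_ps(vb, vb);        /* all 1s where b[i] is not NaN */
        __m128 vmask = _mm_and_ps(vmaska, vmaskb);   /* valid only if both are not NaN */
        __m128 vtmp = _mm_sub_ps(va, vb);
        vtmp = _mm_and_ps(vtmp, vmask);              /* zero out invalid differences */
        vtmp = _mm_mul_ps(vtmp, vtmp);
        vsum = _mm_add_ps(vsum, vtmp);
        /* mask lanes are -1 when valid, so subtracting adds 1 per valid element;
           _mm_castps_si128 is the portable form of the (__m128i) cast */
        vcount = _mm_sub_epi32(vcount, _mm_castps_si128(vmask));
    }
    vsum = _mm_hadd_ps(vsum, vsum);                  /* horizontal sum of the 4 partial sums */
    vsum = _mm_hadd_ps(vsum, vsum);
    _mm_store_ss(&sum, vsum);
    vcount = _mm_hadd_epi32(vcount, vcount);         /* horizontal sum of the 4 counters */
    vcount = _mm_hadd_epi32(vcount, vcount);
    count = _mm_extract_epi32(vcount, 0);
    return count > 0 ? sum / (float)count : 0.0f;
}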
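If n is not guaranteed to be a multiple of 4, or SSE4.1 is not available, one possible variation is sketched below. It is not part of the original answer: the function name `mean_sq_diff_sse2` is made up for illustration, the horizontal reductions are done by storing the vectors to small arrays instead of using the hadd and extract intrinsics, and a scalar loop handles the leftover elements. It assumes n is non-negative and, like the answer, filters NaNs only.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 only */

/* Hypothetical SSE2-only variant for illustration: mean of (a[i] - b[i])^2
   over all i where neither a[i] nor b[i] is NaN; works for any n >= 0. */
static float mean_sq_diff_sse2(const float *a, const float *b, int n)
{
    __m128 vsum = _mm_setzero_ps();
    __m128i vcount = _mm_setzero_si128();
    float sums[4];
    int counts[4];
    float sum;
    int count;
    int i = 0;

    assert(n >= 0);

    /* Vector part: process full groups of 4 elements. */
    for (; i + 4 <= n; i += 4)
    {
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        __m128 vmask = _mm_and_ps(_mm_cmpeq_ps(va, va), _mm_cmpeq_ps(vb, vb));
        __m128 vtmp = _mm_and_ps(_mm_sub_ps(va, vb), vmask);
        vsum = _mm_add_ps(vsum, _mm_mul_ps(vtmp, vtmp));
        vcount = _mm_sub_epi32(vcount, _mm_castps_si128(vmask));
    }

    /* Horizontal reduction through memory instead of hadd/extract. */
    _mm_storeu_ps(sums, vsum);
    _mm_storeu_si128((__m128i *)counts, vcount);
    sum = (sums[0] + sums[1]) + (sums[2] + sums[3]);
    count = counts[0] + counts[1] + counts[2] + counts[3];

    /* Scalar tail: the remaining 0 to 3 elements, using the same x == x test. */
    for (; i < n; i++)
    {
        if (a[i] == a[i] && b[i] == b[i])
        {
            float tmp = a[i] - b[i];
            sum += tmp * tmp;
            count++;
        }
    }
    return count > 0 ? sum / (float)count : 0.0f;
}
```

As with the answer's code, the result is the mean of the squared differences, so the caller would apply sqrtf() to obtain the RMS. The two self-comparisons could also be replaced by a single `_mm_cmpord_ps(va, vb)`, which is all 1s in lanes where neither input is NaN. Compiling with options such as -ffast-math may allow the compiler to assume NaNs never occur, so the scalar x == x test should not be relied on under those settings.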
