Comparison with NaN using AVX

Problem description

I am trying to create a fast decoder for BPSK using Intel's AVX intrinsics. I have a set of complex numbers represented as interleaved floats, but because of the BPSK modulation only the real part (the even-indexed floats) is needed. Every float x is mapped to 0 when x < 0 and to 1 when x >= 0. This is accomplished using the following routine:

static inline void
normalize_bpsk_constellation_points(int32_t *out, const complex_t *in, size_t num)
{
    static const __m256             _min_mask = _mm256_set1_ps(-1.0);
    static const __m256             _max_mask = _mm256_set1_ps(1.0);
    static const __m256             _mul_mask = _mm256_set1_ps(0.5);

    __m256                          res;
    __m256i                         int_res;

    size_t i;

    for(i = 0; i < num; i += COMPLEX_PER_AVX_REG){
            res = _mm256_load_ps((const float *)&in[i]);

            /* clamp them to avoid segmentation faults due to indexing */
            res = _mm256_max_ps(_min_mask, _mm256_min_ps(_max_mask, res));

            /* Scale accordingly for proper indexing -1->0, 1->1 */
            res = _mm256_add_ps(res, _max_mask);
            res = _mm256_mul_ps(res, _mul_mask);

            /* And then round to the nearest integer */
            res = _mm256_round_ps(res, _MM_FROUND_TO_NEAREST_INT |_MM_FROUND_NO_EXC);

            int_res = _mm256_cvtps_epi32(res);

            _mm256_store_si256((__m256i *) &out[2*i], int_res);
    }
}

First, I clamp all the received floats to the range [-1, 1]. Then, after scaling them to [0, 1], the result is rounded to the nearest integer. That maps all scaled values above 0.5 to 1 and all scaled values below 0.5 to 0.

The procedure works fine if the input floats are ordinary numbers. However, due to conditions in earlier stages, some input floats may be NaN or -NaN. In this case, the NaN values propagate through _mm256_max_ps(), _mm256_min_ps() and all the other AVX functions, resulting in an integer mapping of -2147483648, which of course crashes my program due to invalid indexing.
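The propagation described above can be reproduced in isolation. The sketch below is not part of the original post: `clamp_scale_convert` is a hypothetical helper that pushes a single value through the same clamp, scale, round and convert steps, and the `target` attribute is a GCC/Clang assumption so that it builds without `-mavx`.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper (not from the question): run one value through the
 * question's clamp/scale/round/convert steps and return the resulting int. */
__attribute__((target("avx")))  /* GCC/Clang only; drop when building with -mavx */
int32_t clamp_scale_convert(float x)
{
    const __m256 lo   = _mm256_set1_ps(-1.0f);
    const __m256 hi   = _mm256_set1_ps(1.0f);
    const __m256 half = _mm256_set1_ps(0.5f);
    int32_t lanes[8];

    __m256 v = _mm256_set1_ps(x);
    /* min/max return their second operand when either input is NaN,
     * so with v in second position a NaN passes straight through. */
    v = _mm256_max_ps(lo, _mm256_min_ps(hi, v));
    v = _mm256_mul_ps(_mm256_add_ps(v, hi), half);
    v = _mm256_round_ps(v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);

    /* Converting NaN yields the "integer indefinite" value 0x80000000. */
    _mm256_storeu_si256((__m256i *)lanes, _mm256_cvtps_epi32(v));
    return lanes[0];
}
```

Called with 0.7f this returns 1 and with -0.7f it returns 0, while a NaN input comes back as INT32_MIN, which is the -2147483648 reported in the question.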

Is there any workaround to avoid this problem, or at least set the NaN to 0 using AVX?

Solution

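To begin with, you could do it the simple way, with a compare and mask (not tested):

/* All ones in each lane where x >= 0 or x is NaN (NLT = "not less than", unordered) */
res = _mm256_cmp_ps(res, _mm256_setzero_ps(), _CMP_NLT_US);
/* Logical shift by an immediate turns the mask into 0 or 1 (needs AVX2) */
ires = _mm256_srli_epi32(_mm256_castps_si256(res), 31);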
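As a fuller illustration of the compare-and-mask idea, here is a minimal sketch of a complete loop. The function names are hypothetical, the input length is assumed to be a multiple of 8, and every float (real and imaginary alike) is converted, as in the question's routine. The first variant uses the answer's `_CMP_NLT_US` predicate, which sends NaN to 1. The second swaps in `_CMP_GE_OQ`, an ordered predicate that is false for NaN, in case NaN should map to 0 as the question asks.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; n is assumed to be a multiple of 8.
 * The target attribute is GCC/Clang only; drop it when building with -mavx2. */
__attribute__((target("avx2")))
void bpsk_decide_nan_as_one(int32_t *out, const float *in, size_t n)
{
    const __m256 zero = _mm256_setzero_ps();
    for (size_t i = 0; i + 8 <= n; i += 8) {
        __m256 x = _mm256_loadu_ps(in + i);
        /* True (all ones) for x >= 0 and for NaN, as in the answer. */
        __m256 m = _mm256_cmp_ps(x, zero, _CMP_NLT_US);
        __m256i bit = _mm256_srli_epi32(_mm256_castps_si256(m), 31);
        _mm256_storeu_si256((__m256i *)(out + i), bit);
    }
}

__attribute__((target("avx2")))
void bpsk_decide_nan_as_zero(int32_t *out, const float *in, size_t n)
{
    const __m256 zero = _mm256_setzero_ps();
    for (size_t i = 0; i + 8 <= n; i += 8) {
        __m256 x = _mm256_loadu_ps(in + i);
        /* Ordered compare: false (all zeros) whenever x is NaN. */
        __m256 m = _mm256_cmp_ps(x, zero, _CMP_GE_OQ);
        __m256i bit = _mm256_srli_epi32(_mm256_castps_si256(m), 31);
        _mm256_storeu_si256((__m256i *)(out + i), bit);
    }
}
```

Because the output is always 0 or 1, no clamping is needed before indexing. Both variants treat -0.0 as non-negative and map it to 1.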

Or shift and xor (also not tested):

/* Move the sign bit down to bit 0, then flip it: sign set -> 0, sign clear -> 1 (needs AVX2) */
ires = _mm256_srli_epi32(_mm256_castps_si256(res), 31);
ires = _mm256_xor_si256(ires, _mm256_set1_epi32(1));

This version looks only at the sign bit, so a NaN is mapped according to its sign (its NaN-ness is ignored); for the same reason, -0.0 maps to 0.
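The sign-bit behaviour is easiest to see on raw bit patterns. The following sketch wraps the shift-and-xor steps in a hypothetical function, `bpsk_decide_by_sign`, again assuming the length is a multiple of 8 and a GCC/Clang toolchain for the `target` attribute.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; n is assumed to be a multiple of 8.
 * The target attribute is GCC/Clang only; drop it when building with -mavx2. */
__attribute__((target("avx2")))
void bpsk_decide_by_sign(int32_t *out, const float *in, size_t n)
{
    const __m256i one = _mm256_set1_epi32(1);
    for (size_t i = 0; i + 8 <= n; i += 8) {
        /* Reinterpret the floats as 32-bit integers without converting them. */
        __m256i raw  = _mm256_castps_si256(_mm256_loadu_ps(in + i));
        /* Bit 31 is the IEEE 754 sign bit: 1 for negative values, -0.0 and -NaN. */
        __m256i sign = _mm256_srli_epi32(raw, 31);
        /* Flip it so that sign-clear inputs give 1 and sign-set inputs give 0. */
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_xor_si256(sign, one));
    }
}
```

With this variant a NaN whose sign bit is clear (for example the bit pattern 0x7FC00000) produces 1, and one whose sign bit is set (0xFFC00000) produces 0. No floating-point comparison is involved at all.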

Alternative without AVX2 (not tested):

/* Mask is all ones where x >= 0 or x is NaN; AND with 1.0f leaves 1.0f or 0.0f */
res = _mm256_cmp_ps(res, _mm256_setzero_ps(), _CMP_NLT_US);
res = _mm256_and_ps(res, _mm256_set1_ps(1.0f));
ires = _mm256_cvtps_epi32(res);
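For machines with AVX but not AVX2, the same steps might be packaged as follows. This is only a sketch: `bpsk_decide_avx1` is a hypothetical name, the length is assumed to be a multiple of 8, and the `target` attribute assumes GCC or Clang.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; n is assumed to be a multiple of 8.
 * The target attribute is GCC/Clang only; drop it when building with -mavx. */
__attribute__((target("avx")))
void bpsk_decide_avx1(int32_t *out, const float *in, size_t n)
{
    const __m256 zero = _mm256_setzero_ps();
    const __m256 one  = _mm256_set1_ps(1.0f);
    for (size_t i = 0; i + 8 <= n; i += 8) {
        __m256 x = _mm256_loadu_ps(in + i);
        /* All ones where x >= 0 or x is NaN, all zeros otherwise. */
        __m256 m = _mm256_cmp_ps(x, zero, _CMP_NLT_US);
        /* Bitwise AND keeps 1.0f where the mask is set and gives +0.0f elsewhere. */
        __m256 f = _mm256_and_ps(m, one);
        /* The values are exactly 0.0f or 1.0f, so the conversion never sees a NaN. */
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_cvtps_epi32(f));
    }
}
```

The integer shift is replaced by a float AND plus a conversion, which only needs 256-bit floating-point instructions. As with the compare-and-mask version, NaN ends up as 1 unless the predicate is changed to an ordered one such as `_CMP_GE_OQ`.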
