AVX2 float compare and get 0.0 or 1.0 instead of all-0 or all-one bits

Question

Basically, in the resulting vector I want 1.0 for every input floating point value > 1 and 0.0 for every input floating point value <= 1. Here is my code:

float f[8] = {1.2, 0.5, 1.7, 1.9, 0.34, 22.9, 18.6, 0.7};
float r[8]; // Must be {1, 0, 1, 1, 0, 1, 1, 0}

__m256i tmp1 = _mm256_cvttps_epi32(_mm256_loadu_ps(f));
__m256i tmp2 = _mm256_cmpgt_epi32(tmp1, _mm256_set1_epi32(1));
_mm256_store_ps(r, _mm256_cvtepi32_ps(tmp2));

for(int i = 0; i < 8; i++)
    std::cout << f[i] << " : " << r[i] << std::endl;

But I don't get the correct results. This is what I get. Why aren't the AVX2 relational operations working properly for me?

1.2 : 0
0.5 : 0
1.7 : 0
1.9 : 0
0.34 : 0
22.9 : -1
18.6 : -1
0.7 : 0

Recommended answer

I think it's better to use _mm256_cmp_ps for this. I have implemented the following program for this purpose. It does a bit more than you asked for: if you want to store ones, set all mask elements to 1, and if you want to store another number, change the mask value to whatever you want (in this program the same value is also the comparison threshold).

//gcc 6.2, Linux-mint, Skylake 
#include <stdio.h>
#include <x86intrin.h>

float __attribute__(( aligned(32))) f[8] = {1.2, 0.5, 1.7, 1.9, 0.34, 22.9, 18.6, 1.0};
// float __attribute__(( aligned(32))) r[8]; // Must be {1, 0, 1, 1, 0, 1, 1, 0}
// in C++11, use alignas(32).  Or C11 _Alignas(32), instead of GNU C __attribute__.

void printVecps(__m256 vec)
{
    float tempps[8];
    _mm256_store_ps(&tempps[0], vec);
    printf(" [0]=%3.2f, [1]=%3.2f, [2]=%3.2f, [3]=%3.2f, [4]=%3.2f, [5]=%3.2f, [6]=%3.2f, [7]=%3.2f \n",
    tempps[0],tempps[1],tempps[2],tempps[3],tempps[4],tempps[5],tempps[6],tempps[7]) ;

}

int main()
{

    __m256 mask = _mm256_set1_ps(1.0), vec1, vec2, vec3;

    vec1 = _mm256_load_ps(&f[0]);                   printf("vec1 : ");printVecps(vec1); // load vector values from f[0]-f[7]
    vec2 = _mm256_cmp_ps ( mask, vec1, _CMP_LT_OS /*0x1*/);
                                                    printf("vec2 : ");printVecps(vec2); // compare them to mask (less)
    vec3 = _mm256_min_ps (vec2 , mask);             printf("vec3 : ");printVecps(vec3); // select minimum from mask and compared results

    return 0;
}

The output for mask = {1,1,1,1,1,1,1,1} is:

vec1 :  [0]=1.20, [1]=0.50, [2]=1.70, [3]=1.90, [4]=0.34, [5]=22.90, [6]=18.60, [7]=1.00 
vec2 :  [0]=-nan, [1]=0.00, [2]=-nan, [3]=-nan, [4]=0.00, [5]=-nan, [6]=-nan, [7]=0.00 
vec3 :  [0]=1.00, [1]=0.00, [2]=1.00, [3]=1.00, [4]=0.00, [5]=1.00, [6]=1.00, [7]=0.00 

For mask = {2,2,2,2,2,2,2,2} it is:

vec1 :  [0]=1.20, [1]=0.50, [2]=1.70, [3]=1.90, [4]=0.34, [5]=22.90, [6]=18.60, [7]=1.00 
vec2 :  [0]=0.00, [1]=0.00, [2]=0.00, [3]=0.00, [4]=0.00, [5]=-nan, [6]=-nan, [7]=0.00 
vec3 :  [0]=0.00, [1]=0.00, [2]=0.00, [3]=0.00, [4]=0.00, [5]=2.00, [6]=2.00, [7]=0.00 

This depends on the non-commutative behaviour of _mm256_min_ps with NaNs to replace the NaN elements with 1.0. The all-ones compare result is a NaN bit pattern (printed as -nan above), and each lane of min(a, b) behaves like a < b ? a : b, so NaN < 1.0 ? NaN : 1.0 gives 1.0, because any ordered comparison with NaN is false.

Beware that gcc before 7.0 treats the 128b _mm_min_ps intrinsic as commutative even without -ffast-math (even though it knows the minps instruction isn't). Use an up-to-date gcc, or make sure that gcc chooses to compile your code with the operands in the order needed by this algorithm. (Or use clang). It's possible that gcc won't ever swap the operands with AVX, only with SSE (to avoid extra movapd instructions), but the safest thing is to use gcc7 or later.
