AVX2 float compare and get 0.0 or 1.0 instead of all-0 or all-one bits
Question
Basically, in the resulting vector, I want to store 1.0 for every input float > 1 and 0.0 for every input float <= 1. Here is my code:
float f[8] = {1.2, 0.5, 1.7, 1.9, 0.34, 22.9, 18.6, 0.7};
float r[8]; // Must be {1, 0, 1, 1, 0, 1, 1, 0}
__m256i tmp1 = _mm256_cvttps_epi32(_mm256_loadu_ps(f));
__m256i tmp2 = _mm256_cmpgt_epi32(tmp1, _mm256_set1_epi32(1));
_mm256_store_ps(r, _mm256_cvtepi32_ps(tmp2));
for(int i = 0; i < 8; i++)
std::cout << f[i] << " : " << r[i] << std::endl;
But I don't get the correct results. This is the output I get. Why aren't the AVX2 relational operations working properly for me?
1.2 : 0
0.5 : 0
1.7 : 0
1.9 : 0
0.34 : 0
22.9 : -1
18.6 : -1
0.7 : 0
Recommended answer
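As background on why the code in the question prints 0 and -1: _mm256_cvttps_epi32 truncates each float toward zero before the comparison, so 1.2, 1.7 and 1.9 all become 1 and fail the test > 1. In addition, _mm256_cmpgt_epi32 produces an integer with all bits set for a true lane, which is -1, and _mm256_cvtepi32_ps converts that integer to -1.0 rather than 1.0. The scalar sketch below models one lane of that pipeline; lane_model and print_lane_model are hypothetical helper names used only for illustration.

```c
#include <stdio.h>

/* Scalar model of one lane of the question's pipeline:
   _mm256_cvttps_epi32 -> _mm256_cmpgt_epi32 -> _mm256_cvtepi32_ps.
   lane_model is a hypothetical helper name, not an intrinsic. */
float lane_model(float x)
{
    int truncated = (int)x;               /* cvttps truncates toward zero: 1.9 -> 1 */
    int mask = (truncated > 1) ? -1 : 0;  /* cmpgt yields all-one bits (-1) or 0 */
    return (float)mask;                   /* cvtepi32_ps turns the integer -1 into -1.0f */
}

/* Prints the modelled result for each input value from the question. */
void print_lane_model(void)
{
    const float f[8] = {1.2f, 0.5f, 1.7f, 1.9f, 0.34f, 22.9f, 18.6f, 0.7f};
    for (int i = 0; i < 8; i++)
        printf("%g : %g\n", f[i], lane_model(f[i]));
}
```

Calling print_lane_model() from main reproduces the pattern in the question: only 22.9 and 18.6 survive truncation as values greater than 1, and they come out as -1.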
I think it's better to use _mm256_cmp_ps for this problem. I have implemented the following program for that purpose. It does a little more than what you asked for: if you want to store ones, set all mask elements to 1, but if you want to store another number you can change the mask value to whatever you want. Note that mask is also the comparison threshold, so changing it also changes which elements pass the comparison.
//gcc 6.2, Linux-mint, Skylake
#include <stdio.h>
#include <x86intrin.h>
float __attribute__(( aligned(32))) f[8] = {1.2, 0.5, 1.7, 1.9, 0.34, 22.9, 18.6, 1.0};
// float __attribute__(( aligned(32))) r[8]; // Must be {1, 0, 1, 1, 0, 1, 1, 0}
// in C++11, use alignas(32). Or C11 _Alignas(32), instead of GNU C __attribute__.
void printVecps(__m256 vec)
{
float tempps[8];
_mm256_storeu_ps(&tempps[0], vec); // tempps is not guaranteed to be 32-byte aligned, so use the unaligned store
printf(" [0]=%3.2f, [1]=%3.2f, [2]=%3.2f, [3]=%3.2f, [4]=%3.2f, [5]=%3.2f, [6]=%3.2f, [7]=%3.2f \n",
tempps[0],tempps[1],tempps[2],tempps[3],tempps[4],tempps[5],tempps[6],tempps[7]) ;
}
int main()
{
__m256 mask = _mm256_set1_ps(1.0), vec1, vec2, vec3;
vec1 = _mm256_load_ps(&f[0]); printf("vec1 : ");printVecps(vec1); // load vector values from f[0]-f[7]
vec2 = _mm256_cmp_ps ( mask, vec1, _CMP_LT_OS /*0x1*/);
printf("vec2 : ");printVecps(vec2); // compare them to mask (less)
vec3 = _mm256_min_ps (vec2 , mask); printf("vec3 : ");printVecps(vec3); // select minimum from mask and compared results
return 0;
}
For mask = {1,1,1,1,1,1,1,1}, the output is:
vec1 : [0]=1.20, [1]=0.50, [2]=1.70, [3]=1.90, [4]=0.34, [5]=22.90, [6]=18.60, [7]=1.00
vec2 : [0]=-nan, [1]=0.00, [2]=-nan, [3]=-nan, [4]=0.00, [5]=-nan, [6]=-nan, [7]=0.00
vec3 : [0]=1.00, [1]=0.00, [2]=1.00, [3]=1.00, [4]=0.00, [5]=1.00, [6]=1.00, [7]=0.00
For mask = {2,2,2,2,2,2,2,2}, the output is:
vec1 : [0]=1.20, [1]=0.50, [2]=1.70, [3]=1.90, [4]=0.34, [5]=22.90, [6]=18.60, [7]=1.00
vec2 : [0]=0.00, [1]=0.00, [2]=0.00, [3]=0.00, [4]=0.00, [5]=-nan, [6]=-nan, [7]=0.00
vec3 : [0]=0.00, [1]=0.00, [2]=0.00, [3]=0.00, [4]=0.00, [5]=2.00, [6]=2.00, [7]=0.00
This depends on the non-commutative behaviour of _mm256_min_ps with NaNs to replace the NaN elements with 1.0. The instruction computes min(a, b) as a < b ? a : b, so NaN < 1.0 ? NaN : 1.0 = 1.0, because any ordered comparison with NaN is always false. The operand order matters: with the operands swapped, 1.0 < NaN ? 1.0 : NaN would return NaN.
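To make the operand-order dependence concrete, here is a small scalar sketch of one lane of minps. It is only a model of the documented selection rule, and minps_lane is a hypothetical helper name, not an intrinsic.

```c
#include <math.h>

/* Scalar model of one lane of minps / _mm256_min_ps(a, b):
   the result is a only when a < b is true; in every other case,
   including when either operand is NaN, the result is b. */
float minps_lane(float a, float b)
{
    return (a < b) ? a : b;
}
```

With this model, minps_lane(NAN, 1.0f) returns 1.0f, which is what the program above relies on, while minps_lane(1.0f, NAN) returns NaN.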
Beware that gcc before 7.0 treats the 128-bit _mm_min_ps intrinsic as commutative even without -ffast-math (even though it knows the minps instruction isn't). Use an up-to-date gcc, or make sure that gcc chooses to compile your code with the operands in the order needed by this algorithm. (Or use clang.) It's possible that gcc won't ever swap the operands with AVX, only with SSE (to avoid extra movapd instructions), but the safest thing is to use gcc7 or later.
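If depending on the operand order of minps is a concern, one alternative is to AND the compare mask with the bit pattern of 1.0f: all-one bits AND 1.0f gives 1.0f, and all-zero bits AND 1.0f gives +0.0f. The following is a minimal sketch of that alternative, not the program from this answer. The function name gt_one_as_float is hypothetical, and the target attribute is a GCC/clang extension assumed here so the function can use AVX without compiling the whole file with -mavx.

```c
#include <immintrin.h>

/* Writes 1.0f to out[i] where in[i] > 1.0f and 0.0f elsewhere, for 8 floats.
   gt_one_as_float is a hypothetical name used for illustration.
   Unaligned loads and stores are used, so in and out need no special alignment. */
__attribute__((target("avx")))
void gt_one_as_float(const float *in, float *out)
{
    const __m256 one = _mm256_set1_ps(1.0f);
    __m256 v    = _mm256_loadu_ps(in);
    __m256 mask = _mm256_cmp_ps(v, one, _CMP_GT_OQ); /* all-one bits where in > 1.0, else 0 */
    __m256 res  = _mm256_and_ps(mask, one);          /* keep the bits of 1.0f, or clear to +0.0f */
    _mm256_storeu_ps(out, res);
}
```

Called with the array f from the question and an 8-element output array r, this should fill r with {1, 0, 1, 1, 0, 1, 1, 0}. A NaN input compares false under _CMP_GT_OQ and therefore yields 0.0.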