SSE/AVX: Choose from two __m256 float vectors based on per-element min and max absolute value

Question

I am looking for an SSE/AVX implementation of the following:

// Given
float u[8];
float v[8];

// Compute
float a[8];
float b[8];

//  Such that
for ( int i = 0; i < 8; ++i )
{
    a[i] = fabs(u[i]) >= fabs(v[i]) ? u[i] : v[i];
    b[i] = fabs(u[i]) <  fabs(v[i]) ? u[i] : v[i];
}

I.e., I need to select element-wise into a from u and v based on mask, and into b based on !mask, where mask = (fabs(u) >= fabs(v)) element-wise.

Recommended answer

I had this exact same problem just the other day. The solution I came up with (using AVX only) was:

// take the absolute value of u and v by clearing the sign bit
__m256 sign_bit = _mm256_set1_ps(-0.0f);
__m256 u_abs = _mm256_andnot_ps(sign_bit, u);
__m256 v_abs = _mm256_andnot_ps(sign_bit, v);
// get a mask indicating the indices for which abs(u[i]) >= abs(v[i])
__m256 u_ge_v = _mm256_cmp_ps(u_abs, v_abs, _CMP_GE_OS);
// _mm256_blendv_ps(x, y, mask) takes the lane from y where the mask is set and
// from x otherwise, so u goes second for a; swapping the operands for b
// inverts the sense of the mask
__m256 a = _mm256_blendv_ps(v, u, u_ge_v);
__m256 b = _mm256_blendv_ps(u, v, u_ge_v);
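
To make the operand order easy to check, here is a minimal sketch that wraps the AVX sequence in a function and compares it against the scalar loop from the question. The function names (`select_by_abs_scalar`, `select_by_abs_avx`, `check_against_scalar`) are made up for illustration, and the `target("avx")` attribute is a GCC/Clang extension assumed here so the file builds without `-mavx`. Recall that `_mm256_blendv_ps(x, y, mask)` takes the lane from `y` where the mask is set, so `u` is passed second when building `a`. Inputs are assumed to be NaN-free.

```c
#include <assert.h>
#include <immintrin.h>
#include <math.h>

/* Scalar reference: the loop from the question, for 8 floats. */
static void select_by_abs_scalar(const float *u, const float *v,
                                 float *a, float *b)
{
    for (int i = 0; i < 8; ++i) {
        a[i] = fabsf(u[i]) >= fabsf(v[i]) ? u[i] : v[i];
        b[i] = fabsf(u[i]) <  fabsf(v[i]) ? u[i] : v[i];
    }
}

/* AVX version (hypothetical wrapper). The target attribute is a GCC/Clang
 * extension; drop it if you compile with -mavx or use another compiler. */
static __attribute__((target("avx")))
void select_by_abs_avx(const float *u, const float *v, float *a, float *b)
{
    const __m256 sign_bit = _mm256_set1_ps(-0.0f);
    const __m256 uv = _mm256_loadu_ps(u);
    const __m256 vv = _mm256_loadu_ps(v);

    /* Clear the sign bit to get |u| and |v|. */
    const __m256 u_abs = _mm256_andnot_ps(sign_bit, uv);
    const __m256 v_abs = _mm256_andnot_ps(sign_bit, vv);

    /* All-ones in each lane where |u| >= |v|. */
    const __m256 u_ge_v = _mm256_cmp_ps(u_abs, v_abs, _CMP_GE_OS);

    /* blendv picks the second operand where the mask is set. */
    _mm256_storeu_ps(a, _mm256_blendv_ps(vv, uv, u_ge_v));
    _mm256_storeu_ps(b, _mm256_blendv_ps(uv, vv, u_ge_v));
}

/* Asserts that both versions agree lane by lane (NaN-free inputs only). */
static void check_against_scalar(const float *u, const float *v)
{
    float a_ref[8], b_ref[8], a[8], b[8];
    select_by_abs_scalar(u, v, a_ref, b_ref);
    select_by_abs_avx(u, v, a, b);
    for (int i = 0; i < 8; ++i) {
        assert(a[i] == a_ref[i]);
        assert(b[i] == b_ref[i]);
    }
}
```

Calling `check_against_scalar` on a few arrays that include ties (for example `u[i] = 2.0f`, `v[i] = -2.0f`) is a cheap way to confirm that `a` really receives the element with the larger magnitude.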

The AVX512 equivalent would be:

// take the absolute value of u and v by clearing the sign bit
// (_mm512_andnot_ps requires AVX512DQ)
__m512 sign_bit = _mm512_set1_ps(-0.0f);
__m512 u_abs = _mm512_andnot_ps(sign_bit, u);
__m512 v_abs = _mm512_andnot_ps(sign_bit, v);
// get a mask indicating the indices for which abs(u[i]) >= abs(v[i])
__mmask16 u_ge_v = _mm512_cmp_ps_mask(u_abs, v_abs, _CMP_GE_OS);
// _mm512_mask_blend_ps(k, x, y) takes the lane from y where the mask bit is set
// and from x otherwise, so u goes last for a; swapping the operands for b
// inverts the sense of the mask
__m512 a = _mm512_mask_blend_ps(u_ge_v, v, u);
__m512 b = _mm512_mask_blend_ps(u_ge_v, u, v);

As Peter Cordes suggested in the comments above, there are other approaches as well, such as taking the absolute value, applying a min/max, and then reinserting the sign bit, but I couldn't find anything shorter or with lower latency than this sequence of instructions.
