SSE/AVX: Choose from two __m256 float vectors based on per-element min and max absolute value

Views: 75 Published: 2021/4/12 20:55:01 Tags: sse intrinsics avx avx512

Question

I am looking for an SSE/AVX implementation of the following:

// Given
float u[8];
float v[8];

// Compute
float a[8];
float b[8];

//  Such that
for ( int i = 0; i < 8; ++i )
{
    a[i] = fabs(u[i]) >= fabs(v[i]) ? u[i] : v[i];
    b[i] = fabs(u[i]) <  fabs(v[i]) ? u[i] : v[i];
}

I.e., I need to select element-wise into a from u and v based on mask, and into b based on !mask, where mask = (fabs(u) >= fabs(v)) element-wise.

Recommended answer

I had this exact same problem just the other day. The solution I came up with (using AVX only) was:

// take the absolute value of u and v
__m256 sign_bit = _mm256_set1_ps(-0.0f);
__m256 u_abs = _mm256_andnot_ps(sign_bit, u);
__m256 v_abs = _mm256_andnot_ps(sign_bit, v);
// get a mask indicating the indices for which abs(u[i]) >= abs(v[i])
__m256 u_ge_v = _mm256_cmp_ps(u_abs, v_abs, _CMP_GE_OS);
// use the mask to select the appropriate elements into a and b; blendv takes its
// second operand where the mask is set, so the argument order is flipped for b
__m256 a = _mm256_blendv_ps(v, u, u_ge_v);
__m256 b = _mm256_blendv_ps(u, v, u_ge_v);
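
For anyone who wants to try this end to end, here is a minimal sketch that wraps the sequence in a function working on plain float arrays. The function name `select_by_abs_avx`, the unaligned loads and stores, and the GCC/Clang `target` attribute are assumptions of this sketch and not part of the original answer. Note that `_mm256_blendv_ps(x, y, m)` returns `y` in lanes where the sign bit of `m` is set, so `u` goes in the second position for `a`.

```c
#include <immintrin.h>

/* Hypothetical helper: for 8 floats, stores the larger-magnitude element of
   each (u[i], v[i]) pair in a[i] and the other one in b[i].
   The target attribute (GCC/Clang extension) avoids needing -mavx. */
__attribute__((target("avx")))
void select_by_abs_avx(const float *u, const float *v, float *a, float *b)
{
    __m256 vu = _mm256_loadu_ps(u);
    __m256 vv = _mm256_loadu_ps(v);

    /* clear the sign bit to get |u| and |v| */
    __m256 sign_bit = _mm256_set1_ps(-0.0f);
    __m256 u_abs = _mm256_andnot_ps(sign_bit, vu);
    __m256 v_abs = _mm256_andnot_ps(sign_bit, vv);

    /* lanes are all-ones where |u[i]| >= |v[i]| */
    __m256 u_ge_v = _mm256_cmp_ps(u_abs, v_abs, _CMP_GE_OS);

    /* blendv(x, y, m): y where the mask is set, x elsewhere */
    _mm256_storeu_ps(a, _mm256_blendv_ps(vv, vu, u_ge_v));
    _mm256_storeu_ps(b, _mm256_blendv_ps(vu, vv, u_ge_v));
}
```

On ties such as u[i] = 2, v[i] = -2 the mask is set, so a[i] gets u[i] and b[i] gets v[i], which matches the scalar loop in the question. With NaN inputs the two versions may differ, because the scalar loop evaluates two separate comparisons while this code reuses one mask.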

The AVX512 equivalent would be:

// take the absolute value of u and v
// (_mm512_andnot_ps requires AVX512DQ; _mm512_abs_ps is an AVX512F-only alternative)
__m512 sign_bit = _mm512_set1_ps(-0.0f);
__m512 u_abs = _mm512_andnot_ps(sign_bit, u);
__m512 v_abs = _mm512_andnot_ps(sign_bit, v);
// get a mask indicating the indices for which abs(u[i]) >= abs(v[i])
__mmask16 u_ge_v = _mm512_cmp_ps_mask(u_abs, v_abs, _CMP_GE_OS);
// use the mask to select the appropriate elements into a and b; mask_blend takes its
// second vector operand where the mask bit is set, so the order is flipped for b
__m512 a = _mm512_mask_blend_ps(u_ge_v, v, u);
__m512 b = _mm512_mask_blend_ps(u_ge_v, u, v);
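
The same idea as a self-contained function for 16 floats, offered as a sketch under a few assumptions: the name `select_by_abs_avx512` is made up for illustration, the `target` attribute is a GCC/Clang extension, and `_mm512_abs_ps` is swapped in for the `andnot` step so that only AVX512F is needed. `_mm512_mask_blend_ps(k, x, y)` returns `y` in lanes where the bit in `k` is set.

```c
#include <immintrin.h>

/* Hypothetical helper: for 16 floats, stores the larger-magnitude element of
   each (u[i], v[i]) pair in a[i] and the other one in b[i].
   The target attribute (GCC/Clang extension) avoids needing -mavx512f. */
__attribute__((target("avx512f")))
void select_by_abs_avx512(const float *u, const float *v, float *a, float *b)
{
    __m512 vu = _mm512_loadu_ps(u);
    __m512 vv = _mm512_loadu_ps(v);

    /* assumption of this sketch: use _mm512_abs_ps (AVX512F) for |u| and |v| */
    __m512 u_abs = _mm512_abs_ps(vu);
    __m512 v_abs = _mm512_abs_ps(vv);

    /* one mask bit per lane, set where |u[i]| >= |v[i]| */
    __mmask16 u_ge_v = _mm512_cmp_ps_mask(u_abs, v_abs, _CMP_GE_OS);

    /* mask_blend(k, x, y): y where the bit is set, x elsewhere */
    _mm512_storeu_ps(a, _mm512_mask_blend_ps(u_ge_v, vv, vu));
    _mm512_storeu_ps(b, _mm512_mask_blend_ps(u_ge_v, vu, vv));
}
```

This only runs on CPUs with AVX512F, so a caller would typically check for support first (for example with `__builtin_cpu_supports("avx512f")` on GCC/Clang) and fall back to the AVX version otherwise.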

As Peter Cordes suggested in the comments above, there are other approaches as well like taking the absolute value followed by a min/max and then reinserting the sign bit, but I couldn't find anything that was shorter/lower latency than this sequence of instructions.
