How to find the horizontal maximum in a 256-bit AVX vector


Question

I have a __m256d vector packed with four 64-bit floating-point values.
I need to find the horizontal maximum of the vector's elements and store the result in a double-precision scalar value.

All my attempts ended up shuffling the vector elements a lot, which made the code neither elegant nor efficient. I also found it impossible to stay entirely in the AVX domain: at some point I had to use 128-bit SSE instructions to extract the final 64-bit value. However, I would like to be proved wrong on this last point.

So the ideal solution would:
1) use only AVX instructions;
2) minimize the number of instructions (I am hoping for no more than 3-4).

Having said that, any elegant/efficient solution will be accepted, even if it doesn't adhere to the above guidelines.

Thanks for your help.

-Luigi

Recommended answer

I don't think you can do much better than 4 instructions: 2 shuffles and 2 max operations.

__m256d x = ...; // input

__m128d lo = _mm256_castpd256_pd128(x); // low half x[0], x[1]; a cast only, no instruction is generated
__m128d y = _mm256_extractf128_pd(x, 1); // extract the high half: x[2] and x[3]
__m128d m1 = _mm_max_pd(lo, y); // m1[0] = max(x[0], x[2]), m1[1] = max(x[1], x[3])
__m128d m2 = _mm_permute_pd(m1, 1); // swap the two lanes: m2[0] = m1[1], m2[1] = m1[0]
__m128d m = _mm_max_pd(m1, m2); // both m[0] and m[1] contain the horizontal max(x[0], x[1], x[2], x[3])
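
Since the question asks for the result as a scalar double, here is a minimal sketch of the steps above wrapped in a function. The name hmax4, the pointer-based interface, and the GCC/Clang target attribute are assumptions for illustration and are not part of the original answer. The final value is read with _mm_cvtsd_f64.

```c
#include <immintrin.h>

/* Hypothetical helper (not from the original answer): returns the largest of
   the four doubles at v. The target attribute lets GCC/Clang compile this one
   function with AVX enabled; when building with -mavx it is redundant. */
#if defined(__GNUC__)
__attribute__((target("avx")))
#endif
static double hmax4(const double *v)
{
    __m256d x  = _mm256_loadu_pd(v);           /* x[0..3], unaligned load */
    __m128d lo = _mm256_castpd256_pd128(x);    /* x[0], x[1] (cast only) */
    __m128d hi = _mm256_extractf128_pd(x, 1);  /* x[2], x[3] */
    __m128d m1 = _mm_max_pd(lo, hi);           /* max(x0,x2), max(x1,x3) */
    __m128d m2 = _mm_permute_pd(m1, 1);        /* swap the two lanes of m1 */
    __m128d m  = _mm_max_pd(m1, m2);           /* both lanes hold the maximum */
    return _mm_cvtsd_f64(m);                   /* read lane 0 as a scalar */
}
```

For example, calling hmax4 on an array holding 1.0, 4.0, 2.0, 3.0 should return 4.0. Note that the max instructions do not propagate NaN symmetrically, so inputs containing NaN may need separate handling.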

仅适用于256位向量的临时修改:

Trivial modification to only work with 256-bit vectors:

__m256d x = ...; // input

__m256d y = _mm256_permute2f128_pd(x, x, 1); // permute 128-bit values
__m256d m1 = _mm256_max_pd(x, y); // m1[0] = max(x[0], x[2]), m1[1] = max(x[1], x[3]), etc.
__m256d m2 = _mm256_permute_pd(m1, 5); // set m2[0] = m1[1], m2[1] = m1[0], etc.
__m256d m = _mm256_max_pd(m1, m2); // all m[0] ... m[3] contain the horizontal max(x[0], x[1], x[2], x[3])

(Untested)
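
As a rough sketch of how the 256-bit variant could be turned into a scalar result, the function below keeps every max and permute on 256-bit vectors and only narrows at the very end. The name hmax4_ymm and the pointer-based interface are assumptions for illustration. The closing _mm256_castpd256_pd128 is a cast that generates no instruction, and _mm_cvtsd_f64 simply reads the low lane.

```c
#include <immintrin.h>

/* Hypothetical helper (not from the original answer): horizontal maximum
   computed entirely with 256-bit operations. The target attribute is a
   GCC/Clang convenience; with -mavx it is redundant. */
#if defined(__GNUC__)
__attribute__((target("avx")))
#endif
static double hmax4_ymm(const double *v)
{
    __m256d x  = _mm256_loadu_pd(v);               /* x[0..3] */
    __m256d y  = _mm256_permute2f128_pd(x, x, 1);  /* swap the 128-bit halves */
    __m256d m1 = _mm256_max_pd(x, y);              /* max(x0,x2), max(x1,x3), repeated */
    __m256d m2 = _mm256_permute_pd(m1, 5);         /* swap neighbours within each half */
    __m256d m  = _mm256_max_pd(m1, m2);            /* every lane holds the maximum */
    return _mm_cvtsd_f64(_mm256_castpd256_pd128(m)); /* low lane as a scalar */
}
```

Because every lane of m already holds the maximum, this version is also convenient when the result is needed broadcast across a vector rather than as a scalar.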
