How to find the horizontal maximum in a 256-bit AVX vector
Question
I have a __m256d vector packed with four 64-bit floating-point values.
I need to find the horizontal maximum of the vector's elements and store the result in a double-precision scalar value.
All of my attempts ended up shuffling the vector elements a lot, which made the code neither elegant nor efficient. I also found it impossible to stay entirely in the AVX domain: at some point I had to use SSE 128-bit instructions to extract the final 64-bit value. However, I would like to be proved wrong on this last statement.
So the ideal solution will:
1) use only AVX instructions.
2) minimize the number of instructions. (I am hoping for no more than 3-4 instructions.)
Having said that, any elegant/efficient solution will be accepted, even if it doesn't adhere to the above guidelines.
Thanks for your help.
-Luigi
Recommended answer
I don't think you can do much better than 4 instructions: 2 shuffles and 2 comparisons (the max operations).
__m256d x = ...; // input
__m128d y = _mm256_extractf128_pd(x, 1); // extract x[2], and x[3]
__m128d m1 = _mm_max_pd(_mm256_castpd256_pd128(x), y); // m1[0] = max(x[0], x[2]), m1[1] = max(x[1], x[3]); the cast selects x[0], x[1] so the operand types match
__m128d m2 = _mm_permute_pd(m1, 1); // set m2[0] = m1[1], m2[1] = m1[0]
__m128d m = _mm_max_pd(m1, m2); // both m[0] and m[1] contain the horizontal max(x[0], x[1], x[2], x[3])
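The question also asks for the result as a double-precision scalar. A minimal sketch of the steps above wrapped into a function could look like the following. The helper name hmax4_pd, the HMAX_TARGET_AVX macro, and the choice to load from a pointer are assumptions made for illustration, not part of the original answer. The final _mm_cvtsd_f64 is an SSE2 intrinsic that simply reads element 0.

```c
#include <immintrin.h>

/* Assumption: on GCC/Clang this attribute enables AVX for one function,
   so the file also builds without -mavx. Other compilers need their own
   AVX switch. */
#if defined(__GNUC__)
#define HMAX_TARGET_AVX __attribute__((target("avx")))
#else
#define HMAX_TARGET_AVX
#endif

/* Hypothetical helper: horizontal maximum of the four doubles at p. */
HMAX_TARGET_AVX
static double hmax4_pd(const double *p)
{
    __m256d x  = _mm256_loadu_pd(p);           /* x[0..3], unaligned load */
    __m128d lo = _mm256_castpd256_pd128(x);    /* x[0], x[1] */
    __m128d hi = _mm256_extractf128_pd(x, 1);  /* x[2], x[3] */
    __m128d m1 = _mm_max_pd(lo, hi);           /* max(x0,x2), max(x1,x3) */
    __m128d m2 = _mm_permute_pd(m1, 1);        /* swap the two elements */
    __m128d m  = _mm_max_pd(m1, m2);           /* both elements hold the max */
    return _mm_cvtsd_f64(m);                   /* read element 0 as a double */
}
```

Calling hmax4_pd on an array holding 1.0, 7.5, -3.0, 2.0 should return 7.5. This assumes the CPU running the code supports AVX.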
Trivial modification to work only with 256-bit vectors:
__m256d x = ...; // input
__m256d y = _mm256_permute2f128_pd(x, x, 1); // permute 128-bit values
__m256d m1 = _mm256_max_pd(x, y); // m1[0] = max(x[0], x[2]), m1[1] = max(x[1], x[3]), etc.
__m256d m2 = _mm256_permute_pd(m1, 5); // set m2[0] = m1[1], m2[1] = m1[0], etc.
__m256d m = _mm256_max_pd(m1, m2); // all m[0] ... m[3] contain the horizontal max(x[0], x[1], x[2], x[3])
(Untested)
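As a rough sketch, the 256-bit variant can be wrapped the same way to return a scalar. The name hmax4_pd_256 and the HMAX_TARGET_AVX macro are hypothetical, introduced here only for illustration. Reading element 0 still goes through a 128-bit view of the register. However, _mm256_castpd256_pd128 is a cast intrinsic that only reinterprets the register, so the two shuffles and two max operations remain the only real work.

```c
#include <immintrin.h>

/* Assumption: GCC/Clang attribute that enables AVX for one function.
   Other compilers need their own AVX switch. */
#if defined(__GNUC__)
#define HMAX_TARGET_AVX __attribute__((target("avx")))
#else
#define HMAX_TARGET_AVX
#endif

/* Hypothetical helper: same reduction, kept in 256-bit registers. */
HMAX_TARGET_AVX
static double hmax4_pd_256(const double *p)
{
    __m256d x  = _mm256_loadu_pd(p);               /* x[0..3] */
    __m256d y  = _mm256_permute2f128_pd(x, x, 1);  /* swap the 128-bit halves */
    __m256d m1 = _mm256_max_pd(x, y);              /* max across the halves */
    __m256d m2 = _mm256_permute_pd(m1, 5);         /* swap within each half */
    __m256d m  = _mm256_max_pd(m1, m2);            /* all four hold the max */
    return _mm_cvtsd_f64(_mm256_castpd256_pd128(m)); /* read element 0 */
}
```

Because every element of m holds the maximum, this form may be the more convenient one when the result is needed as a broadcast vector rather than as a scalar.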