How to sum __m256 horizontally?
Question
I would like to horizontally sum the components of a __m256 vector using AVX instructions.
In SSE I could use
_mm_hadd_ps(xmm,xmm);
_mm_hadd_ps(xmm,xmm);
to get the result in the first component of the vector, but this does not scale to the 256-bit version of the function (_mm256_hadd_ps).
What is the best way to compute the horizontal sum of a __m256 vector?
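To illustrate why the SSE idiom does not carry over: _mm256_hadd_ps adds adjacent pairs within each 128-bit lane separately, so applying it twice leaves one partial sum per lane rather than a single total. The following is a minimal sketch; the helper name hadd_twice and the AVX_FN macro are assumptions made for this example (the macro only enables AVX per function on GCC/Clang and is unnecessary when compiling with -mavx).

```c
#include <immintrin.h>

/* Assumption: GCC/Clang per-function target attribute, so the example
   builds even without -mavx. Expands to nothing on other compilers. */
#if defined(__GNUC__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

/* Hypothetical helper: loads 8 floats from src, applies _mm256_hadd_ps
   twice, and stores all 8 resulting elements to dst for inspection. */
AVX_FN void hadd_twice(const float *src, float *dst) {
    __m256 v = _mm256_loadu_ps(src);
    v = _mm256_hadd_ps(v, v);  /* pairwise sums, computed per 128-bit lane */
    v = _mm256_hadd_ps(v, v);  /* each lane now holds its own 4-element sum */
    _mm256_storeu_ps(dst, v);
}
```

With the input 1..8, elements 0 to 3 of dst hold the low-lane sum and elements 4 to 7 hold the high-lane sum; the two halves still have to be combined with a cross-lane step.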
Recommended answer
This version should be optimal for both Intel Sandy/Ivy Bridge and AMD Bulldozer, and later CPUs.
#include <immintrin.h>

// x = ( x7, x6, x5, x4, x3, x2, x1, x0 )
float sum8(__m256 x) {
    // hiQuad = ( x7, x6, x5, x4 )
    const __m128 hiQuad = _mm256_extractf128_ps(x, 1);
    // loQuad = ( x3, x2, x1, x0 )
    const __m128 loQuad = _mm256_castps256_ps128(x);
    // sumQuad = ( x3 + x7, x2 + x6, x1 + x5, x0 + x4 )
    const __m128 sumQuad = _mm_add_ps(loQuad, hiQuad);
    // loDual = ( -, -, x1 + x5, x0 + x4 )
    const __m128 loDual = sumQuad;
    // hiDual = ( -, -, x3 + x7, x2 + x6 )
    const __m128 hiDual = _mm_movehl_ps(sumQuad, sumQuad);
    // sumDual = ( -, -, x1 + x3 + x5 + x7, x0 + x2 + x4 + x6 )
    const __m128 sumDual = _mm_add_ps(loDual, hiDual);
    // lo = ( -, -, -, x0 + x2 + x4 + x6 )
    const __m128 lo = sumDual;
    // hi = ( -, -, -, x1 + x3 + x5 + x7 )
    const __m128 hi = _mm_shuffle_ps(sumDual, sumDual, 0x1);
    // sum = ( -, -, -, x0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 )
    const __m128 sum = _mm_add_ss(lo, hi);
    return _mm_cvtss_f32(sum);
}
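As a usage sketch, a function like sum8 is typically called once, after a loop that accumulates vertically with _mm256_add_ps, so the horizontal step stays off the hot path. The example below repeats sum8 in condensed form to stay self-contained; sum_array and the AVX_FN macro are hypothetical names introduced here, not part of the answer, and unaligned loads are assumed.

```c
#include <immintrin.h>
#include <stddef.h>

/* Assumption: GCC/Clang per-function target attribute; not needed with -mavx. */
#if defined(__GNUC__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

/* Same reduction as sum8 in the answer, condensed. */
AVX_FN static float sum8(__m256 x) {
    const __m128 hiQuad  = _mm256_extractf128_ps(x, 1);
    const __m128 loQuad  = _mm256_castps256_ps128(x);
    const __m128 sumQuad = _mm_add_ps(loQuad, hiQuad);
    const __m128 hiDual  = _mm_movehl_ps(sumQuad, sumQuad);
    const __m128 sumDual = _mm_add_ps(sumQuad, hiDual);
    const __m128 hi      = _mm_shuffle_ps(sumDual, sumDual, 0x1);
    return _mm_cvtss_f32(_mm_add_ss(sumDual, hi));
}

/* Hypothetical helper: sums n floats, 8 at a time, then handles the tail. */
AVX_FN float sum_array(const float *data, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        /* vertical add: one partial sum per vector element */
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
    }
    float total = sum8(acc);   /* single horizontal reduction */
    for (; i < n; ++i) {
        total += data[i];      /* scalar tail for the last n % 8 elements */
    }
    return total;
}
```

Note that this changes the order of the floating-point additions compared with a plain scalar loop, so results can differ slightly for inputs that are not exactly representable.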
haddps is not efficient on any CPU; the best you can do is one shuffle (to extract the high half) and one add, repeated until one element is left. Narrowing to 128-bit as the first step benefits AMD CPUs before Zen 2, and is not a bad thing anywhere.
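For the 128-bit case from the question, the same shuffle-and-add pattern can replace the two _mm_hadd_ps calls. This is a sketch using only SSE1 intrinsics; the names sum4 and sum4_from_memory are assumptions for this example.

```c
#include <xmmintrin.h>

/* v = ( v3, v2, v1, v0 ) */
float sum4(__m128 v) {
    /* hi = ( -, -, v3, v2 ): high half moved down to the low half */
    const __m128 hi = _mm_movehl_ps(v, v);
    /* sums = ( -, -, v1 + v3, v0 + v2 ) */
    const __m128 sums = _mm_add_ps(v, hi);
    /* odd = ( -, -, -, v1 + v3 ): element 1 shuffled into element 0 */
    const __m128 odd = _mm_shuffle_ps(sums, sums, 0x1);
    /* element 0 = v0 + v1 + v2 + v3 */
    return _mm_cvtss_f32(_mm_add_ss(sums, odd));
}

/* Hypothetical convenience wrapper: sums 4 floats from unaligned memory. */
float sum4_from_memory(const float *p) {
    return sum4(_mm_loadu_ps(p));
}
```

Each halving step is one shuffle plus one add, which is the pattern the answer recommends over haddps.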
See "Fastest way to do horizontal SSE vector sum on x86" for more details about efficiency.