SSE multiplication of 4 32-bit integers
Question
How can I multiply four 32-bit integers by another four 32-bit integers using SSE? I couldn't find an instruction that does it.
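Recommended answer
If you need signed 32x32-bit integer multiplication, then the following SSE2 example from software.intel.com looks like it should do what you want. It uses the unsigned multiply _mm_mul_epu32 and keeps only the low 32 bits of each product, which are the same for signed and unsigned operands: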
#include <emmintrin.h>  // SSE2 intrinsics

static inline __m128i muly(const __m128i &a, const __m128i &b)
{
    __m128i tmp1 = _mm_mul_epu32(a, b);  /* multiply lanes 2 and 0 */
    __m128i tmp2 = _mm_mul_epu32(_mm_srli_si128(a, 4), _mm_srli_si128(b, 4));  /* multiply lanes 3 and 1 */
    /* keep the low 32 bits of each 64-bit product and interleave them back into lane order */
    return _mm_unpacklo_epi32(_mm_shuffle_epi32(tmp1, _MM_SHUFFLE(0, 0, 2, 0)),
                              _mm_shuffle_epi32(tmp2, _MM_SHUFFLE(0, 0, 2, 0)));
}
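The reason an unsigned multiply can serve for signed inputs is that the low 32 bits of a product do not depend on how the operands' sign bits are interpreted. A minimal scalar sketch of what each lane of muly computes follows; the helper name mul_lo32 is hypothetical and not part of the original answer, and the final conversion back to int32_t assumes the usual two's-complement wrap-around (guaranteed only from C++20 onward, though common compilers behave this way).

```cpp
#include <cstdint>

// Hypothetical helper (not from the original answer): the per-lane result
// of muly, i.e. the low 32 bits of x * y.
static inline int32_t mul_lo32(int32_t x, int32_t y)
{
    // Unsigned multiplication wraps modulo 2^32, so this keeps exactly
    // the low 32 bits of the full product.
    uint32_t lo = static_cast<uint32_t>(x) * static_cast<uint32_t>(y);
    // Reinterpret the bits as signed (assumes two's-complement conversion).
    return static_cast<int32_t>(lo);
}
```

For products that fit in 32 bits this matches ordinary signed multiplication; for larger products the high bits are simply discarded.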
You might want to have two builds, one for older CPUs and one for recent CPUs that support SSE4.1, in which case you could do the following:
#include <emmintrin.h>  // SSE2 intrinsics
#ifdef __SSE4_1__
#include <smmintrin.h>  // SSE4.1 intrinsics
#endif

static inline __m128i muly(const __m128i &a, const __m128i &b)
{
#ifdef __SSE4_1__  // modern CPU - use SSE 4.1
    return _mm_mullo_epi32(a, b);
#else              // old CPU - use SSE 2
    __m128i tmp1 = _mm_mul_epu32(a, b);  /* multiply lanes 2 and 0 */
    __m128i tmp2 = _mm_mul_epu32(_mm_srli_si128(a, 4), _mm_srli_si128(b, 4));  /* multiply lanes 3 and 1 */
    /* keep the low 32 bits of each 64-bit product and interleave them back into lane order */
    return _mm_unpacklo_epi32(_mm_shuffle_epi32(tmp1, _MM_SHUFFLE(0, 0, 2, 0)),
                              _mm_shuffle_epi32(tmp2, _MM_SHUFFLE(0, 0, 2, 0)));
#endif
}
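As a rough usage sketch, the function above can be wrapped so that it works directly on plain int32_t arrays. The wrapper name mul4 is hypothetical and not part of the original answer; it uses unaligned loads and stores so the arrays need no particular alignment. The block repeats muly only so that it compiles on its own.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#ifdef __SSE4_1__
#include <smmintrin.h>  // SSE4.1 intrinsics
#endif
#include <cstdint>

// Same function as in the answer, repeated so this sketch is self-contained.
static inline __m128i muly(const __m128i &a, const __m128i &b)
{
#ifdef __SSE4_1__
    return _mm_mullo_epi32(a, b);
#else
    __m128i tmp1 = _mm_mul_epu32(a, b);  // products of lanes 0 and 2
    __m128i tmp2 = _mm_mul_epu32(_mm_srli_si128(a, 4), _mm_srli_si128(b, 4));  // lanes 1 and 3
    return _mm_unpacklo_epi32(_mm_shuffle_epi32(tmp1, _MM_SHUFFLE(0, 0, 2, 0)),
                              _mm_shuffle_epi32(tmp2, _MM_SHUFFLE(0, 0, 2, 0)));
#endif
}

// Hypothetical wrapper: out[i] = low 32 bits of x[i] * y[i], for i = 0..3.
static inline void mul4(const int32_t *x, const int32_t *y, int32_t *out)
{
    __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i *>(x));
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i *>(y));
    _mm_storeu_si128(reinterpret_cast<__m128i *>(out), muly(a, b));
}
```

Calling mul4 on four-element arrays x and y writes the four products to out. With GCC or Clang, compiling with -msse4.1 defines __SSE4_1__ and selects the single-instruction path; without it, the SSE2 fallback is used.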