Is it possible to use SSE and SSE2 to make a 128-bit wide integer?

Question

I'm looking to understand SSE2's capabilities a little more, and would like to know: could one make a 128-bit wide integer that supports addition, subtraction, XOR and multiplication?

Solution

SIMD is meant to work on multiple small values at the same time, so no carry propagates from one lane into the next; you must do that manually. SSE2 has no carry flag, but for unsigned addition you can easily compute the carry as carry = sum < a (or, equivalently, carry = sum < b). Worse yet, SSE2 doesn't have 64-bit comparisons either, so you must emulate them with 32-bit comparisons, as the lessthan function below does.
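To make the carry rule concrete, here is a minimal scalar sketch in C. It assumes a 128-bit value stored as two 64-bit halves in a hypothetical u128 struct, which is not part of the original answer. Because unsigned addition wraps modulo 2^64, the low half overflowed exactly when the wrapped sum is smaller than an operand:

```c
#include <stdint.h>

/* Hypothetical two-limb representation of a 128-bit unsigned integer. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128;

/* Add two 128-bit values limb by limb. Unsigned addition wraps modulo
   2^64, so a carry out of the low limb happened exactly when the wrapped
   sum is smaller than an operand (testing against a.lo is enough). */
static u128 u128_add(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;
    uint64_t carry = r.lo < a.lo;   /* 1 if the low limb overflowed */
    r.hi = a.hi + b.hi + carry;     /* carry out of the high limb is dropped */
    return r;
}
```

For example, adding {lo = UINT64_MAX, hi = 0} and {lo = 1, hi = 0} gives {lo = 0, hi = 1}.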

Here is some untested, unoptimized C code based on that idea:

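#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdbool.h>

/* Unsigned 64-bit compare of the low lanes: returns a[63:0] < b[63:0].
   SSE2 has no 64-bit compare, so build it from 32-bit signed compares. */
static inline bool lessthan(__m128i a, __m128i b){
    /* Flip each 32-bit sign bit so signed compares give unsigned order */
    a = _mm_xor_si128(a, _mm_set1_epi32(0x80000000));
    b = _mm_xor_si128(b, _mm_set1_epi32(0x80000000));
    __m128i t = _mm_cmplt_epi32(a, b);
    __m128i u = _mm_cmpgt_epi32(a, b);
    /* lane 0 = lo_lt | hi_lt */
    __m128i z = _mm_or_si128(t, _mm_shuffle_epi32(t, _MM_SHUFFLE(2, 3, 0, 1)));
    /* cleared when hi_gt, leaving hi_lt | (hi_eq & lo_lt) */
    z = _mm_andnot_si128(_mm_shuffle_epi32(u, _MM_SHUFFLE(3, 3, 1, 1)), z);
    return _mm_cvtsi128_si32(z) & 1;
}

/* 128-bit add; low 64 bits in lane 0, high 64 bits in lane 1 */
static inline __m128i addi128(__m128i a, __m128i b)
{
    __m128i sum = _mm_add_epi64(a, b);
    /* Carry out of the low half iff the (unsigned) low sum < a's low half.
       lessthan() already compares unsigned, so no extra sign-bit mask. */
    if (lessthan(sum, a))
    {
        __m128i ONE = _mm_set_epi64x(1, 0);  /* 1 in the high lane only */
        sum = _mm_add_epi64(sum, ONE);
    }

    return sum;
}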
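The lane shuffling in lessthan() is hard to follow, so here is a scalar model of what it computes for the low 64-bit lane, written as an illustrative sketch (lt64_from_32 is a hypothetical name). It shows how two 32-bit signed compares plus the sign-bit flip yield an unsigned 64-bit comparison. The int32_t casts assume two's-complement conversion, which GCC, Clang and MSVC provide:

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar model of lessthan() for the low 64-bit lane: an unsigned
   64-bit a < b assembled from 32-bit signed comparisons. */
static bool lt64_from_32(uint64_t a, uint64_t b)
{
    /* Flipping the sign bit maps unsigned order onto signed order, the
       same trick as XOR with 0x80000000 (assumes two's-complement casts). */
    int32_t a_lo = (int32_t)((uint32_t)a ^ 0x80000000u);
    int32_t b_lo = (int32_t)((uint32_t)b ^ 0x80000000u);
    int32_t a_hi = (int32_t)((uint32_t)(a >> 32) ^ 0x80000000u);
    int32_t b_hi = (int32_t)((uint32_t)(b >> 32) ^ 0x80000000u);

    bool t_lo = a_lo < b_lo;   /* _mm_cmplt_epi32, lane 0 */
    bool t_hi = a_hi < b_hi;   /* _mm_cmplt_epi32, lane 1 */
    bool u_hi = a_hi > b_hi;   /* _mm_cmpgt_epi32, lane 1 */

    /* (t_lo | t_hi) & ~u_hi == hi_a < hi_b || (hi_a == hi_b && lo_a < lo_b) */
    return (t_lo || t_hi) && !u_hi;
}
```

For any pair of uint64_t values this should agree with the plain expression a < b, which makes it a convenient reference when testing the SSE2 version.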

As you can see, this requires many more instructions, and even after optimization it will likely still be much longer than a simple ADD/ADC pair on x86-64 (or four instructions, one ADD and three ADCs, on 32-bit x86).
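For comparison, with GCC and Clang on 64-bit targets the scalar baseline can be written with the unsigned __int128 extension (not standard C). With optimization enabled, this addition is typically compiled to the ADD/ADC pair mentioned above. The helper make_u128 is hypothetical and exists only because C has no 128-bit integer literals:

```c
#include <stdint.h>

/* unsigned __int128 is a GCC/Clang extension on 64-bit targets. On x86-64
   the compiler typically emits ADD for the low half and ADC for the high. */
static unsigned __int128 add_u128(unsigned __int128 a, unsigned __int128 b)
{
    return a + b;
}

/* Hypothetical helper: build a 128-bit value from two 64-bit halves,
   since C has no 128-bit integer literals. */
static unsigned __int128 make_u128(uint64_t hi, uint64_t lo)
{
    return ((unsigned __int128)hi << 64) | lo;
}
```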


SSE2 does help, though, if you have many 128-bit integers to add in parallel. You then need to arrange the high and low parts of the values so that all the low parts can be added at once, and then all the high parts at once.
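Here is a minimal sketch of that layout. It assumes a hypothetical structure-of-arrays arrangement where one __m128i holds the low halves of two numbers and another holds their high halves. Instead of the compare-based carry test, it uses a different, branch-free technique: the standard bitwise identity carry = MSB((a & b) | ((a | b) & ~sum)), which needs only SSE2 logic operations and a 64-bit shift:

```c
#include <emmintrin.h>  /* SSE2 */

/* Adds two pairs of 128-bit integers at once. Assumed (hypothetical)
   layout: *_lo holds the low 64 bits of two numbers, one per lane,
   and *_hi holds the matching high 64 bits. */
static void add2x128(__m128i a_lo, __m128i a_hi,
                     __m128i b_lo, __m128i b_hi,
                     __m128i *r_lo, __m128i *r_hi)
{
    __m128i lo = _mm_add_epi64(a_lo, b_lo);

    /* Carry out of bit 63 without a 64-bit compare:
       carry = MSB of (a & b) | ((a | b) & ~sum) */
    __m128i c = _mm_or_si128(_mm_and_si128(a_lo, b_lo),
                             _mm_andnot_si128(lo, _mm_or_si128(a_lo, b_lo)));
    c = _mm_srli_epi64(c, 63);  /* 0 or 1 in each 64-bit lane */

    *r_lo = lo;
    *r_hi = _mm_add_epi64(_mm_add_epi64(a_hi, b_hi), c);
}
```

Each call adds two pairs of 128-bit integers, so a loop over arrays of low and high halves would process two additions per iteration without any branches.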

