SSE2 integer overflow checking

Question

When using SSE2 instructions such as PADDD (i.e., the _mm_add_epi32 intrinsic), is there a way to check whether any of the operations overflowed?

I thought a flag in the MXCSR control register might get set after an overflow, but I don't see that happening. For example, _mm_getcsr() prints the same value (8064) in both cases below:

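#include <cstdint>
#include <iostream>
#include <emmintrin.h>

using namespace std;

int main()
{
    int32_t lanes[4];   // scratch buffer for reading lanes (portable; m128i_i32 is MSVC-only)

    __m128i a = _mm_set_epi32(1, 0, 0, 0);
    __m128i b = _mm_add_epi32(a, a);                          // no overflow
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), b);
    cout << "MXCSR:  " << _mm_getcsr() << endl;
    cout << "Result: " << lanes[3] << endl;

    __m128i c = _mm_set_epi32(INT32_MAX, 3, 2, 1);
    __m128i d = _mm_add_epi32(c, c);                          // lane 3 wraps around
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), d);
    cout << "MXCSR:  " << _mm_getcsr() << endl;
    cout << "Result: " << lanes[3] << endl;
}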

Is there some other way to check for overflow with SSE2?

Recommended answer

Here is a somewhat more efficient version of @hirschhornsalz's sum_and_overflow function:

// __m128i is used instead of the GCC-internal __v4si type so that the
// function compiles unchanged with GCC, Clang and MSVC.
void sum_and_overflow(__m128i a, __m128i b, __m128i& sum, __m128i& overflow)
{
    __m128i sa, sb;

    sum = _mm_add_epi32(a, b);                  // calculate sum
    sa = _mm_xor_si128(sum, a);                 // compare sign of sum with sign of a
    sb = _mm_xor_si128(sum, b);                 // compare sign of sum with sign of b
    overflow = _mm_and_si128(sa, sb);           // get overflow in sign bit
    overflow = _mm_srai_epi32(overflow, 31);    // convert to SIMD boolean (-1 == TRUE, 0 == FALSE)
}

It uses an expression for overflow detection from Hacker's Delight page 27:

sum = a + b;
overflow = (sum ^ a) & (sum ^ b);               // overflow flag in sign bit
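
The expression works because a signed addition can only overflow when both operands have the same sign and the sum has the opposite sign; in that case both XOR terms have their sign bit set. As a minimal scalar sketch of the same test (the helper name add_overflows_i32 is hypothetical, and the addition is done on unsigned values so the wraparound is well defined in C++):

```cpp
#include <cstdint>

// Hypothetical scalar helper: true if a + b overflows a signed 32-bit integer.
// The sum is computed in uint32_t, which wraps modulo 2^32 without undefined behavior.
inline bool add_overflows_i32(std::int32_t a, std::int32_t b)
{
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    const std::uint32_t sum = ua + ub;

    // The sign bit of (sum ^ a) & (sum ^ b) is set only when the sum's sign
    // differs from the sign of both operands, i.e. on overflow.
    return (((sum ^ ua) & (sum ^ ub)) >> 31) != 0;
}
```

Operands of opposite sign can never trigger the test, because the sum always shares its sign with one of them.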

Note that the overflow vector will contain the more conventional SIMD boolean values of -1 for TRUE (overflow) and 0 for FALSE (no overflow). If you only need the overflow in the sign bit and the other bits are "don't care" then you can omit the last line of the function, reducing the number of SIMD instructions from 5 to 4.
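
Since the question asks whether any of the operations overflowed, one possible way to reduce the per-lane flags to a single scalar is _mm_movemask_ps, which collects the four sign bits into an integer. Because it reads only the sign bits, the sketch below uses the 4-instruction variant without the final shift. The helper name add_epi32_overflow_mask is hypothetical and not part of the answer above:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: adds four signed 32-bit lanes and returns a 4-bit mask
// in which bit i is set if lane i overflowed. A nonzero result means that at
// least one lane overflowed.
inline int add_epi32_overflow_mask(__m128i a, __m128i b, __m128i& sum)
{
    sum = _mm_add_epi32(a, b);
    const __m128i sa = _mm_xor_si128(sum, a);          // sign differs from a?
    const __m128i sb = _mm_xor_si128(sum, b);          // sign differs from b?
    const __m128i overflow = _mm_and_si128(sa, sb);    // overflow flag in each sign bit

    // The cast only reinterprets the bits; MOVMSKPS then gathers the sign bits.
    return _mm_movemask_ps(_mm_castsi128_ps(overflow));
}
```

A caller that only wants a yes/no answer can test the returned mask against zero; the individual bits identify which lanes overflowed.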

NB: this solution, as well as the previous solution on which it is based, is for signed integer values. A solution for unsigned values will require a slightly different approach (see @Stephen Canon's answer).
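
For illustration only, here is one way the unsigned case could be sketched; it is not necessarily the approach taken in the answer referenced above. An unsigned addition carries out exactly when the wrapped sum is smaller than either operand. SSE2 has only signed 32-bit comparisons, so this sketch flips the sign bit of both values before comparing, which maps unsigned ordering onto signed ordering. The helper name sum_and_carry_epu32 is hypothetical:

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper for unsigned 32-bit lanes: carry is -1 in every lane
// where a + b wrapped around, and 0 elsewhere.
inline void sum_and_carry_epu32(__m128i a, __m128i b, __m128i& sum, __m128i& carry)
{
    const __m128i bias = _mm_set1_epi32(INT32_MIN);   // 0x80000000 in each lane

    sum = _mm_add_epi32(a, b);                        // same instruction for signed and unsigned

    // Unsigned "a > sum" expressed as a signed compare on sign-flipped values.
    carry = _mm_cmpgt_epi32(_mm_xor_si128(a, bias), _mm_xor_si128(sum, bias));
}
```

The result uses the same -1/0 convention as the signed version, so it can be consumed in the same way.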
