Is SSE2 signed integer overflow undefined?


Question

Signed integer overflow is undefined in C and C++. But what about signed integer overflow within the individual fields of an __m128i? In other words, is this behavior defined in the Intel standards?

#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>
#include <emmintrin.h>

union SSE2
{
    __m128i m_vector;
    uint32_t m_dwords[sizeof(__m128i) / sizeof(uint32_t)];
};

int main()
{
    union SSE2 reg = {_mm_set_epi32(INT32_MAX, INT32_MAX, INT32_MAX, INT32_MAX)};
    reg.m_vector = _mm_add_epi32(reg.m_vector, _mm_set_epi32(1, 1, 1, 1));

    printf("%08" PRIX32 "\n", (uint32_t) reg.m_dwords[0]);
    return 0;
}

[myria@polaris tests]$ gcc -m64 -msse2 -std=c11 -O3 sse2defined.c -o sse2defined
[myria@polaris tests]$ ./sse2defined
80000000

Note that the 4-byte fields of an SSE2 __m128i are considered signed.

Recommended answer

There are about three things wrong with this question (not in a downvote sort of way, but in a "you are lacking an understanding" kind of way, which I guess is why you have come here).

1) You are asking about a specific implementation issue (using SSE2) and not about the standard. You've already answered your own question: "signed integer overflow is undefined in C".

2) When you are dealing with C intrinsics, you aren't even programming in C! They insert assembly instructions inline. This is done in a somewhat portable way, but it is no longer true that your data is a signed integer. It is a vector type being passed to an SSE intrinsic. YOU are then reinterpreting that as an integer and telling C that you want to see the result of that operation. Whatever bytes happen to be there when you reinterpret it are what you will see, and that has nothing to do with signed arithmetic in the C standard.

3) There were only two wrong assumptions. I made an assumption about the number of errors, and it was wrong.

Things are a bit different if the compiler inserts SSE instructions itself (say, when it vectorizes a loop). Then the compiler guarantees that the result is the same as that of the signed 32-bit operations you wrote ... UNLESS there is undefined behaviour (e.g. an overflow), in which case it can do whatever it likes.
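To make the contrast concrete, here is a minimal sketch of the two kinds of loop. The function names add_signed, add_wrapping and check_add_wrapping are made up for illustration and are not from the question. The first loop uses plain signed addition, so overflow is undefined and a vectorizing compiler may assume it never happens. The second does the arithmetic in uint32_t, where wraparound is defined; it assumes the compiler converts out-of-range values back to int32_t by wrapping, which is implementation-defined in C but is what GCC and Clang do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain signed add. If any dst[i] + src[i] overflows, the behaviour is
   undefined in C, whether or not the compiler vectorizes the loop. */
void add_signed(int32_t *dst, const int32_t *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}

/* Wrapping add. The sum is computed in uint32_t, where wraparound is
   defined. Converting the result back to int32_t is implementation-defined
   when it does not fit (assumption: the compiler wraps modulo 2^32). */
void add_wrapping(int32_t *dst, const int32_t *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (int32_t)((uint32_t)dst[i] + (uint32_t)src[i]);
}

/* Small self-check showing the intended use of add_wrapping. */
void check_add_wrapping(void)
{
    int32_t a[4] = {INT32_MAX, 1, -1, 0};
    const int32_t b[4] = {1, 2, 1, 0};
    add_wrapping(a, b, 4);
    assert(a[0] == INT32_MIN); /* wrapped instead of invoking UB */
    assert(a[1] == 3);
    assert(a[2] == 0);
    assert(a[3] == 0);
}
```

Either loop may be compiled to PADDD; the difference is only in what the C source promises for inputs that overflow.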

Note also that undefined doesn't mean unexpected. Whatever behaviour you observe for auto-vectorization might be consistent and repeatable. Maybe it does always wrap on your machine, but that might not hold for all surrounding code or for all compilers. It might not even hold on all processors: if the compiler selects different instructions depending on the availability of SSSE3, SSE4, or AVX*, its code-gen choices for those instruction sets may or may not take advantage of signed overflow being UB.

Okay, now that we are asking about "the Intel standards" (which don't exist; I think you mean the x86 standards), I can add something to my answer. Things are a little bit convoluted.

Firstly, the intrinsic _mm_add_epi32 is defined by Microsoft to match Intel's intrinsics API definition (https://software.intel.com/sites/landingpage/IntrinsicsGuide/ and the intrinsic notes in Intel's x86 assembly manuals). They cleverly define it as doing to a __m128i the same thing the x86 PADDD instruction does to an XMM register, with no further discussion (e.g. is it a compile error on ARM, or should it be emulated?).

Secondly, PADDD isn't only a signed addition! It is a 32-bit binary add. x86 uses two's complement for signed integers, and adding them is the same binary operation as adding unsigned base-2 integers. So yes, PADDD is guaranteed to wrap. There is a good reference for all the x86 instructions here.
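The point that one binary add serves both interpretations can be checked with a short sketch. The helper names add_one_epi32 and check_paddd_wraps are hypothetical, and the code assumes an x86 target with SSE2 enabled (the default on x86-64). It stores the raw lanes with _mm_storeu_si128 and copies them with memcpy rather than reading through a union, so the same bits can be viewed as unsigned and as signed values.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdint.h>
#include <string.h>

/* Adds 1 to each of four 32-bit lanes with _mm_add_epi32 (PADDD)
   and stores the raw 128-bit result into out. */
void add_one_epi32(const uint32_t in[4], uint32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    v = _mm_add_epi32(v, _mm_set1_epi32(1));
    _mm_storeu_si128((__m128i *)out, v);
}

/* Views the same result bits as unsigned and as signed integers. */
void check_paddd_wraps(void)
{
    const uint32_t in[4] = {0x7FFFFFFFu, 0xFFFFFFFFu, 0u, 41u};
    uint32_t as_unsigned[4];
    int32_t as_signed[4];

    add_one_epi32(in, as_unsigned);
    memcpy(as_signed, as_unsigned, sizeof as_signed);

    assert(as_unsigned[0] == 0x80000000u); /* INT32_MAX + 1 wraps ... */
    assert(as_signed[0] == INT32_MIN);     /* ... to INT32_MIN        */
    assert(as_unsigned[1] == 0u);          /* UINT32_MAX + 1 wraps to 0 */
    assert(as_signed[1] == 0);             /* same bits as -1 + 1       */
    assert(as_signed[2] == 1);
    assert(as_signed[3] == 42);
}
```

The first lane reproduces the 80000000 printed by the program in the question.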

So what does that mean? Again, the assumption in your question is flawed, because there isn't even any overflow. So the output you see should be defined behaviour. Note that it is defined by Microsoft and x86 (not by the C standard).

Other x86 compilers also implement Intel's intrinsics API the same way, so _mm_add_epi32 is portably guaranteed to just wrap.
