Why is gcc allowed to speculatively load from a struct?


Problem Description

Example Showing the gcc Optimization and User Code that May Fault

The function 'foo' in the snippet below loads only one of the struct members, A or B; at least, that is the intention of the unoptimized code.

typedef struct {
  int A;
  int B;
} Pair;

int foo(const Pair *P, int c) {
  int x;
  if (c)
    x = P->A;
  else
    x = P->B;
  return c/102 + x;
}

Here is what gcc -O3 gives:

mov eax, esi
mov edx, -1600085855
test esi, esi
mov ecx, DWORD PTR [rdi+4]   ; load P->B
cmovne ecx, DWORD PTR [rdi]  ; load P->A
imul edx
lea eax, [rdx+rsi]
sar esi, 31
sar eax, 6
sub eax, esi
add eax, ecx
ret

So it appears that gcc is allowed to speculatively load both struct members in order to eliminate branching. But then, is the following code considered undefined behavior or is the gcc optimization above illegal?

#include <stdlib.h>  

int naughty_caller(int c) {
  Pair *P = (Pair*)malloc(sizeof(Pair)-1); // Allocation is enough for A but not for B
  if (!P) return -1;

  P->A = 0x42; // Initialize only the part that is guaranteed to be allocated

  int res = foo(P, 1); // Passing c=1 to foo should ensure only P->A is accessed?

  free(P);
  return res;
}

If the load speculation happens in the above scenario, there is a chance that loading P->B will cause an exception, because the last byte of P->B may lie in unallocated memory. This exception will not happen if the optimization is turned off.

The Question

Is the load speculation performed by gcc above a legal optimization? Where does the spec say or imply that it's ok? If the optimization is legal, how does the code in 'naughty_caller' turn out to be undefined behavior?

Solution

Reading a variable (that was not declared as volatile) is not considered to be a "side effect" as specified by the C standard. So the program is free to read a location and then discard the result, as far as the C standard is concerned.

This is very common. Suppose you request 1 byte of data from a 4-byte integer. The compiler may then read the whole 32 bits if that's faster (aligned read), and then discard everything but the requested byte. Your example is similar to this, except that the compiler decided to read the whole struct.
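To make the byte example concrete, here is a minimal sketch. The function names `byte_narrow` and `byte_wide` are hypothetical and only for illustration: the first is the access as written in the source, the second spells out by hand the kind of wider load a compiler might choose to emit instead. Whether a given compiler actually does this depends on the target and the optimization level.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Source-level view: only one byte of the integer is requested. */
unsigned char byte_narrow(const uint32_t *p, size_t i) {
    return ((const unsigned char *)p)[i];
}

/* Hand-written equivalent of a wider load: read all 32 bits,
 * then keep only the requested byte and discard the rest. */
unsigned char byte_wide(const uint32_t *p, size_t i) {
    uint32_t w = *p;                 /* whole-word (aligned) read */
    unsigned char bytes[sizeof w];
    memcpy(bytes, &w, sizeof w);     /* same byte order as in memory */
    return bytes[i];
}
```

Both functions return the same value for every valid index, which is why the C standard does not care which of the two loads the generated code performs.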

Formally, this follows from the behavior of "the abstract machine", C11 chapter 5.1.2.3. As long as the compiler follows the rules specified there, it is free to do as it pleases. The only rules listed concern volatile objects and the sequencing of instructions. Reading a different struct member in a volatile struct would not be ok.
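As a hedged illustration of the volatile case, the sketch below is a variant of `foo` that takes a pointer to a volatile-qualified struct. The name `foo_volatile` is hypothetical. Because accesses to volatile objects count as observable behavior, a conforming compiler is expected to perform only the member read that the abstract machine performs, so the extra load seen in the -O3 output should not appear here. The generated code has not been checked for any particular gcc version.

```c
typedef struct {
    int A;
    int B;
} Pair;

/* Same logic as foo, but every access to *P is a volatile access,
 * so the compiler may not add a read of the member that was not selected. */
int foo_volatile(const volatile Pair *P, int c) {
    int x;
    if (c)
        x = P->A;   /* only this member is read when c != 0 */
    else
        x = P->B;   /* only this member is read when c == 0 */
    return c / 102 + x;
}
```

Note that this changes only which loads the compiler may emit; it does not make an undersized allocation valid.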

As for allocating too little memory for the whole struct, that's undefined behavior, because the memory layout of the struct is usually not for the programmer to decide; for example, the compiler is allowed to add padding at the end. If there's not enough memory allocated, you might end up accessing forbidden memory even though your code only works with the first member of the struct.
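A minimal sketch of a caller that avoids the problem follows, assuming the same `Pair` and `foo` as in the question. The name `well_behaved_caller` is hypothetical. It allocates storage for the complete struct, so any load the compiler emits for either member stays inside the allocation.

```c
#include <stdlib.h>

typedef struct {
    int A;
    int B;
} Pair;

int foo(const Pair *P, int c) {
    int x;
    if (c)
        x = P->A;
    else
        x = P->B;
    return c / 102 + x;
}

int well_behaved_caller(int c) {
    /* sizeof *P covers both members and any padding the compiler adds. */
    Pair *P = malloc(sizeof *P);
    if (!P) return -1;

    P->A = 0x42;
    P->B = 0;   /* initialized so that either branch reads a defined value */

    int res = foo(P, c);

    free(P);
    return res;
}
```

With a full-size allocation, it no longer matters whether the compiler loads one member or both.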
