Can branches with undefined behavior be assumed unreachable and optimized as dead code?


Question

Consider the following statement:

*((char*)NULL) = 0; //undefined behavior

It clearly invokes undefined behavior. Does the existence of such a statement in a given program mean that the whole program is undefined or that behavior only becomes undefined once control flow hits this statement?

Would the following program be well-defined in case the user never enters the number 3?

while (true) {
 int num = ReadNumberFromConsole();
 if (num == 3)
  *((char*)NULL) = 0; //undefined behavior
}

Or is it entirely undefined behavior no matter what the user enters?

Also, can the compiler assume that undefined behavior will never be executed at runtime? That would allow for reasoning backwards in time:

int num = ReadNumberFromConsole();

if (num == 3) {
 PrintToConsole(num);
 *((char*)NULL) = 0; //undefined behavior
}

Here, the compiler could reason that in case num == 3 we will always invoke undefined behavior. Therefore, this case must be impossible and the number does not need to be printed. The entire if statement could be optimized out. Is this kind of backwards reasoning allowed according to the standard?

Recommended answer


Does the existence of such a statement in a given program mean that the whole program is undefined or that behavior only becomes undefined once control flow hits this statement?

Neither. The first condition is too strong and the second is too weak.

Object accesses are sometimes sequenced, but the standard describes the behavior of the program outside of time. Danvil already quoted:


if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation)

This can be interpreted as:


If the execution of the program yields undefined behavior, then the whole program has undefined behavior.

So, an unreachable statement with UB doesn't give the program UB. A reachable statement that (because of the values of inputs) is never reached, doesn't give the program UB. That's why your first condition is too strong.
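To make that distinction concrete, here is a minimal sketch. The helper sum_of_quotients is made up for illustration and does not come from the question. The division is potentially undefined (a zero divisor), but any execution whose inputs never contain zero is fully defined:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: sums 100 / d over all divisors.
// 100 / 0 would be undefined behavior, but only an execution that
// actually supplies a zero divisor has UB.
int sum_of_quotients(const std::vector<int>& divisors) {
    int total = 0;
    for (int d : divisors) {
        total += 100 / d;  // potential UB, reached only with d == 0
    }
    return total;
}

// Every call here is well-defined because no divisor is zero.
void demo_defined_inputs() {
    assert(sum_of_quotients({1, 2, 5}) == 170);  // 100 + 50 + 20
    assert(sum_of_quotients({-1, 4}) == -75);    // -100 + 25
}
```

The mere presence of the division does not taint these calls. Only an execution that passes a zero would be one the standard places no requirements on.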

Now, the compiler cannot in general tell what has UB. So to allow the optimizer to re-order statements with potential UB that would be re-orderable should their behavior be defined, it's necessary to permit UB to "reach back in time" and go wrong prior to the preceding sequence point (or in C++11 terminology, for the UB to affect things that are sequenced before the UB thing). Therefore your second condition is too weak.

A major example of this is when the optimizer relies on strict aliasing. The whole point of the strict aliasing rules is to allow the compiler to re-order operations that could not validly be re-ordered if it were possible that the pointers in question alias the same memory. So if you use illegally aliasing pointers, and UB does occur, then it can easily affect a statement "before" the UB statement. As far as the abstract machine is concerned the UB statement has not been executed yet. As far as the actual object code is concerned, it has been partly or fully executed. But the standard doesn't try to get into detail about what it means for the optimizer to re-order statements, or what the implications of that are for UB. It just gives the implementation license to go wrong as soon as it pleases.
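As a rough sketch of the aliasing case (store_both is a hypothetical helper, and what a given compiler actually emits is an assumption that depends on the compiler and its flags): since an int* and a float* may not legally refer to the same object, the optimizer is allowed to treat the final read as a constant.

```cpp
#include <cassert>

// Hypothetical helper, not from the original answer.
// The compiler may assume the store through `f` cannot modify `*i`,
// because int and float objects are not allowed to alias.
int store_both(int* i, float* f) {
    *i = 1;
    *f = 0.0f;
    return *i;  // may be compiled as "return 1" without reloading *i
}

// Well-defined use: the two pointers refer to distinct objects.
void demo_store_both() {
    int n = 0;
    float x = 1.5f;
    assert(store_both(&n, &x) == 1);
    assert(n == 1);
    assert(x == 0.0f);
}
```

If a caller forced both pointers to refer to the same storage, the behavior would be undefined, and the folded return value is exactly the kind of effect that can show up "before" the offending access in the source order.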

You can think of this as, "UB has a time machine".

Specifically to answer your examples:


  • Behavior is only undefined if 3 is read.
  • Compilers can and do eliminate code as dead if a basic block contains an operation certain to be undefined. They're permitted (and I'm guessing do) in cases which aren't a basic block but where all branches lead to UB. This example isn't a candidate unless PrintToConsole(3) is somehow known to be sure to return. It could throw an exception or whatever.
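The second point can be sketched as follows. The names record_call, as_written and as_may_be_compiled are invented for this illustration, and record_call is a stand-in for PrintToConsole that is visibly guaranteed to return. Whether a particular compiler really performs this reduction is not something the sketch demonstrates; it only shows that the two functions agree on every input with defined behavior.

```cpp
#include <cassert>

// Hypothetical stand-in for PrintToConsole that always returns.
inline void record_call(unsigned& calls) noexcept { ++calls; }

// As written: every path through the num == 3 branch reaches the null store.
bool as_written(int num, unsigned& calls) {
    if (num == 3) {
        record_call(calls);
        char* p = nullptr;
        *p = 0;  // undefined behavior
    }
    return num == 3;
}

// What an optimizer is permitted (not required) to reduce it to:
// the branch is treated as unreachable, so num == 3 folds to false.
bool as_may_be_compiled(int /*num*/, unsigned& /*calls*/) {
    return false;
}

// Only inputs with defined behavior are used; 3 is never passed.
void demo_dead_branch() {
    unsigned calls = 0;
    const int inputs[] = {0, 1, 2, 4, -7};
    for (int num : inputs) {
        assert(as_written(num, calls) == as_may_be_compiled(num, calls));
    }
    assert(calls == 0);
}
```

Had record_call been able to throw or terminate the program, the store would no longer be reached on every path, and the call could not be dropped.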

A similar example to your second is the gcc option -fdelete-null-pointer-checks, which can take code like this (I haven't checked this specific example, consider it illustrative of the general idea):

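#include <iostream>

void foo(int *p) {
    if (p) *p = 3;                // the check suggests p might be null...
    std::cout << *p << '\n';      // ...but this dereference is unconditional
}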

and change it to:

*p = 3;
std::cout << "3\n";

Why? Because if p is null then the code has UB anyway, so the compiler may assume it is not null and optimize accordingly. The Linux kernel tripped over this (https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2009-1897), essentially because it operates in a mode where dereferencing a null pointer isn't supposed to be UB; it's expected to result in a defined hardware exception that the kernel can handle. When optimization is enabled, gcc requires the use of -fno-delete-null-pointer-checks in order to provide that beyond-standard guarantee.
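For contrast, here is one way the illustrative function could be restructured so that the null check can no longer be discarded. This is a sketch, not a recommendation taken from the original answer: store_and_print and its bool return value are made up here, and the output stream is passed in only to keep the example easy to check.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical rewrite: the null check comes before every dereference,
// so no execution dereferences a null pointer and there is no UB from
// which the compiler could infer that p is non-null.
bool store_and_print(int* p, std::ostream& out) {
    if (!p) {
        return false;  // nothing is dereferenced on this path
    }
    *p = 3;
    out << *p << '\n';
    return true;
}

void demo_store_and_print() {
    std::ostringstream os;
    assert(!store_and_print(nullptr, os));
    assert(os.str().empty());

    int value = 0;
    assert(store_and_print(&value, os));
    assert(value == 3);
    assert(os.str() == "3\n");
}
```

The difference from foo is only the order: the check dominates the dereference instead of sitting next to an unconditional one.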

P.S. The practical answer to the question "when does undefined behavior strike?" is "10 minutes before you were planning to leave for the day".
