Are all functions in C/C++ assumed to return?


Problem description

I was reading this paper on undefined behaviour and one of the example "optimisations" looks highly dubious:

if (arg2 == 0)
    ereport(ERROR, (errcode(ERRCODE_DIVISION_BY_ZERO),
                    errmsg("division by zero")));
/* No overflow is possible */
PG_RETURN_INT32((int32) arg1 / arg2);

Figure 2: An unexpected optimization voids the division-by-zero check, in src/backend/utils/adt/int8.c of PostgreSQL. The call to ereport(ERROR, :::) will raise an exception.

Essentially, the compiler assumes that ereport will return, and removes the arg2 == 0 check since the presence of the division implies a non-zero denominator, i.e. arg2 != 0.

Is this a valid optimisation? Is the compiler free to assume that a function will always return?

EDIT: The whole thing depends on ereport, which is described thus:

/*----------
 * New-style error reporting API: to be used in this way:
 *      ereport(ERROR,
 *              (errcode(ERRCODE_UNDEFINED_CURSOR),
 *               errmsg("portal \"%s\" not found", stmt->portalname),
 *               ... other errxxx() fields as needed ...));
 *
 * The error level is required, and so is a primary error message (errmsg
 * or errmsg_internal).  All else is optional.  errcode() defaults to
 * ERRCODE_INTERNAL_ERROR if elevel is ERROR or more, ERRCODE_WARNING
 * if elevel is WARNING, or ERRCODE_SUCCESSFUL_COMPLETION if elevel is
 * NOTICE or below.
 *
 * ereport_domain() allows a message domain to be specified, for modules that
 * wish to use a different message catalog from the backend's.  To avoid having
 * one copy of the default text domain per .o file, we define it as NULL here
 * and have errstart insert the default text domain.  Modules can either use
 * ereport_domain() directly, or preferably they can override the TEXTDOMAIN
 * macro.
 *
 * If elevel >= ERROR, the call will not return; we try to inform the compiler
 * of that via pg_unreachable().  However, no useful optimization effect is
 * obtained unless the compiler sees elevel as a compile-time constant, else
 * we're just adding code bloat.  So, if __builtin_constant_p is available,
 * use that to cause the second if() to vanish completely for non-constant
 * cases.  We avoid using a local variable because it's not necessary and
 * prevents gcc from making the unreachability deduction at optlevel -O0.
 *----------
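
To make the last paragraph of that comment concrete, here is a minimal sketch of the technique it describes. It is not PostgreSQL's actual macro: REPORT, report_impl, the LVL_ constants, checked_div and try_div are hypothetical names, longjmp stands in for whatever the real error path does, and __builtin_constant_p and __builtin_unreachable are GCC/Clang builtins rather than standard C.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

enum { LVL_WARNING = 19, LVL_ERROR = 20 };   /* hypothetical error levels */

static jmp_buf recover_point;                /* hypothetical recovery point */

/* Out-of-line reporter: returns for warnings, jumps away for errors. */
static void report_impl(int level, const char *msg)
{
    fprintf(stderr, "level %d: %s\n", level, msg);
    if (level >= LVL_ERROR)
        longjmp(recover_point, 1);
}

/* Tell the compiler the call does not return, but only when the level is a
   compile-time constant; otherwise the second if() folds away entirely. */
#define REPORT(level, msg) \
    do { \
        report_impl((level), (msg)); \
        if (__builtin_constant_p(level) && (level) >= LVL_ERROR) \
            __builtin_unreachable(); \
    } while (0)

static int checked_div(int a, int b)
{
    if (b == 0)
        REPORT(LVL_ERROR, "division by zero");
    return a / b;   /* not reached when b == 0 */
}

/* Returns 1 and stores the quotient, or 0 if the error path jumped out. */
static int try_div(int a, int b, int *out)
{
    if (setjmp(recover_point) != 0)
        return 0;
    *out = checked_div(a, b);
    return 1;
}

/* Small usage check of the behaviour described above. */
static void demo_report(void)
{
    int q = 0;
    assert(try_div(10, 2, &q) == 1 && q == 5);
    assert(try_div(1, 0, &q) == 0);
}
```

The builtin is only a hint for the optimizer. The sketch stays correct without it, because report_impl really does not return for LVL_ERROR.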

Solution

Is the compiler free to assume that a function will always return?

It is not legal in C or C++ for a compiler to optimize on that basis, unless it somehow specifically knows that ereport returns (for example by inlining it and inspecting the code).

ereport depends on at least one #define and on the values passed in, so I can't be sure, but it certainly looks to be designed to conditionally not return (and it calls an extern function errstart that, as far as the compiler knows, may or may not return). So if the compiler really is assuming that it always returns then either the compiler is wrong, or the implementation of ereport is wrong, or I've completely misunderstood it.

The paper says,

However, the programmer failed to inform the compiler that the call to ereport(ERROR, ::: ) does not return.

I don't believe that the programmer has any such obligation, unless perhaps there's some non-standard extension in effect when compiling this particular code, that enables an optimization that's documented to break valid code under certain conditions.

Unfortunately it is rather difficult to prove the code transformation is incorrect by citing the standard, since I can't quote anything to show that there isn't, tucked away somewhere in pages 700-900, a little clause that says "oh, by the way, all functions must return". I haven't actually read every line of the standard, but such a clause would be absurd: functions need to be allowed to call abort() or exit() or longjmp(). In C++ they can also throw exceptions. And they need to be allowed to do this conditionally -- the attribute noreturn means that the function never returns, not that it might not return, and its absence proves nothing about whether the function returns or not. My experience of both standards is that they aren't (that) absurd.
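
A minimal sketch of that distinction, assuming a C11 compiler. The names fail_always, require_nonzero and came_back are made up for illustration, and longjmp is used instead of abort() or exit() only so that the example can keep running afterwards.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf bail_out;   /* hypothetical recovery point */

/* Never returns on any path: that is exactly what _Noreturn promises. */
static _Noreturn void fail_always(void)
{
    longjmp(bail_out, 1);
}

/* Returns for non-zero input and does not return for zero.
   It cannot be marked _Noreturn, and it carries no annotation at all. */
static void require_nonzero(int value)
{
    if (value == 0)
        fail_always();
}

/* Returns 1 if require_nonzero(value) came back, 0 if it jumped away. */
static int came_back(int value)
{
    if (setjmp(bail_out) != 0)
        return 0;
    require_nonzero(value);
    return 1;
}

/* Small usage check of the behaviour described above. */
static void demo_noreturn(void)
{
    assert(came_back(5) == 1);
    assert(came_back(0) == 0);
}
```

A caller that sees only the declaration of require_nonzero has no way to tell which of the two outcomes a given call will have, so the missing attribute tells it nothing.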

Optimizations are not allowed to break valid programs: they're constrained by the "as-if" rule, which requires that observable behaviour is preserved. If ereport doesn't return then the "optimization" changes the observable behaviour of the program (from doing whatever ereport does instead of returning, to having undefined behaviour due to the division by zero). Hence it is forbidden.
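
Here is a small sketch of why dropping the check changes observable behaviour. It is my own illustration, not code from PostgreSQL or the paper: div_checked, jumping_handler and run_div are hypothetical names, and the handler is passed as a function pointer to mimic a callee whose body the compiler cannot see.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf on_error;           /* hypothetical recovery point */
static const char *last_message;   /* what the handler was told, if anything */

/* Supplied by the caller; div_checked cannot know whether it returns. */
typedef void (*error_handler)(const char *msg);

static void jumping_handler(const char *msg)
{
    last_message = msg;
    longjmp(on_error, 1);   /* never returns to div_checked */
}

static int div_checked(int a, int b, error_handler handler)
{
    assert(handler != NULL);
    if (b == 0)
        handler("division by zero");
    /* Reached with b == 0 only if the handler returned. Deleting the check
       above would replace the handler's effects with undefined behaviour. */
    return a / b;
}

/* Returns 1 and stores the quotient, or 0 if the handler jumped out. */
static int run_div(int a, int b, int *quotient)
{
    last_message = NULL;
    if (setjmp(on_error) != 0)
        return 0;
    *quotient = div_checked(a, b, jumping_handler);
    return 1;
}

/* Small usage check of the behaviour described above. */
static void demo_as_if(void)
{
    int q = 0;
    assert(run_div(7, 2, &q) == 1 && q == 3);
    assert(run_div(7, 0, &q) == 0);
    assert(last_message != NULL);
}
```

With the check in place, a zero divisor has a fully defined outcome: the handler runs and the division is never evaluated.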

There's more information on this particular issue here:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=616180

It mentions a GCC bug report http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29968 that was (rightly IMO) rejected, but if ereport doesn't return then the PostgreSQL issue is not the same as that rejected GCC bug report.

The Debian bug description includes the following:

The gcc guys are full of it. The issue that is relevant here is the C standard's definition of sequence points, and in particular the requirement that visible side effects of a later statement cannot happen before the execution of an earlier function call. The last time I pestered them about this, I got some lame claim that a SIGFPE wasn't a side effect within the definitions of the spec. At that point useful discussion stopped, because it's impossible to negotiate with someone who's willing to claim that.

In point of fact, if a later statement has UB then it is explicitly stated in the standard that the whole program has UB. Ben has the citation in his answer. It is not the case (as this person seems to think) that all visible side effects must occur up to the last sequence point before the UB. UB permits inventing a time machine (and more prosaically, it permits out of order execution that assumes everything executed has defined behaviour). The gcc guys are not full of it if that's all they say.

A SIGFPE would be a visible side effect if the compiler chooses to guarantee and document (as an extension to the standard) that it occurs, but if it's just the result of UB then it is not. Compare for example the -fwrapv option to GCC, which changes integer overflow from UB (what the standard says) to wrap-around (which the compiler guarantees, only if you specify the option). On MIPS, gcc has an option -mcheck-zero-division, which looks like it does define behaviour on division by zero, but I've never used it.
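
To illustrate the difference between behaviour the standard leaves undefined and behaviour that is actually guaranteed, here is a sketch of wrap-around addition written so that it needs no compiler option. The function add_wrapping is a hypothetical helper of mine. It spells out by hand the two's-complement result that -fwrapv makes GCC guarantee for the plain + operator on int.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit addition with two's-complement wrap-around and no signed overflow. */
static int32_t add_wrapping(int32_t a, int32_t b)
{
    /* Unsigned arithmetic wraps modulo 2^32 by definition. */
    uint32_t sum = (uint32_t)a + (uint32_t)b;

    if (sum <= (uint32_t)INT32_MAX)
        return (int32_t)sum;
    /* Map the upper half of the unsigned range onto the negative values
       by hand, avoiding an out-of-range conversion to a signed type. */
    return -(int32_t)(UINT32_MAX - sum) - 1;
}

/* Small usage check of the behaviour described above. */
static void demo_wrapping(void)
{
    assert(add_wrapping(2, 3) == 5);
    assert(add_wrapping(INT32_MAX, 1) == INT32_MIN);
}
```

Writing INT32_MAX + 1 directly on int32_t operands is undefined unless the compiler documents otherwise, which is the same position that a SIGFPE on division by zero is in.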

It's possible that the authors of the paper noticed the wrongness of that complaint against GCC, and the thought that one of the PostgreSQL authors was wrong in this way influenced them when they put the snigger quotes into:

We found seven similar issues in PostgreSQL, which were noted as "GCC bugs" in source code comments

But a function not returning is very different from a function returning after some side effects. If it doesn't return, the statement that would have UB is not executed within the definition of the C (or C++) abstract machine in the standard. Unreached statements aren't executed: I hope this isn't contentious. So if the "gcc guys" were to claim that UB from unreached statements renders the whole program undefined, then they'd be full of it. I don't know that they have claimed that, and at the end of the Debian report there's a suggestion that the issue might have gone away by GCC 4.4. If so then perhaps PostgreSQL had indeed encountered an eventually-acknowledged bug, not (as the author of the paper you link to thinks) a valid optimization or (as the person who says the gcc guys are full of it thinks) a misinterpretation of the standard by GCC's authors.
