Does (p+x)-x always result in p for pointer p and integer x in gcc linux x86-64 C++?


Question

Suppose we have:

char* p;
int   x;

As recently discussed in another question, arithmetic (including comparison operations) on invalid pointers can generate unexpected behavior in gcc linux x86-64 C++. This new question is specifically about the expression (p+x)-x: can it generate unexpected behavior (i.e., a result that is not p) in any existing GCC version running on x86-64 linux?

Note that this question is just about pointer arithmetic; there is absolutely no intention to access the location designated by *(p+x), which obviously would be unpredictable in general.

The practical interest here is non-zero-based arrays. Note that (p+x) and the subtraction by x happen in different places in the code in these applications.

If recent GCC versions on x86-64 can be shown to never generate unexpected behavior for (p+x)-x then these versions can be certified for non-zero-based arrays, and future versions generating unexpected behavior could be modified or configured to support this certification.

Update

For the practical case described above, we could also assume p itself is a valid pointer and p != NULL.

Recommended answer

Yes, for gcc5.x and later specifically, that specific expression is optimized very early to just p, even with optimization disabled, regardless of any possible runtime UB.

This happens even with a static array and a compile-time constant size. gcc -fsanitize=undefined doesn't insert any instrumentation to look for it either, and there are no warnings at -Wall -Wextra -Wpedantic.

int *add(int *p, long long x) {
    return (p+x) - x;
}

int *visible_UB(void) {
    static int arr[100];
    return (arr+200) - 200;
}

Using gcc -fdump-tree-original to dump its internal representation of program logic before any optimization passes shows that this optimization happened even before that in gcc5.x and newer. (And it happens even at -O0.)

;; Function int* add(int*, long long int) (null)
;; enabled by -tree-original

return <retval> = p;


;; Function int* visible_UB() (null)
;; enabled by -tree-original
{
  static int arr[100];

    static int arr[100];
  return <retval> = (int *) &arr;
}

That's from the Godbolt compiler explorer with gcc8.3 with -O0.

The x86-64 asm output is just:

; g++8.3 -O0 
add(int*, long long):
    mov     QWORD PTR [rsp-8], rdi
    mov     QWORD PTR [rsp-16], rsi    # spill args
    mov     rax, QWORD PTR [rsp-8]     # reload only the pointer
    ret
visible_UB():
    mov     eax, OFFSET FLAT:_ZZ10visible_UBvE3arr
    ret

-O3 output for add is of course just mov rax, rdi.

gcc4.9 and earlier only do this optimization in a later pass, and not at -O0: the tree dump still includes the subtract, and the x86-64 asm is:

# g++4.9.4 -O0
add(int*, long long):
    mov     QWORD PTR [rsp-8], rdi
    mov     QWORD PTR [rsp-16], rsi
    mov     rax, QWORD PTR [rsp-16]
    lea     rdx, [0+rax*4]            # RDX = x*4 = x*sizeof(int)
    mov     rax, QWORD PTR [rsp-16]
    sal     rax, 2
    neg     rax                       # RAX = -(x*4)
    add     rdx, rax                  # RDX = x*4 + (-(x*4)) = 0
    mov     rax, QWORD PTR [rsp-8]
    add     rax, rdx                  # p += x + (-x)
    ret

visible_UB():       # but constants still optimize away at -O0
    mov     eax, OFFSET FLAT:_ZZ10visible_UBvE3arr
    ret

This does match the -fdump-tree-original output:

return <retval> = p + ((sizetype) ((long unsigned int) x * 4) + -(sizetype) ((long unsigned int) x * 4));

If x*4 overflows, you'll still get the right answer. In practice I can't think of a way to write a function that would lead to the UB causing an observable change in behaviour.
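The cancellation can be checked with plain integers. The sketch below models the arithmetic from the tree dump, assuming sizetype behaves like a 64-bit unsigned integer on x86-64. The names byte_offset and add_then_subtract are made up for illustration, and p_bits stands in for the pointer's address.

```cpp
#include <cstdint>

// x * sizeof(int), computed in an unsigned 64-bit type where overflow wraps
// modulo 2^64 instead of being undefined.
constexpr std::uint64_t byte_offset(long long x) {
    return static_cast<std::uint64_t>(x) * 4u;
}

// Mirrors p + (offset + -offset) from the gcc4.9 tree dump.
constexpr std::uint64_t add_then_subtract(std::uint64_t p_bits, long long x) {
    return p_bits + (byte_offset(x) + (0 - byte_offset(x)));
}

// ((2^62 + 1) * 4) wraps around to 4, and the sum still cancels to zero.
static_assert(byte_offset((1LL << 62) + 1) == 4, "offset wrapped");
static_assert(add_then_subtract(0x1000, (1LL << 62) + 1) == 0x1000, "still p");
static_assert(add_then_subtract(0x1000, -1) == 0x1000, "negative x cancels too");
```

This only models the integer arithmetic the compiler emitted. It says nothing about whether forming p+x is defined at the language level.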

As part of a larger function, a compiler would be allowed to infer some range info, for example that p[x] is part of the same object as p[0], so reading the memory in between (and out that far) is allowed and won't segfault. That could, for example, allow auto-vectorization of a search loop.
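A hypothetical search loop of the kind meant here might look like the sketch below. The name find_first and its signature are invented for illustration. Whether any GCC version actually derives range information from p + n is not established by this example.

```cpp
#include <cassert>
#include <cstddef>

// Linear search with an early exit. Forming p + n is only defined when
// p[0..n) lies inside one array object, so a compiler may assume those
// elements exist and are readable.
inline std::ptrdiff_t find_first(const int* p, std::ptrdiff_t n, int key) {
    const int* end = p + n;  // defined only if p..p+n stays within one array
    for (const int* q = p; q != end; ++q) {
        if (*q == key) {
            return q - p;    // early exit: stop at the first match
        }
    }
    return -1;               // not found
}

// Usage sketch.
inline void find_first_demo() {
    const int data[5] = {3, 1, 4, 1, 5};
    assert(find_first(data, 5, 4) == 2);
    assert(find_first(data, 5, 9) == -1);
}
```

With the early exit, a vectorized version would have to load elements past the first match. Those loads are harmless only if the whole range is known to be readable, which is exactly the kind of fact the pointer arithmetic could supply.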

But I doubt that gcc even looks for that, let alone takes advantage of it.

(Note that your question title was specific to gcc targeting x86-64 on Linux, not about whether similar things are safe in gcc, e.g. if done in separate statements. Yes, that's probably safe in practice, but it won't be optimized away almost immediately after parsing. And the question is definitely not about C++ in general.)

I highly recommend not doing this. Use uintptr_t to hold pointer-like values that aren't actual valid pointers, like you're doing in the updates to your answer on "C++ gcc extension for non-zero-based array pointer allocation?".
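A minimal sketch of that uintptr_t approach for a non-zero-based array follows. OffsetArray and its members are hypothetical names, not code from the linked answer. It assumes that pointer/integer conversions behave as plain address arithmetic, which is implementation-defined in C++ but holds for GCC on x86-64.

```cpp
#include <cassert>
#include <cstdint>

// View over int storage whose first valid index is `first` rather than 0.
// The biased base is kept as an integer, so no invalid int* is ever formed.
struct OffsetArray {
    std::uintptr_t biased;  // address element 0 would have (it may not exist)

    OffsetArray(int* storage, long long first)
        : biased(reinterpret_cast<std::uintptr_t>(storage)
                 - static_cast<std::uintptr_t>(first) * sizeof(int)) {}

    // Unsigned arithmetic wraps, so the bias cancels for every valid index.
    int& operator[](long long i) const {
        return *reinterpret_cast<int*>(
            biased + static_cast<std::uintptr_t>(i) * sizeof(int));
    }
};

// Usage sketch: address 10 ints as indices 100..109.
inline void offset_array_demo() {
    int storage[10] = {};
    OffsetArray a(storage, 100);
    a[100] = 7;
    a[109] = 9;
    assert(storage[0] == 7 && storage[9] == 9);
}
```

A pointer is only materialized inside operator[], at which point it refers to a real element of the storage, provided the index is in range.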
