Does (p+x)-x always result in p for pointer p and integer x in gcc linux x86-64 C++
Question
Suppose we have:
char* p;
int x;
As recently discussed in another question, arithmetic on invalid pointers, including comparison operations, can generate unexpected behavior in gcc linux x86-64 C++. This new question is specifically about the expression (p+x)-x: can it generate unexpected behavior (i.e., a result other than p) in any existing GCC version running on x86-64 linux?
Note that this question is just about pointer arithmetic; there is absolutely no intention to access the location designated by *(p+x), which obviously would be unpredictable in general.
The practical interest here is non-zero-based arrays. Note that in these applications, (p+x) and the subtraction by x happen in different places in the code.
If recent GCC versions on x86-64 can be shown to never generate unexpected behavior for (p+x)-x, then these versions can be certified for non-zero-based arrays, and future versions that do generate unexpected behavior could be modified or configured to support this certification.
Update
For the practical case described above, we could also assume that p itself is a valid pointer and p != NULL.
Answer
Yes, for gcc 5.x and later specifically, that specific expression is optimized very early to just p, even with optimization disabled, regardless of any possible runtime UB.
This happens even with a static array and a compile-time constant size. gcc -fsanitize=undefined doesn't insert any instrumentation to look for it either, and there are no warnings at -Wall -Wextra -Wpedantic:
int *add(int *p, long long x) {
return (p+x) - x;
}
int *visible_UB(void) {
static int arr[100];
return (arr+200) - 200;
}
Using gcc -fdump-tree-original to dump its internal representation of program logic before any optimization passes shows that in gcc 5.x and newer, this optimization has already happened by that point. (It happens even at -O0.)
;; Function int* add(int*, long long int) (null)
;; enabled by -tree-original
return <retval> = p;
;; Function int* visible_UB() (null)
;; enabled by -tree-original
{
static int arr[100];
static int arr[100];
return <retval> = (int *) &arr;
}
That's from the Godbolt compiler explorer with gcc 8.3 at -O0.
The x86-64 asm output is just:
# g++8.3 -O0
add(int*, long long):
mov QWORD PTR [rsp-8], rdi
mov QWORD PTR [rsp-16], rsi # spill args
mov rax, QWORD PTR [rsp-8] # reload only the pointer
ret
visible_UB():
mov eax, OFFSET FLAT:_ZZ10visible_UBvE3arr
ret
The -O3 output is of course just mov rax, rdi.
gcc 4.9 and earlier only do this optimization in a later pass, and not at -O0: the tree dump still includes the subtraction, and the x86-64 asm is:
# g++4.9.4 -O0
add(int*, long long):
mov QWORD PTR [rsp-8], rdi
mov QWORD PTR [rsp-16], rsi
mov rax, QWORD PTR [rsp-16]
lea rdx, [0+rax*4] # RDX = x*4 = x*sizeof(int)
mov rax, QWORD PTR [rsp-16]
sal rax, 2
neg rax # RAX = -(x*4)
add rdx, rax # RDX = x*4 + (-(x*4)) = 0
mov rax, QWORD PTR [rsp-8]
add rax, rdx # p += x + (-x)
ret
visible_UB(): # but constants still optimize away at -O0
mov eax, OFFSET FLAT:_ZZ10visible_UBvE3arr
ret
This matches the -fdump-tree-original output:
return <retval> = p + ((sizetype) ((long unsigned int) x * 4) + -(sizetype) ((long unsigned int) x * 4));
If x*4 overflows, you'll still get the right answer: the 64-bit addition wraps modulo 2^64, so x*4 + (-(x*4)) is always 0. In practice, I can't think of a way to write a function in which the UB would cause an observable change in behaviour.
As part of a larger function, a compiler would be allowed to infer some range info, like that p[x] is part of the same object as p[0], so reading memory in between / out that far is allowed and won't segfault, e.g. allowing auto-vectorization of a search loop.
But I doubt that gcc even looks for that, let alone takes advantage of it.
(Note that your question title was specific to gcc targeting x86-64 on Linux, so this answer is not about whether similar things are safe in gcc in general, e.g. if the add and the subtract are done in separate statements. Those are probably safe in practice too, but they won't be optimized away almost immediately after parsing. And this is definitely not about C++ in general.)
I highly recommend not doing this. Use uintptr_t to hold pointer-like values that aren't actual valid pointers, like you're doing in the updates to your answer on "C++ gcc extension for non-zero-based array pointer allocation?".