Should pointer comparisons be signed or unsigned in 64-bit x86?


Question


When writing x86-64 user-space assembly and comparing two pointer values, should we use signed conditions such as jl and jge or unsigned conditions such as jb and jae?

Intuitively I think of pointers as unsigned, running from 0 to 2^64-1 in the case of a 64-bit process, and I think this model is accurate for 32-bit code. I guess that's how most people think about them.

In 64-bit code however I don't think you can ever validly cross over the signed discontinuity at 0x7FFFFFFFFFFFFFFF (2^63 - 1), and many interesting memory regions tend to cluster near signed 0 (often code and static data, and sometimes the heap, depending on the implementation), and near the maximum address of the lower half of the canonical address space (something like 0x00007fffffffffff on most systems today) for stack locations and, on some implementations, the heap1.

So I'm not sure which way they should be treated: signed has the advantage that it is safe around 0 since there is no discontinuity there, and unsigned has the same advantage near 2^63 since there is no discontinuity there. However in practice you don't see any addresses anywhere close to 2^63 since the virtual address space of current commodity hardware is limited to less than 50 bits. Does that point towards signed?


1 ... and sometimes the heap and other mapped regions are not close to either the bottom or top of the address space.

Solution

TL:DR: intptr_t might be best in some cases because the signed-overflow boundary is in the middle of the "non-canonical hole". Treating a value as negative instead of huge may be better if wrapping from zero to 0xFF...FF or vice versa is possible, but pointer+size for any valid size can't wrap a value from INT64_MAX to INT64_MIN.

Otherwise you probably want unsigned for the "high half" (high bit set) to compare as above the low half.


It depends on exactly what you want to know about the two pointers!

A previous edit of your question gave ptrA < ptrB - C as the use-case you're interested in. e.g. an overlap check with ptrA < ptrB - sizeA, or maybe an unrolled SIMD loop condition with current < endp - loop_stride. Discussion in comments has been about this kind of thing, too.

So what you're really doing is forming ptrB - C as a pointer that's potentially outside the object you were interested in, and which may have wrapped around (unsigned). (Good observation that stuff like this may be why C and C++ make it UB to form pointers outside of objects; they do allow one-past-the-end pointers, which would wrap (unsigned) for an object ending at the very top of the highest page, if the kernel even lets you map it.) Anyway, you want to use a signed comparison so it "still works" without having to check for wraparound, or check the sign of C, or any of that. This is still a lot more specific than most of the question.
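As a concrete sketch (mine, not code from the question): the same check written in C with intptr_t casts, so a compiler emits the sub/lea + cmp + jl sequence under discussion. The names ptrA, ptrB and C follow the question's notation; doing the math on integers sidesteps the pointer-arithmetic UB mentioned above.

#include <stdint.h>
#include <stddef.h>

/* Signed version of ptrA < ptrB - C.  Casting to intptr_t makes the
 * compare signed (cmp / jl); subtracting on integers avoids forming an
 * out-of-object pointer in C. */
static int ptr_below_with_margin(const void *ptrA, const void *ptrB, ptrdiff_t C)
{
    intptr_t a = (intptr_t)ptrA;
    intptr_t b = (intptr_t)ptrB - C;   /* may go "negative" near address 0 */
    return a < b;                      /* signed: no wraparound check needed */
}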

Yes, for "related" pointers derived from the same object with reasonable sizes, signed compare is safe on current hardware, and could only break on unlikely / distant-future machines with hardware support for full 64-bit virtual addresses. Overlap checks are also safe with unsigned if both pointers are in the low half of the canonical range, which I think is the case for user-space addresses on all the mainstream x86-64 OSes.


As you point out, unsigned ptrA < ptrB - C can "fail" if ptrB - C wraps (unsigned wraparound). This can happen in practice for static addresses that are closer to 0 than the size of C.
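To make that failure mode concrete, here is a small self-contained demo (the addresses are made up, purely illustrative):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical low static address: ptrB is closer to 0 than C is large. */
    uintptr_t ptrA = 0x2000, ptrB = 0x1000, C = 0x3000;

    /* Unsigned: ptrB - C wraps to 0xffffffffffffe000, a huge value, so the
     * check comes out "true" even though ptrB - C is conceptually below ptrA. */
    printf("unsigned: %d\n", ptrA < ptrB - C);                        /* prints 1 */

    /* Signed: ptrB - C is just -0x2000, and the compare behaves as intended. */
    printf("signed:   %d\n", (intptr_t)ptrA < (intptr_t)(ptrB - C)); /* prints 0 */
    return 0;
}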

Usually the low 64kiB is not mappable (e.g. on Linux, most distros ship with the sysctl vm.mmap_min_addr = 65536, or at least 4096; but some systems set it to 0 for WINE). Still, I think it's normal for kernels not to give you a zero page unless you request that address specifically, because mapping it would stop NULL derefs from faulting (and faulting on a NULL deref is normally highly desirable for security and debuggability reasons).

This means the loop_stride case is usually not a problem. The sizeA version can usually be done with ptrA + sizeA < ptrB, and as a bonus you can use LEA to add instead of copy + subtract. ptrA+sizeA is guaranteed not to wrap unless you have objects that wrap their pointer from 2^64-1 to zero (which works even with a page-split load at the wraparound, but you'll never see it in a "normal" system because addresses are normally treated as unsigned.)
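A sketch of that rearrangement (same hypothetical names as before):

#include <stdint.h>
#include <stddef.h>

/* The ptrA + sizeA < ptrB form of the check.  In asm this is one LEA plus
 * cmp/jb, and there is nothing to wrap: ptrA + sizeA can't exceed 2^64-1
 * for any valid object on current hardware. */
static int ends_before(const char *ptrA, size_t sizeA, const char *ptrB)
{
    return (uintptr_t)ptrA + sizeA < (uintptr_t)ptrB;   /* lea / cmp / jb */
}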


So when can it fail with a signed compare? When ptrB - C has signed wraparound on overflow. Or if you ever have pointers to high-half objects (e.g. into Linux's vDSO pages), a compare between a high-half and low-half address might give you an unexpected result: you will see "high-half" addresses as less than "low-half" addresses. This happens even though the ptrB - C calculation doesn't wrap.
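For example (illustrative constants only; nothing here is dereferenced):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* A high-half address (Linux's vsyscall page) vs. a typical low-half
     * user-space address -- example values only. */
    uintptr_t high = 0xffffffffff600000u;   /* [vsyscall] */
    uintptr_t low  = 0x00007f1234560000u;   /* hypothetical mmap address */

    printf("signed:   high < low ? %d\n", (intptr_t)high < (intptr_t)low); /* 1 */
    printf("unsigned: high < low ? %d\n", high < low);                     /* 0 */
    return 0;
}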

(We're only talking about asm directly, not C, so there's no UB, I'm just using C notation for sub or lea / cmp / jl.)

Signed wraparound can only happen near the boundary between 0x7FFF... and 0x8000.... But that boundary is extremely far from any canonical address. I'll reproduce a diagram of the x86-64 address space (for current implementations where virtual addresses are 48 bits) from another answer. See also Why in 64bit the virtual address are 4 bits short (48bit long) compared with the physical address (52 bit long)?.

Remember, x86-64 faults on non-canonical addresses. That means it checks that the 48-bit virtual address is properly sign-extended to 64 bits, i.e. that bits [63:48] match bit 47 (numbering from 0).

+----------+
| 2^64-1   |   0xffffffffffffffff
| ...      |                       high half of canonical address range
| 2^64-2^47|   0xffff800000000000
+----------+
|          |
| unusable |   Not to scale: this is 2^15 times larger than the top/bottom ranges.
|          |
+----------+
| 2^47-1   |   0x00007fffffffffff
| ...      |                       low half of canonical range
| 0        |   0x0000000000000000
+----------+

Intel has proposed a 5-level page-table extension for 57-bit virtual addresses (i.e. another 9-bit level of tables), but that still leaves most of the address space non-canonical. i.e. any canonical address would still be 2^63 - 2^57 away from signed wraparound.
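In C, the canonicality rule from the diagram can be expressed as a shift round trip (my sketch; real code would query the supported VA width via CPUID rather than hard-coding it):

#include <stdint.h>
#include <stdbool.h>

/* Canonical = bits [63:va_bits-1] are all copies of bit va_bits-1.
 * Shifting left by 64-va_bits and arithmetic-shifting back redoes the
 * sign extension, so a canonical address survives unchanged.
 * (Right-shifting a negative int64_t is implementation-defined in ISO C,
 * but arithmetic on all mainstream x86-64 compilers.) */
static bool is_canonical(uint64_t va, int va_bits)   /* va_bits = 48 or 57 */
{
    int shift = 64 - va_bits;
    return (uint64_t)(((int64_t)(va << shift)) >> shift) == va;
}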

Depending on the OS, all your addresses might be in the low half or the high half. e.g. on x86-64 Linux, high ("negative") addresses are kernel addresses, while low (signed positive) addresses are user-space. But note that Linux maps the kernel's vsyscall page into user space very near the top of virtual address space: ffffffffff600000-ffffffffff601000 [vsyscall] in a 64-bit process on my desktop, with the pages above it left unmapped. The vDSO pages, by contrast, are near the top of the bottom-half canonical range, 0x00007fff.... Even in a 32-bit process, where in theory the whole 4GiB is usable by user-space, the vDSO is a page below the highest page, and mmap(MAP_FIXED) didn't work on that highest page. Perhaps that's because C allows one-past-the-end pointers?

If you ever take the address of a function or variable in the vsyscall page, you can have a mix of positive and negative addresses. (I don't think anyone ever does that, but it's possible.)

So signed address comparison could be dangerous if you don't have a kernel/user split separating signed positive from signed negative, and your code is running in the distant future when/if x86-64 has been extended to full 64-bit virtual addresses, so an object can span the boundary. The latter seems unlikely, and if you can get a speedup from assuming it won't happen, it's probably a good idea.

This means signed-compare is already dangerous with 32-bit pointers, because 64-bit kernels leave the whole 4GiB usable by user-space. (And 32-bit kernels can be configured with a 3:1 user/kernel split.) There's no unusable non-canonical range, so in 32-bit mode an object can span the signed-wraparound boundary. (The same applies in the ILP32 x32 ABI: 32-bit pointers in long mode.)
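A pure-integer illustration of that 32-bit hazard (made-up addresses):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* A hypothetical 32-bit object straddling the signed boundary:
     * 64 bytes starting at 0x7fffffe0 end at 0x80000020. */
    uint32_t start = 0x7fffffe0u;
    uint32_t end   = start + 64;    /* 0x80000020 */

    printf("unsigned: start < end ? %d\n", start < end);                   /* 1 */
    printf("signed:   start < end ? %d\n", (int32_t)start < (int32_t)end); /* 0: end looks negative */
    return 0;
}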


Performance advantages:

Unlike in 32-bit mode, there are no CPUs where jge is faster than jae in 64-bit mode, or any other such combo. (And different conditions for setcc / cmovcc never matter.) So any perf difference comes only from the surrounding code, unless you can do something clever with adc or sbb instead of a cmov or setcc.

Sandybridge-family can macro-fuse test / cmp (and sub, add, and various other non-read-only instructions) with both signed and unsigned compare conditions (not with every JCC, but the signed/unsigned distinction isn't a factor). Bulldozer-family can fuse cmp / test with any JCC.

Core2 can only macro-fuse cmp with unsigned compares, not signed, and in any case Core2 can't macro-fuse at all in 64-bit mode. (It can macro-fuse test with signed compares in 32-bit mode, BTW.)

Nehalem can macro-fuse test or cmp with signed or unsigned compares (including in 64-bit mode).

Source: Agner Fog's microarch pdf.
