Using the extra 16 bits in 64-bit pointers

Problem description

I read that a 64-bit machine actually uses only 48 bits of address (specifically, I'm using an Intel Core i7).

I would expect the extra 16 bits (bits 48-63) to be irrelevant for the address and to be ignored. But when I try to access such an address, I get an EXC_BAD_ACCESS signal.

My code is:

int *p1 = &val;
int *p2 = (int *)((long)p1 | 1ll<<48);//set bit 48, which should be irrelevant
int v = *p2; //Here I receive a signal EXC_BAD_ACCESS.

Why is this so? Is there a way to use these 16 bits?

This could be used to build a more cache-friendly linked list: instead of using 8 bytes for the next pointer and 8 bytes for the key (due to alignment restrictions), the key could be embedded in the pointer.

Solution

The high-order bits are reserved in case the address space is extended in the future, so you can't simply use them like that.

The AMD64 architecture defines a 64-bit virtual address format, of which the low-order 48 bits are used in current implementations (...) The architecture definition allows this limit to be raised in future implementations to the full 64 bits, extending the virtual address space to 16 EB (2^64 bytes). This is compared to just 4 GB (2^32 bytes) for the x86.

http://en.wikipedia.org/wiki/X86-64#Architectural_features

More importantly, according to the same article [Emphasis mine]:

... in the first implementations of the architecture, only the least significant 48 bits of a virtual address would actually be used in address translation (page table lookup). Further, bits 48 through 63 of any virtual address must be copies of bit 47 (in a manner akin to sign extension), or the processor will raise an exception. Addresses complying with this rule are referred to as "canonical form."

As the CPU checks the high bits even though they're unused for translation, they're not really "irrelevant". You need to make sure that the address is canonical before using the pointer. Some other 64-bit architectures, such as ARM64, have an option to ignore the high bits (ARM64's Top Byte Ignore ignores the top 8 bits), so you can store data in pointers much more easily there.
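To make the canonical-form rule concrete, here is a minimal C sketch (the function name `is_canonical` is hypothetical) that checks whether a 64-bit value would be accepted as an address, assuming the CPU implements `va_bits` bits of virtual address (48 on the Core i7 from the question). With `va_bits = 48`, a user-space pointer with bit 48 set but bit 47 clear, like the one built in the question, is rejected, which is why dereferencing it faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if `addr` is in canonical form for a CPU that implements
   `va_bits` virtual-address bits (48 with 4-level paging): bit va_bits-1
   and every bit above it must be equal (all zeros or all ones). */
static bool is_canonical(uint64_t addr, unsigned va_bits)
{
    assert(va_bits >= 1 && va_bits <= 64);
    uint64_t top = addr >> (va_bits - 1);            /* bits va_bits-1 .. 63 */
    uint64_t all_ones = UINT64_MAX >> (va_bits - 1); /* same width, all set  */
    return top == 0 || top == all_ones;
}
```

For example, `is_canonical(0x0001000000001000, 48)` is false, while `is_canonical(0xffff800000000000, 48)` is true.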


That said, on x86-64 you're still free to use the high 16 bits if needed (provided the virtual address is not wider than 48 bits, see below), but you have to check and fix the pointer value by sign-extending it before dereferencing.

Note that casting the pointer value to long is not the correct way to do it, because long is not guaranteed to be wide enough to hold a pointer (on 64-bit Windows, for example, long is only 32 bits). Use uintptr_t or intptr_t instead.

#include <stdint.h>

int *p1 = &val; // original pointer
uint8_t data = ...;
// Keeps bits 0-47 (the address) and clears bits 48-63
const uintptr_t MASK = (UINT64_C(1) << 48) - 1;

// === Store data into the pointer ===
// Note: To be on the safe side and future-proof (because future implementations
//     can increase the number of significant bits in the pointer), we should
//     store values from the most significant bits down to the lower ones.
// Cast data to uintptr_t before shifting: shifting an int-promoted uint8_t by 56 is undefined
int *p2 = (int *)(((uintptr_t)p1 & MASK) | ((uintptr_t)data << 56));

// === Get the data stored in the pointer ===
data = (uintptr_t)p2 >> 56;

// === Dereference the pointer ===
// Sign extend first to make the pointer canonical (the left shift is done
// on the unsigned value to avoid undefined behavior)
// Note: Technically this is implementation defined. You may want a more
//     standard-compliant way to sign-extend the value
intptr_t p3 = (intptr_t)((uintptr_t)p2 << 16) >> 16;
val = *(int*)p3;
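The comment above notes that sign-extending by shifting relies on implementation-defined behavior. A minimal sketch of a more portable alternative (the helper name `canonicalize` is hypothetical) uses only unsigned arithmetic: mask off the tag bits, then use the xor/subtract trick to copy bit `va_bits - 1` into every bit above it. Passing `va_bits = 48` matches the code above; 57 would cover 5-level paging.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extends bit (va_bits - 1) of `v` into all higher bits using only
   unsigned arithmetic, so no implementation-defined shift of a negative
   value is involved. The result is a canonical address. */
static uint64_t canonicalize(uint64_t v, unsigned va_bits)
{
    assert(va_bits >= 1 && va_bits < 64);
    uint64_t low_mask = (UINT64_C(1) << va_bits) - 1; /* bits 0 .. va_bits-1   */
    uint64_t sign_bit = UINT64_C(1) << (va_bits - 1); /* bit to be replicated  */
    uint64_t low = v & low_mask;                      /* drop the tag bits     */
    return (low ^ sign_bit) - sign_bit;               /* xor/subtract sign ext */
}
```

With it, `int *q = (int *)(uintptr_t)canonicalize((uintptr_t)p2, 48);` should recover a dereferenceable pointer from the tagged `p2`.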

WebKit's JavaScriptCore, Mozilla's SpiderMonkey and LuaJIT use this in the NaN-boxing technique: if the value is a NaN, the low 48 bits store a pointer to the object and the high 16 bits serve as tag bits; otherwise it's a double value.
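As a rough illustration of NaN-boxing, here is a C sketch with a made-up tag layout (it is not the exact encoding used by JavaScriptCore, SpiderMonkey or LuaJIT): values whose top 16 bits equal a chosen quiet-NaN pattern (`PTR_TAG`, hypothetical) carry a 48-bit pointer, and every other bit pattern is an ordinary double. Real NaNs are collapsed to one canonical NaN on the way in, so they can never be mistaken for a pointer.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout:
   - top 16 bits == PTR_TAG -> low 48 bits hold a pointer
   - anything else          -> the 64 bits are an IEEE-754 double */
typedef uint64_t Value;

#define PTR_TAG       UINT64_C(0xFFF9)           /* negative quiet NaN; box_double never stores it */
#define PAYLOAD_MASK  ((UINT64_C(1) << 48) - 1)
#define CANONICAL_NAN UINT64_C(0x7FF8000000000000)

static Value box_double(double d)
{
    Value v;
    if (isnan(d))
        return CANONICAL_NAN;      /* collapse all NaNs so none looks like a pointer */
    memcpy(&v, &d, sizeof v);      /* type-pun without breaking aliasing rules */
    return v;
}

static Value box_ptr(void *p)
{
    uint64_t addr = (uint64_t)(uintptr_t)p;
    assert((addr >> 48) == 0);     /* assumes the pointer fits in the low 48 bits */
    return (PTR_TAG << 48) | addr;
}

static int is_ptr(Value v) { return (v >> 48) == PTR_TAG; }

static double unbox_double(Value v)
{
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

static void *unbox_ptr(Value v)
{
    /* Zero-extends the payload, which is canonical for user-space pointers */
    return (void *)(uintptr_t)(v & PAYLOAD_MASK);
}
```

Because no non-NaN double has all exponent bits set with a nonzero mantissa, ordinary numbers (including infinities) never collide with `PTR_TAG`.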

Linux also previously used bit 63 of the GS base address to indicate whether the value had been written by the kernel.

In practice you can usually use bit 47 as well, because most modern 64-bit OSes split the address space in half between kernel and user space, so bit 47 is always zero in user-space pointers and you have the top 17 bits (47-63) free for use.
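Because bits 47-63 of a user-space pointer are then all zero, the pointer can be restored by simply clearing the top 17 bits (zero-extension) instead of sign-extending. A minimal sketch, assuming a user-space pointer on x86-64 with 48-bit virtual addresses (the helpers `pack17`, `extra17` and `ptr17` are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* Assumes a user-space pointer on x86-64 with 48-bit virtual addresses,
   so bits 47..63 of the real address are always zero. */
#define ADDR_BITS 47
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)    /* bits 0..46 */

static uint64_t pack17(const void *p, uint32_t extra)
{
    uint64_t addr = (uint64_t)(uintptr_t)p;
    assert((addr & ~ADDR_MASK) == 0);      /* pointer must be in the lower half */
    assert(extra < (UINT32_C(1) << 17));   /* only 17 free bits */
    return ((uint64_t)extra << ADDR_BITS) | addr;
}

static uint32_t extra17(uint64_t packed) { return (uint32_t)(packed >> ADDR_BITS); }

static void *ptr17(uint64_t packed)
{
    return (void *)(uintptr_t)(packed & ADDR_MASK);  /* zero-extend: no sign extension needed */
}
```

This only holds for user-space pointers; a kernel-space pointer (bit 47 set) would need the sign-extending approach instead.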


You can also use the low bits to store data; this is called a tagged pointer. If int is 4-byte aligned, the 2 low bits of an int pointer are always 0 and you can use them just as on 32-bit architectures. For pointers to 64-bit values you can use the 3 low bits, because such values are normally 8-byte aligned. Again, you need to clear those bits before dereferencing.

#include <stdint.h>

int *p1 = &val; // the pointer we want to store the tag in
int tag = 1;
const uintptr_t MASK = ~(uintptr_t)0x03; // clears the 2 low tag bits

// === Store the tag ===
int *p2 = (int *)(((uintptr_t)p1 & MASK) | tag);

// === Get the tag ===
tag = (uintptr_t)p2 & 0x03;

// === Get the referenced data ===
// Clear the 2 tag bits before using the pointer
uintptr_t p3 = (uintptr_t)p2 & MASK;
val = *(int*)p3;
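The code above uses 2 tag bits for an `int`. For 8-byte-aligned data, 3 low bits are free; a minimal sketch of that variant looks like this (the helper names are hypothetical, and it assumes `int64_t` is 8-byte aligned as on x86-64, which the `static_assert` checks at compile time):

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define TAG_BITS 3
#define TAG_MASK ((uintptr_t)((1u << TAG_BITS) - 1))   /* 0x7 */

/* Tagging is only safe if the pointee is at least 8-byte aligned */
static_assert(alignof(int64_t) >= (1u << TAG_BITS), "int64_t must be 8-byte aligned");

static uintptr_t tag_ptr(int64_t *p, unsigned tag)
{
    assert(((uintptr_t)p & TAG_MASK) == 0);   /* low 3 bits must be free */
    assert(tag <= TAG_MASK);
    return (uintptr_t)p | tag;
}

static unsigned get_tag(uintptr_t t) { return (unsigned)(t & TAG_MASK); }

static int64_t *untag_ptr(uintptr_t t) { return (int64_t *)(t & ~TAG_MASK); }
```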

One famous user of this is the V8 engine with its Smi (small integer) optimization. The lowest bit of each tagged value serves as a type tag:

  • if it's 1, the value is a pointer to the real data (objects, floats or bigger integers). The next higher bit (w) indicates whether the pointer is weak or strong. To use it, just clear the tag bits and dereference it
  • if it's 0, it's a small integer. In 32-bit V8, or 64-bit V8 with pointer compression, it's a 31-bit int; do a signed right shift by 1 to restore the value. In 64-bit V8 without pointer compression, it's a 32-bit int stored in the upper half

   32-bit V8
                           |----- 32 bits -----|
   Pointer:                |_____address_____w1|
   Smi:                    |___int31_value____0|
   
   64-bit V8
               |----- 32 bits -----|----- 32 bits -----|
   Pointer:    |________________address______________w1|
   Smi:        |____int32_value____|0000000000000000000|
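A minimal C sketch of the 31-bit Smi case (the names `Tagged`, `smi_from_int` and so on are hypothetical, not V8's actual API): encoding is a left shift by 1, which leaves tag bit 0 clear, and decoding is an arithmetic right shift by 1. For heap pointers it assumes, since the diagram only says that `w` distinguishes weak from strong, that `w = 1` marks a weak reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit (or pointer-compressed) tagged value:
   bit 0 == 0 -> small integer in the upper 31 bits
   bit 0 == 1 -> heap pointer, bit 1 = weak/strong flag (assumed 1 = weak) */
typedef uint32_t Tagged;

#define SMI_MIN (-(INT32_C(1) << 30))
#define SMI_MAX ((INT32_C(1) << 30) - 1)

static bool is_smi(Tagged v) { return (v & 1u) == 0; }

static Tagged smi_from_int(int32_t n)
{
    assert(n >= SMI_MIN && n <= SMI_MAX);   /* must fit in 31 bits */
    return (uint32_t)n << 1;                /* unsigned shift: no UB for negatives */
}

static int32_t smi_to_int(Tagged v)
{
    assert(is_smi(v));
    /* Arithmetic shift: implementation-defined, but what GCC, Clang and MSVC do */
    return (int32_t)v >> 1;
}

static bool is_weak(Tagged v) { return (v & 3u) == 3u; }

static uint32_t heap_address(Tagged v) { return v & ~UINT32_C(3); } /* clear w and tag bits */
```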

https://v8.dev/blog/pointer-compression


Note that Intel has since published PML5 (5-level paging), which provides a 57-bit virtual address space; on such a system you can only use the top 7 bits.

You can still use some workarounds to get more free bits, though. First, you can try to use 32-bit pointers in a 64-bit OS. On Linux, if the x32 ABI is available, pointers are only 32 bits long. On Windows, just clear the /LARGEADDRESSAWARE flag (link with /LARGEADDRESSAWARE:NO); the process is then limited to 2 GB of address space, so pointers fit in the low 32 bits and you can use the upper 32 bits for your own purposes. See How to detect X32 on Windows?. Another way is to use pointer compression tricks: How does the compressed pointer implementation in V8 differ from JVM's compressed Oops?

You can get even more bits by asking the OS to allocate memory only in the low region. For example, if you can ensure that your application never uses more than 64 MB of memory, you need only a 26-bit address (2^26 bytes = 64 MB). If all allocations are also 32-byte aligned, the 5 low bits are always zero, so only 26 - 5 = 21 bits are significant, which means you can store 64 - 21 = 43 bits of information in the pointer!
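The bit budget above can be sketched in C. Rather than asking the OS for low addresses, this version uses the closely related offset-from-base trick (a form of pointer compression): it stores the pointer as a 21-bit index into a hypothetical 64 MB arena starting at `arena_base`, so it works wherever the arena happens to be mapped. All names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (hypothetical): every object lives in one 64 MB arena that
   starts at `arena_base`, and every allocation is 32-byte aligned
   relative to that base. */
#define ARENA_BITS   26                        /* 64 MB = 2^26 bytes      */
#define ALIGN_BITS   5                         /* 32-byte alignment       */
#define INDEX_BITS   (ARENA_BITS - ALIGN_BITS) /* 21 significant bits     */
#define PAYLOAD_BITS (64 - INDEX_BITS)         /* 43 free bits            */

static uint64_t pack(const char *arena_base, const void *p, uint64_t payload)
{
    uintptr_t off = (uintptr_t)((const char *)p - arena_base);
    assert(off < (UINT64_C(1) << ARENA_BITS));      /* inside the arena */
    assert((off & ((1u << ALIGN_BITS) - 1)) == 0);  /* 32-byte aligned  */
    assert(payload < (UINT64_C(1) << PAYLOAD_BITS));
    return (payload << INDEX_BITS) | (off >> ALIGN_BITS);
}

static void *unpack_ptr(char *arena_base, uint64_t packed)
{
    uint64_t index = packed & ((UINT64_C(1) << INDEX_BITS) - 1);
    return arena_base + (index << ALIGN_BITS);
}

static uint64_t unpack_payload(uint64_t packed) { return packed >> INDEX_BITS; }
```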

ZGC is probably one example of this. It uses only 42 bits for addressing, which allows for 2^42 bytes = 4 × 2^40 bytes = 4 TB:

ZGC therefore just reserves 16TB of address space (but not actually uses all of this memory) starting at address 4TB.

A first look into ZGC

It uses the bits in the pointer like this:

 6                 4 4 4  4 4                                             0
 3                 7 6 5  2 1                                             0
+-------------------+-+----+-----------------------------------------------+
|00000000 00000000 0|0|1111|11 11111111 11111111 11111111 11111111 11111111|
+-------------------+-+----+-----------------------------------------------+
|                   | |    |
|                   | |    * 41-0 Object Offset (42-bits, 4TB address space)
|                   | |
|                   | * 45-42 Metadata Bits (4-bits)  0001 = Marked0
|                   |                                 0010 = Marked1
|                   |                                 0100 = Remapped
|                   |                                 1000 = Finalizable
|                   |
|                   * 46-46 Unused (1-bit, always zero)
|
* 63-47 Fixed (17-bits, always zero)
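To show how such a layout is read, here is a small C sketch that extracts the fields exactly as laid out in the diagram above. The helper names are hypothetical; this is not ZGC's actual (C++) code, which also relies on multi-mapping the heap so colored pointers can be dereferenced directly.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout taken from the diagram above */
#define ZOFFSET_BITS 42
#define ZOFFSET_MASK ((UINT64_C(1) << ZOFFSET_BITS) - 1)   /* bits 41-0  */
#define ZMETA_SHIFT  42
#define ZMETA_MASK   UINT64_C(0xF)                         /* bits 45-42 */

enum {
    Z_MARKED0     = 0x1,
    Z_MARKED1     = 0x2,
    Z_REMAPPED    = 0x4,
    Z_FINALIZABLE = 0x8,
};

static uint64_t zgc_offset(uint64_t colored) { return colored & ZOFFSET_MASK; }

static unsigned zgc_metadata(uint64_t colored)
{
    return (unsigned)((colored >> ZMETA_SHIFT) & ZMETA_MASK);
}

/* Builds a colored pointer; bits 46-63 stay zero as the diagram requires */
static uint64_t zgc_color(uint64_t offset, unsigned meta)
{
    assert(offset <= ZOFFSET_MASK && meta <= ZMETA_MASK);
    return ((uint64_t)meta << ZMETA_SHIFT) | offset;
}
```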

For more information on how to do that see


Side note: Using a linked list when the keys are tiny compared to the pointers wastes a lot of memory, and it's also slower due to poor cache locality. In fact, you shouldn't use linked lists for most real-life problems.
