Using the extra 16 bits in 64-bit pointers


Question

I read that a 64-bit machine actually uses only 48 bits of address (specifically, I'm using an Intel Core i7).

I would expect the extra 16 bits (bits 48-63) to be irrelevant to the address and simply ignored. But when I try to access such an address, I get an EXC_BAD_ACCESS signal.

My code is:

int *p1 = &val;
int *p2 = (int *)((long)p1 | 1ll<<48);//set bit 48, which should be irrelevant
int v = *p2; //Here I receive a signal EXC_BAD_ACCESS.

Why is this so? Is there a way to use these 16 bits?

This could be used to build a more cache-friendly linked list. Instead of using 8 bytes for the next pointer and 8 bytes for the key (due to alignment restrictions), the key could be embedded into the pointer.

Recommended answer

The high-order bits are reserved in case the address width is increased in the future, so you can't simply use them like that.

The AMD64 architecture defines a 64-bit virtual address format, of which the low-order 48 bits are used in current implementations (...) The architecture definition allows this limit to be raised in future implementations to the full 64 bits, extending the virtual address space to 16 EB (2^64 bytes). This is compared to just 4 GB (2^32 bytes) for the x86.

http://en.wikipedia.org/wiki/X86-64#Architectural_features

More importantly, according to the same article [emphasis mine]:

... in the first implementations of the architecture, only the least significant 48 bits of a virtual address would actually be used in address translation (page table lookup). Further, bits 48 through 63 of any virtual address must be copies of bit 47 (in a manner akin to sign extension), or the processor will raise an exception. Addresses complying with this rule are referred to as "canonical form."

Because the CPU checks the high bits even though they are unused for translation, they are not really "irrelevant". You need to make sure the address is canonical before using the pointer. Some other 64-bit architectures, such as ARM64, have an option to ignore the high bits, so storing data in pointers is much easier there.
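The canonical-form rule quoted above can be sketched in C. This is a minimal illustration, assuming 48 significant address bits and a compiler whose right shift of a negative signed value is arithmetic (true for GCC, Clang, and MSVC, but implementation-defined in the C standard). The names `canonicalize`, `is_canonical`, and `checked` are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48 significant virtual-address bits, so 16 unused high bits. */
#define UNUSED_BITS 16

/* Copy bit 47 into bits 48..63 (sign extension).
   The left shift is done on an unsigned value to avoid undefined behavior;
   the right shift relies on arithmetic shifting of signed values. */
static uint64_t canonicalize(uint64_t addr)
{
    return (uint64_t)((int64_t)(addr << UNUSED_BITS) >> UNUSED_BITS);
}

/* An address is canonical when sign-extending it changes nothing. */
static bool is_canonical(uint64_t addr)
{
    return canonicalize(addr) == addr;
}

/* Debug guard: trap in the program instead of faulting in the CPU. */
static void *checked(void *p)
{
    assert(is_canonical((uint64_t)(uintptr_t)p));
    return p;
}
```

Under these assumptions, the pointer in the question fails the check: setting bit 48 of a user-space address makes bits 48-63 differ from bit 47, so `is_canonical` returns false and the dereference faults.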

That said, on x86_64 you are still free to use the high 16 bits if needed, but you have to restore the pointer to canonical form by sign-extending it before dereferencing it.

Note that casting the pointer value to long is not the correct approach, because long is not guaranteed to be wide enough to store a pointer. You need to use uintptr_t or intptr_t.

int *p1 = &val; // original pointer
uint8_t data = ...;
const uintptr_t MASK = ~(0xFFFFULL << 48); // keeps the low 48 bits, clears bits 48-63

// store data into the pointer
//     note: to be on the safe side and future-proof (because future implementations could
//     increase the number of significant bits in the pointer), we should store values
//     from the most significant bits down to the lower ones
//     (data is widened first: shifting a promoted int by 56 would be undefined)
int *p2 = (int *)(((uintptr_t)p1 & MASK) | ((uintptr_t)data << 56));

// get the data stored in the pointer
data = (uintptr_t)p2 >> 56;

// dereference the pointer
//     the right shift of a negative signed value is technically
//     implementation-defined. You may want a more
//     standard-compliant way to sign-extend the value
intptr_t p3 = (intptr_t)((uintptr_t)p2 << 16) >> 16; // sign extend the pointer to make it canonical
val = *(int*)p3;
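The same steps can be wrapped in small helpers that embed a 16-bit key, as the question proposes for its linked list. This is only a sketch: it assumes 48 significant address bits and arithmetic right shifts on signed values, and `pack`, `packed_key`, and `packed_ptr` are hypothetical names introduced for illustration. Unlike the snippet above, it uses all 16 high bits, so it would stop working on a system with wider virtual addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: only the low 48 bits of a pointer are significant. */
#define PTR_BITS 48
#define PTR_MASK ((UINT64_C(1) << PTR_BITS) - 1)

/* Recover the pointer by sign-extending bit 47 into bits 48..63. */
static void *packed_ptr(uint64_t w)
{
    uint64_t shifted = w << (64 - PTR_BITS);
    return (void *)(uintptr_t)((int64_t)shifted >> (64 - PTR_BITS));
}

/* Recover the 16-bit key stored in bits 48..63. */
static uint16_t packed_key(uint64_t w)
{
    return (uint16_t)(w >> PTR_BITS);
}

/* Store a 16-bit key in the high bits of a pointer-sized word. */
static uint64_t pack(const void *p, uint16_t key)
{
    uint64_t w = ((uint64_t)(uintptr_t)p & PTR_MASK) | ((uint64_t)key << PTR_BITS);
    /* Debug check: fails if the original pointer was not canonical
       under the 48-bit assumption. */
    assert(packed_ptr(w) == (void *)(uintptr_t)p);
    return w;
}
```

A list node would then hold a single `uint64_t` instead of a separate key and next pointer, calling `packed_ptr` before every dereference.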

WebKit's JavaScriptCore and Mozilla's SpiderMonkey engine use this in the NaN-boxing technique. If the value is NaN, the low 48 bits store the pointer to the object and the high 16 bits serve as tag bits; otherwise it's a double value.
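A simplified sketch of the NaN-boxing idea follows. It is not the actual encoding used by JavaScriptCore or SpiderMonkey: the tag value `BOX_PTR_TAG` and all function names are assumptions chosen for illustration, and pointers are assumed to have their high 16 bits clear (typical for user-space addresses with 48-bit virtual addressing). A real engine would also normalize NaNs produced by arithmetic so they can never collide with the pointer tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t boxed_t;

#define BOX_TAG_MASK UINT64_C(0xFFFF000000000000)
#define BOX_PTR_TAG  UINT64_C(0xFFFC000000000000) /* hypothetical tag; a quiet-NaN bit pattern */
#define BOX_PTR_MASK UINT64_C(0x0000FFFFFFFFFFFF)

/* A double is stored as its raw bit pattern. */
static boxed_t box_double(double d)
{
    boxed_t b;
    memcpy(&b, &d, sizeof b);
    return b;
}

static double unbox_double(boxed_t b)
{
    double d;
    memcpy(&d, &b, sizeof d);
    return d;
}

/* A pointer goes in the low 48 bits under the NaN tag. */
static boxed_t box_pointer(const void *p)
{
    uint64_t bits = (uint64_t)(uintptr_t)p;
    assert((bits & BOX_TAG_MASK) == 0); /* assumption: high 16 bits are clear */
    return BOX_PTR_TAG | bits;
}

static bool is_pointer(boxed_t b)
{
    return (b & BOX_TAG_MASK) == BOX_PTR_TAG;
}

static void *unbox_pointer(boxed_t b)
{
    return (void *)(uintptr_t)(b & BOX_PTR_MASK);
}
```

Because the tagged pattern has all exponent bits set and a nonzero mantissa, a boxed pointer read back as a double is a NaN, which is what lets both kinds of value share one 64-bit slot.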

You can also use the low bits to store data. This is called a tagged pointer. If int is 4-byte aligned, then the 2 low bits of an int pointer are always 0 and you can use them, just as on 32-bit architectures. For 64-bit values you can use the 3 low bits, because those values are already 8-byte aligned. Again, you need to clear those bits before dereferencing.

int *p1 = &val; // the pointer we want to store the tag in
int tag = 1;    // must fit in 2 bits (0-3)
const uintptr_t MASK = ~(uintptr_t)0x03;

// store the tag
int *p2 = (int *)(((uintptr_t)p1 & MASK) | (uintptr_t)tag);

// get the tag
tag = (uintptr_t)p2 & 0x03;

// get the referenced data
uintptr_t p3 = (uintptr_t)p2 & MASK; // clear the 2 tag bits before using the pointer
val = *(int*)p3;

One famous user of this is the 32-bit version of V8 with its SMI (small integer) optimization (I'm not sure about 64-bit V8, though). The lowest bit serves as a type tag: if it's 0, the value is a small 31-bit integer, and a signed right shift by 1 restores it; if it's 1, the value is a pointer to the real data (objects, floats, or bigger integers), so just clear the tag and dereference it.
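The scheme described above can be sketched as follows. This illustrates the tagging idea only and is not V8's code: `value_t` and the function names are hypothetical, and decoding relies on an arithmetic right shift of signed values, which is implementation-defined in C but provided by the common compilers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A word that holds either a small integer (low bit 0) or a pointer (low bit 1). */
typedef uintptr_t value_t;

static value_t from_smi(int32_t v)
{
    /* Only 31-bit values fit once one bit is given to the tag. */
    assert(v >= -(INT32_C(1) << 30) && v < (INT32_C(1) << 30));
    return (value_t)(intptr_t)v << 1; /* shift in a 0 tag bit */
}

static value_t from_ptr(void *p)
{
    assert(((uintptr_t)p & 1) == 0); /* needs at least 2-byte alignment */
    return (uintptr_t)p | 1;         /* set the tag bit */
}

static bool is_smi(value_t w)
{
    return (w & 1) == 0;
}

static int32_t to_smi(value_t w)
{
    return (int32_t)((intptr_t)w >> 1); /* signed shift restores the sign */
}

static void *to_ptr(value_t w)
{
    return (void *)(w & ~(uintptr_t)1); /* clear the tag before dereferencing */
}
```

The appeal of this layout is that small integers never need a heap allocation, and telling the two cases apart costs a single bit test.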

Side note: Using a linked list when the keys are tiny compared to the pointers is a huge waste of memory, and it is also slower due to poor cache locality. In fact, you shouldn't use linked lists in most real-life problems:

  • Bjarne Stroustrup says we must avoid linked lists
  • Why you should never, ever, EVER use linked-list in your code again
  • Number crunching: Why you should never, ever, EVER use linked-list in your code again
  • Bjarne Stroustrup: Why you should avoid Linked Lists
  • Are lists evil?—Bjarne Stroustrup
