Why is an int in OCaml only 31 bits?

Question

I haven't seen this "feature" anywhere else. I know that the 32nd bit is used for garbage collection. But why is it that way only for ints and not for the other basic types?

Recommended answer

This is called a tagged pointer representation, and it is a pretty common optimization trick that has been used for decades in many different interpreters, VMs and runtime systems. Pretty much every Lisp implementation uses it, as do many Smalltalk VMs, many Ruby interpreters, and so on.

Usually, in those languages, you always pass around pointers to objects. An object itself consists of an object header, which contains object metadata (like the type of the object, its class(es), maybe access control restrictions or security annotations and so on), followed by the actual object data. So, a simple integer would be represented as a pointer plus an object consisting of metadata and the actual integer. Even with a very compact representation, that's something like 6 bytes for a simple integer.

Also, you cannot pass such an integer object to the CPU to perform fast integer arithmetic. If you want to add two integers, all you really have is two pointers, which point to the beginning of the object headers of the two integer objects you want to add. So, you first need to perform integer arithmetic on the first pointer, adding the offset at which the integer data is stored within the object. Then you have to dereference that address. Then you do the same again with the second integer. Now you have two integers you can actually ask the CPU to add. Of course, you then need to construct a new integer object to hold the result.

So, in order to perform one integer addition, you actually need to perform three integer additions plus two pointer dereferences plus one object construction. And you take up almost 20 bytes.
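To make that cost concrete, here is a toy model of boxed integers. It is only a sketch: the word-addressed `heap` list, the one-word header and the helper names `box_int` and `boxed_add` are assumptions made for illustration, not the layout of any particular VM.

```python
# Toy model of "boxed" integers: every value lives on a heap behind a pointer.
# The heap layout, HEADER_WORDS and the helper names are hypothetical.

HEADER_WORDS = 1   # one header word holding the class (object metadata)
heap = []          # word-addressed memory; a "pointer" is an index into it


def box_int(n):
    """Allocate header + payload and return a pointer to the header."""
    ptr = len(heap)
    heap.append("Integer")  # object header
    heap.append(n)          # actual integer data
    return ptr


def boxed_add(p, q):
    """Add two boxed integers, spelling out each step described above."""
    a = heap[p + HEADER_WORDS]  # addition 1 (offset) + dereference 1
    b = heap[q + HEADER_WORDS]  # addition 2 (offset) + dereference 2
    return box_int(a + b)       # addition 3 (the real one) + object construction


x = box_int(20)
y = box_int(22)
z = boxed_add(x, y)  # z is a pointer to a freshly allocated Integer object
```

Reading the result back requires yet another offset and dereference: `heap[z + HEADER_WORDS]`.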

However, the trick is that with so-called immutable value types like integers, you usually don't need all the metadata in the object header: you can just leave all that stuff out and simply synthesize it (which is VM-nerd-speak for "fake it") when anyone cares to look. An integer will always have class Integer; there's no need to store that information separately. If someone uses reflection to figure out the class of an integer, you simply reply Integer, and nobody will ever know that you didn't actually store that information in the object header and that, in fact, there isn't even an object header (or an object).

So, the trick is to store the value of the object within the pointer to the object, effectively collapsing the two into one.

There are CPUs which actually have additional space within a pointer (so-called tag bits) that allows you to store extra information about the pointer within the pointer itself. Extra information like "this isn't actually a pointer, this is an integer". Examples include the Burroughs B5000, the various Lisp Machines and the AS/400. Unfortunately, most of the current mainstream CPUs don't have that feature.

However, there is a way out: most current mainstream CPUs work significantly slower when addresses aren't aligned on word boundaries. Some don't even support unaligned access at all.

What this means is that in practice, on a 32-bit machine with 4-byte words, all pointers will be divisible by 4, which means they will always end with two 0 bits. This allows us to distinguish between real pointers (which end in 00) and pointers which are actually integers in disguise (those that end in 1). And it still leaves all pointers that end in 10 free for other purposes. Also, most modern operating systems reserve the very low addresses for themselves, which gives us another area to mess around with (pointers that start with, say, 24 0s and end with 00).
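The low-bit test can be sketched in a few lines. This assumes 32-bit words modeled as Python integers; `classify` and its return labels are hypothetical names chosen for illustration.

```python
WORD_BITS = 32                      # assumption: a 32-bit machine word
WORD_MASK = (1 << WORD_BITS) - 1


def classify(word):
    """Tell the three cases apart by looking only at the low bits of a word."""
    word &= WORD_MASK
    if word & 0b1:
        return "integer"   # ends in 1: an integer in disguise
    if word & 0b11 == 0b00:
        return "pointer"   # ends in 00: a real, word-aligned pointer
    return "other"         # ends in 10: free for other encodings
```

For instance, `classify(0x1000)` reports a pointer, while `classify(0x1001)` and `classify(0x1003)` both report an integer, because only the lowest bit matters for that case.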

So, you can encode a 31-bit integer into a pointer by simply shifting it 1 bit to the left and adding 1 to it. And you can perform very fast integer arithmetic on such values by simply shifting them appropriately (sometimes not even that is necessary). That is why an OCaml int is 31 bits wide on a 32-bit platform: one bit of every word is taken by the tag.
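A minimal sketch of that encoding follows, again modeling a 32-bit word as a masked Python integer. The helper names (`tag`, `untag`, `tagged_add`, `tagged_mul`) are assumptions for illustration; a real runtime does this in a handful of machine instructions.

```python
WORD_BITS = 32                              # assumption: a 32-bit machine word
WORD_MASK = (1 << WORD_BITS) - 1
INT_MIN, INT_MAX = -(1 << 30), (1 << 30) - 1  # range of a 31-bit signed integer


def tag(n):
    """Encode: shift left by one bit and set the low bit."""
    if not INT_MIN <= n <= INT_MAX:
        raise OverflowError("value does not fit in 31 bits")
    return ((n << 1) | 1) & WORD_MASK


def untag(word):
    """Decode: reinterpret the word as signed, then shift right by one bit."""
    if word & (1 << (WORD_BITS - 1)):  # top bit set: negative number
        word -= 1 << WORD_BITS
    return word >> 1


def tagged_add(a, b):
    # (2x + 1) + (2y + 1) - 1 == 2(x + y) + 1, so no shifting is needed.
    return (a + b - 1) & WORD_MASK


def tagged_mul(a, b):
    # (2x) * y + 1 == 2(xy) + 1: clear one tag bit, untag the other operand.
    return ((a - 1) * untag(b) + 1) & WORD_MASK
```

Addition works directly on the tagged words with a single correction, which is the "sometimes not even that is necessary" case; multiplication needs one shift. Results that overflow 31 bits wrap around, as they would in a fixed-width register.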

What do we do with those other address spaces? Well, typical examples include encoding floats in the other large address space, and placing a number of special objects near the 0 address: true, false, nil, the 128 ASCII characters, some commonly used short strings, the empty list, the empty object, the empty array and so on.

For example, in the MRI, YARV and Rubinius Ruby interpreters, integers are encoded the way I described above, false is encoded as address 0 (which just so happens also to be the representation of false in C), true as address 2 (which just so happens to be the C representation of true shifted by one bit) and nil as address 4.
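As a rough illustration, a decoder for the scheme just described might look like the sketch below. The constants 0, 2 and 4 are taken from the paragraph above and may differ between Ruby versions and builds; `decode` and the `"object"` label are hypothetical names, and the class names are synthesized from the bit pattern, since nothing is stored in a header.

```python
WORD_BITS = 32  # assumption: a 32-bit machine word

# Special constants as described in the text; real values depend on the build.
SPECIALS = {
    0: ("FalseClass", False),
    2: ("TrueClass", True),
    4: ("NilClass", None),
}


def decode(word):
    """Return (synthesized class name, value) for a tagged word."""
    if word & 1:                           # low bit set: an immediate integer
        if word & (1 << (WORD_BITS - 1)):  # restore the sign
            word -= 1 << WORD_BITS
        return ("Integer", word >> 1)
    if word in SPECIALS:                   # false, true and nil near address 0
        return SPECIALS[word]
    return ("object", word)                # anything else: treat as a heap pointer
```

Here `decode(85)` yields `("Integer", 42)` without touching memory, while `decode(0x1000)` is passed through as a pointer to a real heap object.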
