Sizes of integer vectors in R
Question
I had thought that R had a standard overhead for storing objects (24 bytes, it seems, at least for integer vectors), but a simple test showed that it's more complex than I realized. For instance, taking integer vectors of length up to 100 (using random sampling, hoping to avoid any sneaky sequence-compression tricks that might be out there), I found that vectors of different lengths can have the same size, as follows:
> N = 100
> V = numeric(N)  # one slot per vector length
> for(L in 1:N){
+   z = sample(N, L, replace = TRUE)  # L random integers rather than a sequence
+   V[L] = object.size(z)             # size in bytes
+ }
>
> options('width'=88)
> V
  [1]  48  48  56  56  72  72  72  72  88  88  88  88 104 104 104 104 168 168 168 168
 [21] 168 168 168 168 168 168 168 168 168 168 168 168 176 176 184 184 192 192 200 200
 [41] 208 208 216 216 224 224 232 232 240 240 248 248 256 256 264 264 272 272 280 280
 [61] 288 288 296 296 304 304 312 312 320 320 328 328 336 336 344 344 352 352 360 360
 [81] 368 368 376 376 384 384 392 392 400 400 408 408 416 416 424 424 432 432 440 440
I'm struck by the 152 values that show up (observation: 152 = 128 + 24, though 280 = 256 + 24 isn't as prominent). Can someone explain how these allocations arise? I have been unable to find a clear definition in the documentation, though Vcells come up.
Recommended answer
Even if you try N <- 10000, every size occurs exactly twice, except for vectors of length:
- 5 to 8 (56 bytes)
- 9 to 12 (72 bytes)
- 13 to 16 (88 bytes)
- 17 to 32 (152 bytes)
Each byte count occurs twice because memory is allocated in pieces of 8 bytes (referred to as Vcells in ?gc) while an integer takes only 4 bytes, so two consecutive lengths need the same number of pieces.
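The pairing can be illustrated with a little arithmetic. The sketch below models only the 8-byte rounding of the data portion (no header and no size classes); `data_bytes` is a hypothetical helper name, not a function provided by R.

```r
# Hypothetical helper: bytes needed for n integers (4 bytes each),
# rounded up to a whole number of 8-byte Vcells.
data_bytes <- function(n) 8 * ceiling(4 * n / 8)

data_bytes(1:8)
# [1]  8  8 16 16 24 24 32 32
```

Each odd length wastes half a Vcell, which is why it costs the same as the next even length.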
In addition, R's internal object structure distinguishes between small and large vectors when allocating memory. Small vectors are allocated in bigger blocks of about 2 KB, whereas larger vectors are allocated individually. The 'small' vectors fall into 6 defined classes, based on length, which can store up to 8, 16, 32, 48, 64 and 128 bytes of vector data. As an integer takes only 4 bytes, these 6 classes hold up to 2, 4, 8, 12, 16 and 32 integers respectively. This explains the pattern you see.
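The rule described above can be written down as a small model. This is only a sketch of the description in this answer, not R's actual allocator: `model_size` and its `header_bytes` parameter are hypothetical names, the header size is an assumption that may differ between R builds, and the function is meant for lengths of 1 or more.

```r
# Sketch of the allocation rule described above (not R's real allocator).
# header_bytes is an assumed parameter; pick the value for your build.
model_size <- function(n, header_bytes = 24) {
  small_classes <- c(8, 16, 32, 48, 64, 128)  # data capacity of each small class
  data <- 4 * n                                # raw bytes for n integers
  vapply(data, function(b) {
    if (b <= 128) {
      # small vector: smallest class that can hold the data
      payload <- small_classes[which(small_classes >= b)[1]]
    } else {
      # large vector: round up to whole 8-byte Vcells
      payload <- 8 * ceiling(b / 8)
    }
    header_bytes + payload
  }, numeric(1))
}

# Sizes shared by more than two consecutive lengths
runs <- rle(model_size(1:40))
runs$values[runs$lengths > 2]
# [1]  56  72  88 152
```

With `header_bytes = 40`, `model_size(1:100)` gives the same 100 values as the output printed in the question, which suggests that output and the byte counts listed in this answer come from builds with different header sizes.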
The extra bytes are for the header (which forms the Ncells in ?gc). If you're really interested in all this, read through the R Internals manual.
And, as you guessed, the 24 extra bytes come from the headers (or Ncells). It is in fact a bit more complicated than that, but the exact details can be found in the R Internals manual.
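To check the header size on your own installation, one option is to measure an empty integer vector, which has no data portion. The resulting numbers are deliberately not shown here, since they may differ between R versions and platforms; the variable names are illustrative only.

```r
# Size reported for an integer vector with no elements: header only.
header <- as.numeric(utils::object.size(integer(0)))
header

# Data bytes implied for a few lengths once the header is subtracted.
lens <- c(1, 2, 4, 8, 16, 32)
payload <- vapply(
  lens,
  function(n) as.numeric(utils::object.size(integer(n))) - header,
  numeric(1)
)
data.frame(length = lens, payload_bytes = payload)
```

Comparing `payload_bytes` with the class capacities listed above shows whether your build follows the same small-vector classes.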