Sizes of integer vectors in R


Question

I had thought that R had a standard overhead for storing objects (24 bytes, it seems, at least for integer vectors), but a simple test showed that it is more complex than I realized. For instance, taking integer vectors of length up to 100 (using random sampling, hoping to avoid any sneaky sequence compression tricks that might be out there), I found that vectors of different lengths can have the same size, as follows:

> N   = 100
> V   = vector(length = 100)
> for(L in 1:N){
+     z = sample(N, L, replace = TRUE)
+     V[L]    = object.size(z)
+ }
> 
> options('width'=88)
> V
  [1]  48  48  56  56  72  72  72  72  88  88  88  88 104 104 104 104 168 168 168 168
 [21] 168 168 168 168 168 168 168 168 168 168 168 168 176 176 184 184 192 192 200 200
 [41] 208 208 216 216 224 224 232 232 240 240 248 248 256 256 264 264 272 272 280 280
 [61] 288 288 296 296 304 304 312 312 320 320 328 328 336 336 344 344 352 352 360 360
 [81] 368 368 376 376 384 384 392 392 400 400 408 408 416 416 424 424 432 432 440 440

I'm struck by the value 152 that shows up (observation: 152 = 128 + 24, though 280 = 256 + 24 isn't as prominent). Can someone explain how these allocations arise? I have been unable to find a clear definition in the documentation, though Vcells come up.

Recommended answer

Even if you try N <- 10000, every value occurs exactly twice, except for vectors of the following lengths:

  • 5 to 8 (56 bytes)
  • 9 to 12 (72 bytes)
  • 13 to 16 (88 bytes)
  • 17 to 32 (152 bytes)

Each byte count occurs twice because memory is allocated in units of 8 bytes (referred to as Vcells in ?gc), while an integer takes only 4 bytes, so two consecutive lengths fit in the same number of units.
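The pairing can be checked with a little arithmetic. The following is a minimal sketch, not R's actual allocator: `vcell_bytes` is a hypothetical helper that only rounds the data size up to whole 8-byte units, using the 4-byte integer and 8-byte unit sizes stated above.

```r
# Hypothetical helper (not part of R): bytes of vector data needed for
# n elements of elt_size bytes each, rounded up to whole 8-byte units.
vcell_bytes <- function(n, elt_size = 4, unit = 8) {
  ceiling(n * elt_size / unit) * unit
}

# Lengths 1 and 2 share one unit, 3 and 4 share two units, and so on.
vcell_bytes(1:6)
# [1]  8  8 16 16 24 24
```

This covers only the data part of the vector; the header is added on top.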

In addition, R's internal object structure distinguishes between small and large vectors when allocating memory. Small vectors are allocated in bigger blocks of about 2 KB, whereas large vectors are allocated individually. The "small" vectors fall into 6 defined classes, based on length, which can store up to 8, 16, 32, 48, 64 and 128 bytes of vector data. As an integer takes only 4 bytes, these 6 classes hold up to 2, 4, 8, 12, 16 and 32 integers respectively. This explains the pattern you see.
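The size classes can be turned into a small model of the reported sizes. This is a sketch under stated assumptions rather than a description of R's allocator: `predicted_data_bytes` and `small_classes` are hypothetical names, the class capacities are the ones listed above, and `header` is taken from the question's 24 bytes. The real header size depends on the R build.

```r
# Capacities of the six small-vector classes, in bytes (from the text above).
small_classes <- c(8, 16, 32, 48, 64, 128)

# Hypothetical model: data bytes reserved for a vector of n >= 1 elements.
# Small vectors are bumped up to the smallest class that fits; larger ones
# only round up to whole 8-byte units.
predicted_data_bytes <- function(n, elt_size = 4, unit = 8) {
  raw <- ceiling(n * elt_size / unit) * unit
  vapply(raw, function(b) {
    if (b <= max(small_classes)) min(small_classes[small_classes >= b]) else b
  }, numeric(1))
}

header <- 24  # assumption: header size quoted in the question
predicted_data_bytes(c(5, 8, 9, 12, 13, 16, 17, 32, 33)) + header
# [1]  56  56  72  72  88  88 152 152 160
```

With `header <- 40` the same model gives 48, 56, 72, 88, 104 and 168 for the six classes, which are the values in the printed vector V in the question, so that session appears to use a 40-byte header.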

And, as you guessed, the 24 extra bytes are for the header (which forms the Ncells in ?gc). It is in fact a bit more complicated than that; if you're really interested in the exact details, read through the R Internals manual.
