Why does the access time for the first element of a data.frame depend on its dimensions?


Question


I am having trouble with the access time to the first element of a data.frame. The access time seems to depend on the size of the data.frame. Does anyone know how to eliminate this dependency?

This is a code example I have run. It allocates "tme", which stores the time required to set the first element of a data.frame of length i*1000, where i = j*1000 and j runs from 1 to 500. Essentially, I allocate longer and longer data.frames, increasing i in steps of 1000, and set the first element to zero. For short data.frames the access times are too small to measure; for long ones they rise to several seconds.

tme <- (1:500)
for (j in 1:500){
  i <- j*1000
  vec <- (1:(i*1000))
  print(i)
  now <- Sys.time()
  vec[1] <- 0
  tme[j] <- Sys.time()-now
}
tme_vec_first <- tme

Solution

I don't think the increase in time is related to access time, but is rather due to making copies. Each of these assignments involves making a copy of the vector. You can test this with tracemem.

# initialize vector (10 zeros)
tracemem({vec <- integer(10)})

[1] "<0000000011D48720>"

# assign value to 7th position
tracemem({vec[7] <- 6L})

tracemem[0x0000000011d48720 -> 0x00000000111a02b0]:
[1] "<0000000012E25468>"

As the vector grows larger, the time involved in the copy process increases.
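A rough way to see this effect is to time the same first-element assignment on vectors of increasing length. The sketch below is only illustrative: the helper name time_first_assign and the chosen sizes are made up for this example, and the absolute timings depend on the machine and the R version.

```r
# Hypothetical helper (not from the original post): time how long it takes
# to overwrite the first element of an integer vector of length n.
time_first_assign <- function(n, value) {
  vec <- 1:n  # integer vector, built the same way as in the question
  system.time(vec[1] <- value)[["elapsed"]]
}

# Illustrative sizes; the largest vector needs roughly 80 MB as a double.
sizes <- c(1e5, 1e6, 1e7)

timings <- data.frame(
  n = sizes,
  double_value = sapply(sizes, time_first_assign, value = 0),    # coerces to double
  integer_value = sapply(sizes, time_first_assign, value = 0L)   # stays integer
)
print(timings)
```

If copying is the cause, the double_value column would be expected to grow with n. How the integer_value column behaves can differ between R versions, so treat it as a comparison point rather than a guaranteed constant.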


Further, note that vec <- (1:(i*1000)) is an integer vector, and vec[1] <- 0 turns vec into a double vector, which roughly doubles the size of the vector in memory.

First, we'll create the integer vector, and check its size and type.

# start over with similar syntax to question
tracemem({vec <- 1:10})

[1] "<0000000011E55508>"

# check size
object.size(vec)

88 bytes

# check type
typeof(vec)

[1] "integer"

Now, assign 0 to the 7th position and re-check size and type. 0 appears to be the same kind of value as those already in the vector, but it is actually a double rather than an integer.

# assign value
tracemem({vec[7] <- 0})

tracemem[0x0000000011e55508 -> 0x0000000012399390]:
tracemem[0x0000000012399390 -> 0x0000000013394740]:
[1] "<00000000130EBA60>"

# check size
object.size(vec)

168 bytes

# check type
typeof(vec)

[1] "double"

Notice that there are two separate copy operations here. My guess is that the first is the copy that converts the vector from integer to double, and the second is the assignment.

To keep the vector as an integer vector, use vec[1] <- 0L instead; the "L" suffix tells R that an integer is desired.
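As a small check of this point, the sketch below assigns 0L and 0 to two copies of the same integer vector and compares their type and size. The variable names are illustrative, and the exact byte counts reported by object.size vary between R versions, so only the relative sizes are compared.

```r
n <- 1000
vec_int <- 1:n
vec_dbl <- 1:n

vec_int[1] <- 0L  # integer literal: the vector stays integer
vec_dbl[1] <- 0   # double literal: the vector is coerced to double

print(typeof(vec_int))
print(typeof(vec_dbl))

# Compare memory footprints as plain numbers of bytes.
size_int <- as.numeric(object.size(vec_int))
size_dbl <- as.numeric(object.size(vec_dbl))
print(size_dbl / size_int)  # close to 2 for a vector of this length
```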


Note that this copying behavior, as reported by tracemem, was observed in both RStudio and Rgui using Microsoft R Open 3.2.5 on Windows 7.
