Using := in data.table with paste()
Question
I have started using data.table for a large population model. So far, I have been impressed, because using the data.table structure decreases my simulation run times by about 30%. I am trying to further optimize my code and have included a simplified example. My two questions are:
- Is it possible to use the := operator with this code?
- Would using the := operator be quicker (although, if I am able to answer my first question, I should be able to answer the second one myself)?
I am using R version 3.1.2 on a machine running Windows 7 with data.table version 1.9.4.
Here is my reproducible example:
library(data.table)
## Create example table and set initial conditions
nYears = 10
exampleTable = data.table(Site = paste("Site", 1:3))
exampleTable[ , growthRate := c(1.1, 1.2, 1.3), ]
exampleTable[ , c(paste("popYears", 0:nYears, sep = "")) := 0, ]
exampleTable[ , "popYears0" := c(10, 12, 13)] # set the initial population size
for(yearIndex in 0:(nYears - 1)){
  exampleTable[[paste("popYears", yearIndex + 1, sep = "")]] <-
    exampleTable[[paste("popYears", yearIndex, sep = "")]] *
    exampleTable[, growthRate]
}
I am trying to do something like:
for(yearIndex in 0:(nYears - 1)){
  exampleTable[ , paste("popYears", yearIndex + 1, sep = "") :=
    paste("popYears", yearIndex, sep = "") * growthRate, ]
}
However, this does not work because paste() inside the data.table returns a character string rather than the column, for example:
exampleTable[ , paste("popYears", yearIndex + 1, sep = "")]
# [1] "popYears10"
I have looked through the data.table documentation. Section 2.9 of the FAQ uses cat, but this produces a NULL output.
exampleTable[ , cat(paste("popYears", yearIndex + 1, sep = ""))]
# [1] popYears10NULL
Also, I tried searching Google and rseek.org, but didn't find anything. If I am missing an obvious search term, I would appreciate a search tip. I have always found searching for R operators to be hard because search engines don't like symbols (e.g., ":=") and "R" can be vague.
Recommended answer
## Start with 1st three columns of example data
## (with = FALSE selects columns by position; older versions such as 1.9.4 need it)
dt <- exampleTable[, 1:3, with = FALSE]
## Run for 1st five years
nYears <- 5
for(ii in seq_len(nYears)-1) {
  y0 <- as.symbol(paste0("popYears", ii))
  y1 <- paste0("popYears", ii+1)
  dt[, (y1) := eval(y0)*growthRate]
}
## Check that it worked
dt
# Site growthRate popYears0 popYears1 popYears2 popYears3 popYears4 popYears5
#1: Site 1 1.1 10 11.0 12.10 13.310 14.6410 16.10510
#2: Site 2 1.2 12 14.4 17.28 20.736 24.8832 29.85984
#3: Site 3 1.3 13 16.9 21.97 28.561 37.1293 48.26809
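As an alternative sketch of the same loop, get() can look a column up by a name held in a string, so the as.symbol()/eval() pair is not strictly needed. The table is rebuilt here from the question's values only so that the block runs on its own. Wrapping y1 in parentheses on the left of := tells data.table to use the value stored in y1 as the column name, rather than creating a column literally called y1.

```r
library(data.table)

# Rebuild the first three columns of the example data (values from the question)
dt <- data.table(Site = paste("Site", 1:3),
                 growthRate = c(1.1, 1.2, 1.3),
                 popYears0 = c(10, 12, 13))

nYears <- 5
for (ii in seq_len(nYears) - 1) {
  y0 <- paste0("popYears", ii)       # name of the existing column
  y1 <- paste0("popYears", ii + 1)   # name of the column to create
  # (y1) on the left: use the string stored in y1 as the column name
  # get(y0) on the right: fetch the column whose name is stored in y0
  dt[, (y1) := get(y0) * growthRate]
}

dt
```

Whether get() or eval() of a symbol reads better is largely a matter of taste; both add the new columns by reference.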
Because the possibility of speeding this up using set() keeps coming up in the comments, I'll throw this additional option out there.
nYears <- 5
## Things that only need to be calculated once can be taken out of the loop
r <- dt[["growthRate"]]
yy <- paste0("popYears", seq_len(nYears+1)-1)
## A loop using set() and data.table's nice compact syntax
for(ii in seq_len(nYears)) {
  set(dt, , yy[ii+1], r*dt[[yy[ii]]])
}
## Check results
dt
# Site growthRate popYears0 popYears1 popYears2 popYears3 popYears4 popYears5
#1: Site 1 1.1 10 11.0 12.10 13.310 14.6410 16.10510
#2: Site 2 1.2 12 14.4 17.28 20.736 24.8832 29.85984
#3: Site 3 1.3 13 16.9 21.97 28.561 37.1293 48.26809
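To get a rough feel for whether := or set() is quicker on a given machine, the two loops can be wrapped in functions and timed with system.time(). This is only a sketch: makeTable(), growWithAssign() and growWithSet() are hypothetical helper names, and the table size (3000 sites, 50 years) is made up for illustration. No timings are shown because they depend on the machine and the data.table version.

```r
library(data.table)

# Hypothetical helper: builds a fresh table with nSites rows (made-up size)
makeTable <- function(nSites) {
  data.table(Site = paste("Site", seq_len(nSites)),
             growthRate = rep(c(1.1, 1.2, 1.3), length.out = nSites),
             popYears0 = rep(c(10, 12, 13), length.out = nSites))
}

# Loop that adds one column per year with :=
growWithAssign <- function(dt, nYears) {
  for (ii in seq_len(nYears) - 1) {
    y0 <- paste0("popYears", ii)
    y1 <- paste0("popYears", ii + 1)
    dt[, (y1) := get(y0) * growthRate]
  }
  dt
}

# Loop that adds one column per year with set()
growWithSet <- function(dt, nYears) {
  r <- dt[["growthRate"]]
  yy <- paste0("popYears", seq_len(nYears + 1) - 1)
  for (ii in seq_len(nYears)) {
    set(dt, j = yy[ii + 1], value = r * dt[[yy[ii]]])
  }
  dt
}

nYears <- 50
dtAssign <- makeTable(3000)
dtSet <- makeTable(3000)

timeAssign <- system.time(growWithAssign(dtAssign, nYears))
timeSet <- system.time(growWithSet(dtSet, nYears))

# Both loops should produce the same table
all.equal(dtAssign, dtSet)

# Elapsed seconds for each approach; values vary from run to run
c(assign = timeAssign[["elapsed"]], set = timeSet[["elapsed"]])
```

A single system.time() call is noisy, so repeating the comparison several times, or on a table closer in size to the real model, gives a more trustworthy picture.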