Using := in data.table with paste()

Problem description

I have started using data.table for a large population model. So far I am impressed: switching to the data.table structure cut my simulation run times by about 30%. I am trying to optimize my code further and have included a simplified example. My two questions are: 1) Is it possible to use the := operator with this code? and 2) Would using the := operator be quicker? (If I can answer the first question, I should be able to answer the second myself.)

I am using R version 3.1.2 on a machine running Windows 7 with data.table version 1.9.4.

Here is my reproducible example:

library(data.table)

## Create  example table and set initial conditions
nYears = 10
exampleTable = data.table(Site = paste("Site", 1:3))
exampleTable[ , growthRate := c(1.1, 1.2, 1.3), ]
exampleTable[ , c(paste("popYears", 0:nYears, sep = "")) := 0, ]

exampleTable[ , "popYears0" := c(10, 12, 13)] # set the initial population size

for(yearIndex in 0:(nYears - 1)){
    exampleTable[[paste("popYears", yearIndex + 1, sep = "")]] <- 
    exampleTable[[paste("popYears", yearIndex, sep = "")]] * 
    exampleTable[, growthRate]
}

I am trying to do something like:

for(yearIndex in 0:(nYears - 1)){
    exampleTable[ , paste("popYears", yearIndex + 1, sep = "") := 
    paste("popYears", yearIndex, sep = "") * growthRate, ] 
}

However, this does not work, because the result of paste() is treated as a plain character string rather than as a column name inside the data.table. For example:

exampleTable[ , paste("popYears", yearIndex + 1, sep = "")]
# [1] "popYears10"

I have looked through the data.table documentation. Section 2.9 of the FAQ uses cat(), but here that only prints the name and returns NULL.

exampleTable[ , cat(paste("popYears", yearIndex + 1, sep = ""))]
# [1] popYears10NULL

Also, I tried searching Google and rseek.org, but didn't find anything. If I am missing an obvious search term, I would appreciate a search tip. I have always found searching for R operators hard because search engines don't like symbols (e.g., ":=") and "R" can be vague.

Last, this is my first post on Stack Overflow, so I apologize if I violated a posting standard.

Solution

## Start with 1st three columns of example data
dt <- exampleTable[,1:3,with=FALSE]

## Run for 1st five years
nYears <- 5
for(ii in seq_len(nYears)-1) {
    y0 <- as.symbol(paste0("popYears", ii))
    y1 <- paste0("popYears", ii+1)
    dt[, (y1) := eval(y0)*growthRate]
}

## Check that it worked
dt
#     Site growthRate popYears0 popYears1 popYears2 popYears3 popYears4 popYears5
#1: Site 1        1.1        10      11.0     12.10    13.310   14.6410  16.10510
#2: Site 2        1.2        12      14.4     17.28    20.736   24.8832  29.85984
#3: Site 3        1.3        13      16.9     21.97    28.561   37.1293  48.26809
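
For anyone wondering why the loop above works: wrapping the left-hand side in parentheses, (y1), makes data.table use the value stored in y1 as the column name instead of creating a column literally called "y1", and eval(y0) resolves the symbol to the matching column. Below is a minimal sketch of the same idea using get() on a character name instead of as.symbol()/eval(). The names dt2, prev and nxt are only for illustration, and it assumes the table has no columns that are themselves called prev or nxt.

```r
library(data.table)

# Small table mirroring the first three columns of the example data
dt2 <- data.table(Site = paste("Site", 1:3),
                  growthRate = c(1.1, 1.2, 1.3),
                  popYears0 = c(10, 12, 13))

nYears <- 5
for (ii in seq_len(nYears) - 1) {
    prev <- paste0("popYears", ii)      # name of the existing column
    nxt  <- paste0("popYears", ii + 1)  # name of the column to create
    # (nxt) on the left: use the value of nxt as the new column name
    # get(prev) on the right: fetch the column whose name is stored in prev
    dt2[, (nxt) := get(prev) * growthRate]
}

dt2
```

This should give the same table as the as.symbol()/eval() version; which one reads better is mostly a matter of taste.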

Edit:

Because the possibility of speeding this up using set() keeps coming up in the comments, I'll throw this additional option out there.

nYears <- 5

## Things that only need to be calculated once can be taken out of the loop
r <- dt[["growthRate"]]
yy <- paste0("popYears", seq_len(nYears+1)-1)

## A loop using set() and data.table's nice compact syntax
for(ii in seq_len(nYears)) {
    set(dt, , yy[ii+1], r*dt[[yy[ii]]])
}

## Check results
dt
#     Site growthRate popYears0 popYears1 popYears2 popYears3 popYears4 popYears5
#1: Site 1        1.1        10      11.0     12.10    13.310   14.6410  16.10510
#2: Site 2        1.2        12      14.4     17.28    20.736   24.8832  29.85984
#3: Site 3        1.3        13      16.9     21.97    28.561   37.1293  48.26809
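
On the second question (whether this is quicker): the data.table documentation describes set() as a low-overhead, loopable version of :=, so in a tight loop it is usually the one to try first. How much it matters depends on the table size and the number of iterations, so it is worth timing on your own data. A rough sketch of how to compare the two follows; the helper names makeTable, growWithAssign and growWithSet and the sizes are made up for the test, and no timings are shown because they will differ by machine.

```r
library(data.table)

# Hypothetical helper: a fresh table with nSites rows and random growth rates
makeTable <- function(nSites) {
    data.table(Site = paste("Site", seq_len(nSites)),
               growthRate = runif(nSites, 1.0, 1.3),
               popYears0 = 10)
}

# Loop using := with a computed column name
growWithAssign <- function(dt, nYears) {
    for (ii in seq_len(nYears) - 1) {
        y0 <- as.symbol(paste0("popYears", ii))
        y1 <- paste0("popYears", ii + 1)
        dt[, (y1) := eval(y0) * growthRate]
    }
    dt
}

# Loop using set(), with the invariant parts computed once up front
growWithSet <- function(dt, nYears) {
    r  <- dt[["growthRate"]]
    yy <- paste0("popYears", 0:nYears)
    for (ii in seq_len(nYears)) {
        set(dt, j = yy[ii + 1], value = r * dt[[yy[ii]]])
    }
    dt
}

nSites <- 1000  # assumed size, adjust to match your model
nYears <- 20

set.seed(1)
a <- makeTable(nSites)
b <- copy(a)    # independent copy so both runs start from the same data

timeAssign <- system.time(growWithAssign(a, nYears))
timeSet    <- system.time(growWithSet(b, nYears))

print(timeAssign)
print(timeSet)

# Both approaches should produce the same table
all.equal(a, b)
```

Both functions modify the table by reference, which is why b is created with copy() rather than plain assignment.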
