Complex data.table subset and vectorised manipulation
Question
I have a complex function built on data.frames, and to speed it up I've turned to data.table. I'm totally new to it, so I'm quite befuddled. I've made a much simpler toy example of what I want to do, but I can't work out how to translate it into data.table form. Here is the example in data.frame form:
library(data.table)
rows <- 10
data1 <- data.frame( id =1:rows,
a = seq(0.2, 0.55, length.out = rows),
b = seq(0.35, 0.7, length.out = rows),
c = seq(0.4, 0.83, length.out = rows),
d = seq(0.6, 0.87, length.out = rows),
e = seq(0.7, 0.99, length.out = rows),
f = seq(0.52, 0.90, length.out = rows)
)
DT1 <- data.table(data1) #for later
data2 <- data.frame( id =3:1,
a = rep(3, 3),
d = rep(2, 3),
f = rep(1, 3)
)
m.names <- c("a", "d", "f")
# idx[k] is the row of data1 whose id equals data2$id[k], so data2 is used in its own row order
idx <- match(data2$id, data1$id)
data1[idx, m.names] <- data1[idx, m.names] + data2[, m.names]
Note that in the last step I want to add the new data to the pre-existing figures, and the addition is vectorised across several columns.
In data.table form I've only gotten this far:
DT1[id %in% data2$id, m.names, with=FALSE]
This selects the values I want to add to, but after that I'm lost. I would appreciate any help!
OK, I've figured out part of it: I can use the last line of code above to do the vectorised addition, using data2 to store the summed values, as follows:
# Note: id %in% data2$id returns rows in DT1's order, so this assumes data2 is sorted by id the same way.
# The toy data hides any mismatch because every row of data2 holds the same values.
data2[,m.names] <- data2[,m.names] + data.frame(DT1[id %in% data2$id, m.names, with=FALSE])
Even with 2.5 million rows in DT1, 10,000 rows in data2 and 6 matching columns, this takes only 0.004 s, but I still need to assign the updated data2 values back to the appropriate, dynamically chosen columns of data1.
Recommended answer
Thanks to @David Arenburg for his suggestion. I've modified it slightly to arrive at my preferred solution:
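# Build "a = a + i.a, d = d + i.d, f = f + i.f"; the i. prefix refers to data2's columns in the join
text <- paste0(m.names, " = ", m.names, " + i.", m.names, collapse = ", ")
expr <- parse(text = paste0("`:=`(", text, ")"))
# The join needs either on = or a key on DT1 (setkey(DT1, id)); matched rows of DT1 are updated by reference
res2 <- DT1[data2, eval(expr), on = "id"]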
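The same update join can probably be written without building and parsing a string. The following is only a sketch, not part of the original answer: it assumes every id in the second table exists in DT1, trims the toy data to the three updated columns, and uses hypothetical helper names (DT2, before). It passes the column names as a character vector on the left of `:=` and pairs DT1's columns with the `i.`-prefixed columns of the joined table via mget().

```r
library(data.table)

rows <- 10
DT1 <- data.table(
  id = 1:rows,
  a  = seq(0.2, 0.55, length.out = rows),
  d  = seq(0.6, 0.87, length.out = rows),
  f  = seq(0.52, 0.90, length.out = rows)
)
# Hypothetical data.table version of data2 from the question
DT2 <- data.table(id = 3:1, a = 3, d = 2, f = 1)
m.names <- c("a", "d", "f")

before <- copy(DT1)  # kept only so the result can be checked afterwards

# For each row of DT1 matched on id, add DT2's value (i.<col>) to DT1's own value.
# (m.names) on the left-hand side means "the columns named in m.names".
DT1[DT2,
    (m.names) := Map(`+`, mget(m.names), mget(paste0("i.", m.names))),
    on = "id"]
```

Because `:=` modifies DT1 by reference, no reassignment is needed; rows of DT1 with no matching id are left untouched.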