Strange error when expanding data.table
Question
We stumbled upon some strange behaviour while trying to expand a data.table. The following code works fine:
dt <- data.table(var1=1:2e3, var2=1:2e3, freq=1:2e3)
system.time(dt.expanded <- dt[ ,list(freq=rep(1,freq)),by=c("var1","var2")])
## user system elapsed
## 0.05 0.01 0.06
But with the following data.table
set.seed(1)
dt <- data.table(var1=sample(letters,1000,replace=T),var2=sample(LETTERS,1000,replace=T),freq=sample(1:10,1000,replace=T))
the same code gives
Error in rep(1, freq) : invalid 'times' argument
Could this be a bug in data.table?
(I got the syntax of this example from R Machine Learning Essentials.)
Edit
So the problem really seems to be with rep and not with data.table. The help page for rep says of the parameter times:
An integer vector giving the (non-negative) number of times to repeat each element if of length length(x), or to repeat the whole vector if of length 1.
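The rule quoted from the help page can be checked in base R alone. The following is a minimal sketch (the variable names x and res are illustrative only): it covers the two accepted shapes of times, then the case where times is longer than x, which is the situation behind the error in the question.

```r
x <- 1

# times of length 1: the whole vector is repeated.
rep(x, times = 3)
#> [1] 1 1 1

# times of the same length as x: each element gets its own count.
rep(c("a", "b"), times = c(2, 1))
#> [1] "a" "a" "b"

# times of length 3 with x of length 1: neither accepted shape, so rep() fails.
res <- tryCatch(rep(x, times = c(2, 3, 4)),
                error = function(e) conditionMessage(e))
res
#> [1] "invalid 'times' argument"
```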
The second data.table creates times ...
Recommended answer
My guess: when rep(x, times) is given a vector for times, it insists that x be the same length (instead of doing the natural thing in R and recycling). So manual recycling works:
dt[ ,.(rep(rep(1,.N),freq)), by=.(var1,var2)]
Seems to be a problem in base R (or maybe it's deliberate?), not in data.table. The OP didn't hit this problem in the first example because every (var1, var2) combination there is unique, so by=.(var1,var2) produced groups of exactly one row and the times argument was a scalar.
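To make the difference between the two examples concrete, here is a sketch that assumes the data.table package is installed. It rebuilds the second table from the question, checks that some (var1, var2) groups hold more than one row, and applies the manual recycling shown above. The last step is a different technique that does not appear in the question or the answer: it repeats row indices and skips grouping altogether. The names max_group_size, expanded and expanded_rows are illustrative.

```r
library(data.table)

set.seed(1)
dt <- data.table(var1 = sample(letters, 1000, replace = TRUE),
                 var2 = sample(LETTERS, 1000, replace = TRUE),
                 freq = sample(1:10, 1000, replace = TRUE))

# 1000 rows but only 26 * 26 = 676 possible pairs, so some groups must
# contain several rows; inside those groups freq is a vector, not a scalar.
max_group_size <- dt[, .N, by = .(var1, var2)][, max(N)]

# Manual recycling from the answer: make x as long as freq within each group.
expanded <- dt[, .(freq = rep(rep(1, .N), freq)), by = .(var1, var2)]

# Alternative (not from the original post): repeat each row index freq times.
expanded_rows <- dt[rep(seq_len(nrow(dt)), dt$freq), .(var1, var2)]

# Both results should contain one row per unit of freq.
nrow(expanded) == sum(dt$freq)
nrow(expanded_rows) == sum(dt$freq)
```

The index-based variant keeps the original row order, whereas the grouped variant returns rows in order of each group's first appearance.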