How to use an unknown number of key columns in a data.table
Problem description
I want to do the same as explained here, i.e. add missing rows to a data.table. The only additional difficulty I'm facing is that I want the number of key columns, i.e. the columns that are used for the self-join, to be flexible.
Here is a small example that basically repeats what is done in the link mentioned above:
library(data.table)
df <- data.frame(fundID = rep(letters[1:4], each=6),
cfType = rep(c("D", "D", "T", "T", "R", "R"), times=4),
variable = rep(c(1,3), times=12),
value = 1:24)
DT <- as.data.table(df)
idCols <- c("fundID", "cfType")
setkeyv(DT, c(idCols, "variable"))
DT[CJ(unique(df$fundID), unique(df$cfType), seq(from=min(variable), to=max(variable))), nomatch=NA]
What bothers me is the last line. I want idCols to be flexible (for instance if I use it within a function), so I don't want to type unique(df$fundID), unique(df$cfType) manually. However, I can't find a workaround for this. All my attempts to automatically split the subset of df into vectors, as needed by CJ, fail with the error message: Error in setkeyv(x, cols, verbose = verbose) : Column 'V1' is type 'list' which is not (currently) allowed as a key column type.
CJ(sapply(df[, idCols], unique))
CJ(unique(df[, idCols]))
CJ(as.vector(unique(df[, idCols])))
CJ(unique(DT[, idCols, with=FALSE]))
I also tried building the expression myself:
str <- ""
for (i in idCols) {
str <- paste0(str, "unique(df$", i, "), ")
}
str <- paste0(str, "seq(from=min(variable), to=max(variable))")
str
[1] "unique(df$fundID), unique(df$cfType), seq(from=min(variable), to=max(variable))"
But then I don't know how to use str. All of these fail:
CJ(eval(str))
CJ(substitute(str))
CJ(call(str))
Does anyone know a good workaround?
Solution
I've never used the data.table package, so forgive me if I miss the mark here, but I think I've got it. There's a lot going on here. Start by reading up on do.call, which lets you call any function in a somewhat non-traditional manner, with the arguments supplied as a list (each element of the list is matched positionally to the function's arguments unless it is explicitly named). Also notice that I had to specify min(df$variable) instead of just min(variable): the argument list is built outside of DT[...], where the column variable is not in scope. Read Hadley's page on scoping to get an idea of the issue here.
CJargs <- lapply(df[, idCols], unique)
names(CJargs) <- NULL
CJargs[[length(CJargs) + 1]] <- seq(from=min(df$variable), to=max(df$variable))
DT[do.call("CJ", CJargs), nomatch=NA]
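Since the question mentions using this inside a function, here is a minimal sketch of how the same do.call approach might be wrapped up. The name completeRows and its arguments (idCols, rangeCol) are made up for illustration and are not part of data.table. It assumes the range column holds whole numbers, so that a step of 1 between its minimum and maximum makes sense.

```r
library(data.table)

# Hypothetical helper (not a data.table function): cross-join the unique
# values of every id column with the full range of `rangeCol`, then join
# that grid back onto the table so missing combinations appear as NA rows.
completeRows <- function(DT, idCols, rangeCol) {
  DT <- copy(DT)                        # setkeyv works by reference; leave the caller's table alone
  setkeyv(DT, c(idCols, rangeCol))
  # One vector of unique values per id column, however many there are
  cjArgs <- lapply(idCols, function(col) unique(DT[[col]]))
  # Append the full sequence for the range column (assumed whole numbers)
  cjArgs[[length(cjArgs) + 1L]] <- seq(from = min(DT[[rangeCol]]),
                                       to = max(DT[[rangeCol]]))
  # do.call passes each list element to CJ as a separate argument
  DT[do.call(CJ, cjArgs), nomatch = NA]
}

# Same example data as in the question
df <- data.frame(fundID = rep(letters[1:4], each = 6),
                 cfType = rep(c("D", "D", "T", "T", "R", "R"), times = 4),
                 variable = rep(c(1, 3), times = 12),
                 value = 1:24)
DT <- as.data.table(df)

res <- completeRows(DT, idCols = c("fundID", "cfType"), rangeCol = "variable")
```

Because the columns are looked up by name with DT[[col]], the number of id columns can change without touching the body of the function.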