How to use an unknown number of key columns in a data.table

Published: 2017/3/12 11:02:04 · Tags: r, data.table


Question

 
I want to do the same as explained here, i.e. add missing rows to a data.table. The only additional difficulty is that I want the number of key columns, i.e. the columns used for the self-join, to be flexible.

Here is a small example that basically repeats what is done in the link mentioned above:
```r
library(data.table)

df <- data.frame(fundID   = rep(letters[1:4], each=6),
                 cfType   = rep(c("D", "D", "T", "T", "R", "R"), times=4),
                 variable = rep(c(1,3), times=12),
                 value    = 1:24)
DT <- as.data.table(df)
idCols <- c("fundID", "cfType")
setkeyv(DT, c(idCols, "variable"))
# Join DT to every fundID x cfType x variable combination; nomatch=NA keeps missing ones as NA rows
DT[CJ(unique(df$fundID), unique(df$cfType), seq(from=min(variable), to=max(variable))), nomatch=NA]
```
What bothers me is the last line. I want idCols to be flexible (for instance, if I use it within a function), so I don't want to type unique(df$fundID), unique(df$cfType) manually. However, I can't find any workaround for this. All my attempts to automatically split the subset of df into separate vectors, as CJ needs, fail with the error message:

Error in setkeyv(x, cols, verbose = verbose) : Column 'V1' is type 'list' which is not (currently) allowed as a key column type.
```r
# Each of these fails with the error above
CJ(sapply(df[, idCols], unique))
CJ(unique(df[, idCols]))
CJ(as.vector(unique(df[, idCols])))
CJ(unique(DT[, idCols, with=FALSE]))
```
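A likely reason these calls fail, judging from the error: each one hands CJ a single list-like object (sapply returns a list here because the columns have different numbers of unique values, and a data.frame is itself a list), whereas CJ expects one vector per argument. The single object then ends up as one list column, V1. The sketch below uses a hypothetical toy function, count_args, in base R only, to show the difference between passing a list as one argument and spreading it into separate arguments with do.call.

```r
# Hypothetical toy function: reports how many arguments it received
count_args <- function(...) length(list(...))

cols <- list(fundID = c("a", "b"), cfType = c("D", "T", "R"))

count_args(cols)           # 1: the whole list arrives as a single argument
do.call(count_args, cols)  # 2: do.call passes each list element as its own argument
```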
I also tried building the expression myself:
```r
str <- ""
for (i in idCols) {
  str <- paste0(str, "unique(df$", i, "), ")
}
str <- paste0(str, "seq(from=min(variable), to=max(variable))")
str
#> [1] "unique(df$fundID), unique(df$cfType), seq(from=min(variable), to=max(variable))"
```
But then I don't know how to use str. All of these fail:

```r
CJ(eval(str))
CJ(substitute(str))
CJ(call(str))
```
Does anyone know a good workaround?
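On the str attempts: eval() applied to a character string simply returns the string, so the text would first have to be turned into an expression with parse(text = ...). The sketch below shows that route under the assumption that the question's df, DT and idCols are used; args_str and grid are illustrative names. It refers to df$variable rather than variable because the expression is evaluated outside DT[...]. Building code as text tends to be harder to read and debug than building a list of arguments, so the do.call approach in the solution below is usually the cleaner option.

```r
library(data.table)

# The question's example data
df <- data.frame(fundID   = rep(letters[1:4], each = 6),
                 cfType   = rep(c("D", "D", "T", "T", "R", "R"), times = 4),
                 variable = rep(c(1, 3), times = 12),
                 value    = 1:24)
DT <- as.data.table(df)
idCols <- c("fundID", "cfType")
setkeyv(DT, c(idCols, "variable"))

# Build the argument text, e.g. "unique(df$fundID), unique(df$cfType), seq(...)"
args_str <- paste0("unique(df$", idCols, ")", collapse = ", ")
args_str <- paste0(args_str, ", seq(from = min(df$variable), to = max(df$variable))")

# parse() turns the text into an expression; eval() then runs the CJ call
grid <- eval(parse(text = paste0("CJ(", args_str, ")")))

# Join back to DT; combinations absent from DT get NA in value
res <- DT[grid, nomatch = NA]
```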
Solution
I've never used the data.table package, so forgive me if I miss the mark here, but I think I've got it. There's a lot going on here. Start by reading up on do.call, which lets you call any function with its arguments supplied as a list: each element of the list is matched positionally to the function's arguments unless it is explicitly named. Also notice that I had to specify min(df$variable) instead of just min(variable), because CJargs is built outside DT[...], where the column variable is not in scope. Read Hadley's page on scoping to get an idea of the issue here.
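```r
# Assumes df, DT and idCols from the question; one vector of unique values per id column
CJargs <- lapply(df[, idCols], unique)
names(CJargs) <- NULL  # pass the vectors to CJ as plain, unnamed arguments
# df$variable, not variable: this line is evaluated outside DT[...]
CJargs[[length(CJargs) + 1]] <- seq(from=min(df$variable), to=max(df$variable))
# do.call spreads the list into separate arguments, i.e. CJ(arg1, arg2, arg3)
DT[do.call("CJ", CJargs), nomatch=NA]
```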
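Since the goal was to keep the id columns flexible, for instance inside a function, the answer's approach can be wrapped in a reusable helper. This is a minimal sketch, not part of data.table: fill_missing_rows and its arguments dt, idCols and seqCol are hypothetical names. It assumes seqCol holds whole numbers, so that seq(min, max) enumerates every expected value, and it works on a copy so the caller's table is not re-keyed.

```r
library(data.table)

# Hypothetical helper: add NA rows for every missing combination of the id
# columns and the whole-number sequence in seqCol
fill_missing_rows <- function(dt, idCols, seqCol) {
  dt <- copy(dt)                      # avoid changing the key of the caller's table
  setkeyv(dt, c(idCols, seqCol))
  # One vector of unique values per id column, whatever the number of columns
  cj_args <- unname(lapply(idCols, function(col) unique(dt[[col]])))
  # Append the full sequence for the sequence column
  cj_args[[length(cj_args) + 1L]] <- seq(min(dt[[seqCol]]), max(dt[[seqCol]]))
  # do.call spreads the list into separate CJ arguments
  grid <- do.call(CJ, cj_args)
  dt[grid, nomatch = NA]
}

# Usage with the question's data
df <- data.frame(fundID   = rep(letters[1:4], each = 6),
                 cfType   = rep(c("D", "D", "T", "T", "R", "R"), times = 4),
                 variable = rep(c(1, 3), times = 12),
                 value    = 1:24)
DT <- as.data.table(df)
res <- fill_missing_rows(DT, c("fundID", "cfType"), "variable")
```

Building the argument list with lapply over idCols is what makes the number of key columns irrelevant: adding a third id column only changes the idCols vector.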

