Using data.table i and j arguments in functions
Question
I am trying to write some wrapper functions to reduce code duplication with data.table.
Here is an example using mtcars. First, set up some data:
library(data.table)
data(mtcars)
mtcars$car <- factor(gsub("(.*?) .*", "\\1", rownames(mtcars)), ordered=TRUE)
mtcars <- data.table(mtcars)
Now, here is what I would usually write to get a summary of counts by group. In this case I am grouping by car:
mtcars[, list(Total=length(mpg)), by="car"][order(car)]
car Total
AMC 1
Cadillac 1
Camaro 1
...
Toyota 2
Valiant 1
Volvo 1
The complication is that, since the arguments i and j are evaluated in the frame of the data.table, one has to use eval(...) to pass in variables.
This works:
group <- "car"
mtcars[, list(Total=length(mpg)), by=eval(group)]
But now I want to order the results by the same grouping variable. I can't get any variant of the following to give me correct results. Notice how I always get a single row of results, rather than the ordered set.
mtcars[, list(Total=length(mpg)), by=eval(group)][order(group)]
car Total
Mazda 2
I know why: it's because group is evaluated in the parent.frame, not the frame of the data.table.
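The single-row result can be reproduced outside the table. The sketch below is only illustrative: the names dt and counts are made up here, the car column is rebuilt as plain character, and setorderv() (a data.table function that takes column names as strings) is one possible way to sort by a name held in a variable, not something the question itself uses.

```r
library(data.table)

# Illustrative data: the first word of each row name becomes the make
dt <- data.table(mtcars, car = sub(" .*", "", rownames(mtcars)))

group <- "car"
counts <- dt[, list(Total = .N), by = eval(group)]

# Base R's order() only sees the one-element character vector "car",
# so it returns the single index 1 and selects just the first row
order(group)

# setorderv() accepts the column name as a string and sorts by reference
setorderv(counts, group)
head(counts, 3)
```

Because order(group) never looks at the column values, the row it picks is simply whichever group happened to come first.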
How can I evaluate group in the context of the data.table?
More generally, how can I use this inside a function? I need the following function to give me all the results, not just the first row of data:
tableOrder <- function(x, group){
x[, list(Total=length(mpg)), by=eval(group)][order(group)]
}
tableOrder(mtcars, "car")
Answer
Gavin and Josh are right. This answer only adds more background. The idea is that you can pass not only variable column names into a function like that, but also expressions of column names, using quote().
group = quote(car)
mtcars[, list(Total=length(mpg)), by=group][order(group)]
group Total
AMC 1
Cadillac 1
...
Toyota 2
Valiant 1
Volvo 1
Although admittedly more difficult to start with, it can be more flexible. That's the idea, anyway. Inside functions you need substitute(), like this:
tableOrder = function(x,.expr) {
.expr = substitute(.expr)
ans = x[,list(Total=length(mpg)),by=.expr]
setkeyv(ans, head(names(ans),-1)) # see below re feature request #1780
ans
}
tableOrder(mtcars, car)
.expr Total
AMC 1
Cadillac 1
Camaro 1
...
Toyota 2
Valiant 1
Volvo 1
tableOrder(mtcars, substring(car,1,1)) # an expression, not just a column name
.expr Total
[1,] A 1
[2,] C 3
[3,] D 3
...
[8,] P 2
[9,] T 2
[10,] V 2
tableOrder(mtcars, list(cyl,gear%%2)) # by two expressions, so head(,-1) above
cyl gear Total
[1,] 4 0 8
[2,] 4 1 3
[3,] 6 0 4
[4,] 6 1 3
[5,] 8 1 14
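What substitute() does in these calls can be seen in base R alone. This is a minimal sketch; capture and capture_quote are hypothetical helpers introduced only for illustration, and the small list passed to eval() stands in for the table's columns.

```r
# Hypothetical helper: returns the expression the caller typed, unevaluated
capture <- function(.expr) substitute(.expr)

e <- capture(substring(car, 1, 1))
is.call(e)   # the argument arrives as an unevaluated call

# Evaluate the captured call against a stand-in for the table's columns
eval(e, list(car = c("AMC", "Volvo")))

# quote() inside a function captures the parameter name itself,
# which is why substitute() is needed there
capture_quote <- function(.expr) quote(.expr)
identical(capture_quote(car), as.name(".expr"))
```

Since the captured object is ordinary language data, a bare column name, a call such as substring(car,1,1), or a list(...) of several expressions can all be handed to by in the same way.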
A new argument keyby was added in v1.8.0 (July 2012), making it simpler:
tableOrder = function(x,.expr) {
.expr = substitute(.expr)
x[,list(Total=length(mpg)),keyby=.expr]
}
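As a rough check of what keyby provides, the following sketch groups by a fixed column and inspects the result. The names dt, res and res2 are illustrative only; the second form is included just to compare against grouping with by and then setting the key by hand.

```r
library(data.table)

dt <- as.data.table(mtcars)

# keyby groups and leaves the result keyed (and so sorted) by the grouping column
res <- dt[, list(Total = .N), keyby = cyl]
key(res)
res

# Comparison: group with by, then set the key explicitly
res2 <- dt[, list(Total = .N), by = cyl]
setkeyv(res2, "cyl")
```

Both forms should end up with the groups in ascending order of cyl; keyby just removes the separate keying step.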
Comments and feedback in the area of i, j and by variable expressions are most welcome. The other thing you can do is have a table where a column contains expressions, and then look up which expression to put in i, j or by from that table.
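One possible reading of the lookup-table idea is sketched below. Everything in it is an assumption made for illustration: the groupings table, its name and expr columns, the countBy helper, and the choice to store each expression as text and parse it on lookup. The call is assembled with substitute() and then evaluated, so the looked-up expression lands in keyby as if it had been typed there.

```r
library(data.table)

dt <- as.data.table(mtcars)

# Hypothetical lookup table: each row names a grouping and stores it as text
groupings <- data.table(
  name = c("cylinders", "gear_parity"),
  expr = c("cyl", "gear %% 2")
)

# Hypothetical helper: look up the expression by label, then group by it
countBy <- function(x, groupings, label) {
  e <- parse(text = groupings[name == label, expr])[[1]]
  # Splice the expression into the call in place of E, then evaluate it
  eval(substitute(x[, list(Total = .N), keyby = E], list(E = e)))
}

countBy(dt, groupings, "cylinders")
countBy(dt, groupings, "gear_parity")
```

Storing text keeps the lookup table an ordinary character column; a list column of quoted expressions would avoid the parse step at the cost of a less readable table.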