R custom data.table function with multiple variable inputs
Question
I am writing a custom aggregation function with data.table (v 1.9.6) and am struggling to pass function arguments to it. There have been similar questions on this, but none deals with multiple (variable) inputs, and none seems to have a conclusive answer, only "little hacks".
I would like to take a data table, sum defined variables by group, order the result by defined variables, and then create new variables on top (2 steps). The crucial thing is that everything should be parameterized, i.e. the variables to sum, the variables to group by, and the variables to order by. Each of these can be one or more variables. A small example:
library(data.table)

dt <- data.table(a = rep(letters[1:4], 5),
                 b = rep(letters[5:8], 5),
                 c = rep(letters[3:6], 5),
                 x = sample(1:100, 20),
                 y = sample(1:100, 20),
                 z = sample(1:100, 20))

# Step 1: sum x and y by (a, b), then order by (a, b)
temp <-
  dt[, .(x_sum = sum(x, na.rm = TRUE),
         y_sum = sum(y, na.rm = TRUE)),
     by = .(a, b)][order(a, b)]

# Step 2: absolute and relative change versus the previous row
temp2 <-
  temp[, `:=` (x_sum_del = (x_sum - shift(x = x_sum, n = 1, type = "lag")),
               y_sum_del = (y_sum - shift(x = y_sum, n = 1, type = "lag")),
               x_sum_del_rel = ((x_sum - shift(x = x_sum, n = 1, type = "lag")) /
                                  (shift(x = x_sum, n = 1, type = "lag"))),
               y_sum_del_rel = ((y_sum - shift(x = y_sum, n = 1, type = "lag")) /
                                  (shift(x = y_sum, n = 1, type = "lag")))
  )
  ]
How can I programmatically pass the following function arguments (i.e. not single inputs but vectors/lists of inputs)?
- x and y --> var_list
- new names for x and y (e.g. x_sum, y_sum) --> var_name_list
- group-by arguments a, b --> by_var_list
- order-by arguments a, b --> order_var_list
- temp2 should work with all of the predefined parameters. I was also thinking about using an apply function, but again struggled to pass a list of variables.
I have played around with variations of get(), as.name(), eval(), and quote(), but as soon as I pass more than one variable they no longer work. I hope the question is clear; otherwise I am happy to adjust it where you deem necessary. A function call would look as follows:
fn_agg(dt, var_list, var_name_list, by_var_list, order_var_list)
Recommended answer
Here's an option using mget, as commented:
fn_agg <- function(DT, var_list, var_name_list, by_var_list, order_var_list) {
  # Step 1: sum the columns in var_list by group and rename them
  temp <- DT[, setNames(lapply(.SD, sum, na.rm = TRUE), var_name_list),
             by = by_var_list, .SDcols = var_list]
  setorderv(temp, order_var_list)

  # Step 2: absolute (_del) and relative (_del_rel) change versus the previous row
  cols1 <- paste0(var_name_list, "_del")
  cols2 <- paste0(cols1, "_rel")
  temp[, (cols1) := lapply(mget(var_name_list), function(x) {
    x - shift(x, n = 1, type = "lag")
  })]
  temp[, (cols2) := lapply(mget(var_name_list), function(x) {
    xshift <- shift(x, n = 1, type = "lag")
    (x - xshift) / xshift
  })]
  temp[]
}

fn_agg(dt,
       var_list = c("x", "y"),
       var_name_list = c("x_sum", "y_sum"),
       by_var_list = c("a", "b"),
       order_var_list = c("a", "b"))
# a b x_sum y_sum x_sum_del y_sum_del x_sum_del_rel y_sum_del_rel
#1: a e 254 358 NA NA NA NA
#2: b f 246 116 -8 -242 -0.031496063 -0.6759777
#3: c g 272 242 26 126 0.105691057 1.0862069
#4: d h 273 194 1 -48 0.003676471 -0.1983471
Instead of mget, you could also make use of data.table's .SDcols argument, as in
temp[, (cols1) := lapply(.SD, function(x) {
  x - shift(x, n = 1, type = "lag")
}), .SDcols = var_name_list]
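To see the .SDcols pattern in isolation, here is a minimal self-contained sketch on a tiny hand-made table. The table `tiny` and the vectors `vars` and `new_cols` are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Hypothetical, already-aggregated table (illustration only)
tiny <- data.table(g     = c("a", "b", "c"),
                   x_sum = c(10, 15, 12),
                   y_sum = c(4, 8, 2))

vars     <- c("x_sum", "y_sum")      # columns to operate on
new_cols <- paste0(vars, "_del")     # names of the columns to create

# .SDcols restricts .SD to the columns named in `vars`;
# (new_cols) on the left-hand side is evaluated as a character vector of names
tiny[, (new_cols) := lapply(.SD, function(x) x - shift(x, n = 1, type = "lag")),
     .SDcols = vars]

print(tiny)
```

Because both the source columns and the target names are plain character vectors, the same call works unchanged for one column or for many.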
Also, there are probably ways to improve the function by avoiding the duplicated computation of shift(x, n = 1, type = "lag"), but I only wanted to demonstrate a way to use data.table in functions.
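One possible way to avoid the repeated shift() call is sketched below. This is only one option, not necessarily the best one: the name `fn_agg_once` and the intermediate lists `cur`, `prev`, and `delta` are hypothetical, and the sketch relies on set() accepting a character vector of new column names together with a list of values.

```r
library(data.table)

# Sketch: same interface as fn_agg, but each lagged column is computed only once
fn_agg_once <- function(DT, var_list, var_name_list, by_var_list, order_var_list) {
  temp <- DT[, setNames(lapply(.SD, sum, na.rm = TRUE), var_name_list),
             by = by_var_list, .SDcols = var_list]
  setorderv(temp, order_var_list)

  cols1 <- paste0(var_name_list, "_del")
  cols2 <- paste0(cols1, "_rel")

  cur   <- as.list(temp[, var_name_list, with = FALSE])  # aggregated columns
  prev  <- lapply(cur, shift, n = 1, type = "lag")       # lag, computed once
  delta <- Map(`-`, cur, prev)                           # absolute change

  set(temp, j = cols1, value = delta)                    # add *_del columns
  set(temp, j = cols2, value = Map(`/`, delta, prev))    # add *_del_rel columns
  temp[]
}
```

It is called exactly like fn_agg, for example fn_agg_once(dt, c("x", "y"), c("x_sum", "y_sum"), c("a", "b"), c("a", "b")), and is intended to return the same columns in the same order.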