R custom data.table function with multiple variable inputs


Question

I am writing a custom aggregation function with data.table (v 1.9.6) and am struggling to pass function arguments to it. There have been similar questions on this, but none deals with multiple (variable) inputs, and none seems to have a conclusive answer, only "little hacks".

  1. Pass variables and names to a data.table function
  2. eval and quote in data.table
  3. How can one work fully generically in data.table in R with column names in variables

I would like to take a data table, sum defined variables by group, order the result, and then create new variables on top of it (2 steps). The crucial thing is that everything should be parameterized, i.e. the variables to sum, the variables to group by, and the variables to order by. Each of these can be one or more variables. A small example:

library(data.table)

dt <- data.table(a=rep(letters[1:4], 5), 
                 b=rep(letters[5:8], 5),
                 c=rep(letters[3:6], 5),
                 x=sample(1:100, 20),
                 y=sample(1:100, 20),
                 z=sample(1:100, 20))

temp <- 
  dt[, .(x_sum = sum(x, na.rm = TRUE),
         y_sum = sum(y, na.rm = TRUE)),
     by = .(a, b)][order(a, b)]

temp2 <- 
  temp[, `:=` (x_sum_del = (x_sum - shift(x = x_sum, n = 1, type = "lag")),
               y_sum_del = (y_sum - shift(x = y_sum, n = 1, type = "lag")),
               x_sum_del_rel = ((x_sum - shift(x = x_sum, n = 1, type = "lag")) /
                                  (shift(x = x_sum, n = 1, type = "lag"))),
               y_sum_del_rel = ((y_sum - shift(x = y_sum, n = 1, type = "lag")) /
                                  (shift(x = y_sum, n = 1, type = "lag")))
               )
       ]

How can I programmatically pass the following function arguments (i.e. not single inputs but vectors/lists of inputs)?

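  • x and y --> var_list
  • new names for x and y (e.g. x_sum, y_sum) --> var_name_list
  • group-by arguments a, b --> by_var_list
  • order-by arguments a, b --> order_var_list
  • temp2 should work with all the pre-defined arguments. I was also thinking about using an apply function, but again struggled to pass a list of variables.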

I have played around with variations of get(), as.name(), eval() and quote(), but as soon as I pass more than one variable they stop working. I hope the question is clear; otherwise I am happy to adjust it where you deem necessary. A function call would look as follows:

fn_agg(dt, var_list, var_name_list, by_var_list, order_var_list)

Recommended answer

Here's an option using mget, as commented:

fn_agg <- function(DT, var_list, var_name_list, by_var_list, order_var_list) {

  temp <- DT[, setNames(lapply(.SD, sum, na.rm = TRUE), var_name_list), 
             by = by_var_list, .SDcols = var_list]

  setorderv(temp, order_var_list)

  cols1 <- paste0(var_name_list, "_del")
  cols2 <- paste0(cols1, "_rel")

  temp[, (cols1) := lapply(mget(var_name_list), function(x) {
    x - shift(x, n = 1, type = "lag")
  })]

  temp[, (cols2) := lapply(mget(var_name_list), function(x) {
    xshift <- shift(x, n = 1, type = "lag")
    (x - xshift) / xshift
  })]

  temp[]
}

fn_agg(dt, 
       var_list = c("x", "y"), 
       var_name_list = c("x_sum", "y_sum"), 
       by_var_list = c("a", "b"), 
       order_var_list = c("a", "b"))

#   a b x_sum y_sum x_sum_del y_sum_del x_sum_del_rel y_sum_del_rel
#1: a e   254   358        NA        NA            NA            NA
#2: b f   246   116        -8      -242  -0.031496063    -0.6759777
#3: c g   272   242        26       126   0.105691057     1.0862069
#4: d h   273   194         1       -48   0.003676471    -0.1983471

Instead of mget, you could also make use of data.table's .SDcols argument, as in

temp[, (cols1) := lapply(.SD, function(x) {
    x - shift(x, n = 1, type = "lag")
  }), .SDcols = var_name_list]
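
To see that the two forms are interchangeable here, the following minimal sketch applies both to a small hand-made table and compares the results. The table tmp, its values, and the helper lag_diff are made up for illustration and are not part of the original answer.

```r
library(data.table)

# Hypothetical, already-aggregated table standing in for `temp`
tmp <- data.table(g     = c("a", "b", "c"),
                  x_sum = c(10, 12, 15),
                  y_sum = c(4, 2, 8))

var_name_list <- c("x_sum", "y_sum")
cols1 <- paste0(var_name_list, "_del")

# Illustrative helper: difference to the previous row
lag_diff <- function(x) x - shift(x, n = 1, type = "lag")

# Variant 1: look the columns up by name with mget()
via_mget <- copy(tmp)[, (cols1) := lapply(mget(var_name_list), lag_diff)]

# Variant 2: restrict .SD to the same columns with .SDcols
via_sdcols <- copy(tmp)[, (cols1) := lapply(.SD, lag_diff), .SDcols = var_name_list]

# Both variants should produce the same table
print(all.equal(via_mget, via_sdcols))
```

copy() is used so that each variant modifies its own table, since := updates by reference.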

Also, the function could probably be improved by avoiding the duplicated computation of shift(x, n = 1, type = "lag"), but I only wanted to demonstrate a way to use data.table in functions.
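
One possible way to avoid the repeated shift() call is sketched below: each lagged column is computed once and reused for both the absolute and the relative difference. This is only an illustration, not part of the original answer. The name fn_agg_once and the small deterministic example table are assumptions.

```r
library(data.table)

# Hypothetical variant of fn_agg that computes each lag only once
fn_agg_once <- function(DT, var_list, var_name_list, by_var_list, order_var_list) {

  # Step 1: grouped sums, renamed to var_name_list, then ordered
  temp <- DT[, setNames(lapply(.SD, sum, na.rm = TRUE), var_name_list),
             by = by_var_list, .SDcols = var_list]
  setorderv(temp, order_var_list)

  cols1 <- paste0(var_name_list, "_del")
  cols2 <- paste0(cols1, "_rel")

  # Step 2: one lagged vector per summed column, reused below
  lagged <- lapply(var_name_list, function(v) shift(temp[[v]], n = 1, type = "lag"))
  deltas <- Map(function(v, lag) temp[[v]] - lag, var_name_list, lagged)

  temp[, (cols1) := deltas]                    # absolute change vs. previous row
  temp[, (cols2) := Map(`/`, deltas, lagged)]  # relative change vs. previous row

  temp[]
}

# Small deterministic example (values chosen only for illustration)
dt_demo <- data.table(a = rep(letters[1:4], 5),
                      b = rep(letters[5:8], 5),
                      x = 1:20,
                      y = 20:1)

res <- fn_agg_once(dt_demo,
                   var_list = c("x", "y"),
                   var_name_list = c("x_sum", "y_sum"),
                   by_var_list = c("a", "b"),
                   order_var_list = c("a", "b"))
print(res)
```

The sketch assumes that the column names passed in do not collide with the local names lagged and deltas, because data.table looks up column names before variables in the calling scope.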
