R custom data.table function with multiple variable inputs

Problem description

I am writing a custom aggregation function with data.table (v 1.9.6) and am struggling to pass function arguments to it. There have been similar questions on this, but none deals with multiple (variable) inputs, and none seems to have a conclusive answer, only "little hacks".

  1. pass variables and names to data.table function
  2. eval and quote in data.table
  3. How can one work fully generically in data.table in R with column names in variables

I would like to take a data table, sum and order defined variables, and then create new variables on top (2 steps). The crucial thing is that everything should be parameterized, i.e. the variables to sum, the variables to group by, and the variables to order by. Each of them can be one or more variables. A small example:

library(data.table)

dt <- data.table(a=rep(letters[1:4], 5),
                 b=rep(letters[5:8], 5),
                 c=rep(letters[3:6], 5),
                 x=sample(1:100, 20),
                 y=sample(1:100, 20),
                 z=sample(1:100, 20))

temp <- 
  dt[, .(x_sum = sum(x, na.rm = TRUE),
         y_sum = sum(y, na.rm = TRUE)),
     by = .(a, b)][order(a, b)]

temp2 <- 
  temp[, `:=` (x_sum_del = (x_sum - shift(x = x_sum, n = 1, type = "lag")),
               y_sum_del = (y_sum - shift(x = y_sum, n = 1, type = "lag")),
               x_sum_del_rel = ((x_sum - shift(x = x_sum, n = 1, type = "lag")) /
                                  (shift(x = x_sum, n = 1, type = "lag"))),
               y_sum_del_rel = ((y_sum - shift(x = y_sum, n = 1, type = "lag")) /
                                  (shift(x = y_sum, n = 1, type = "lag")))
               )
       ]

How can I programmatically pass the following function arguments (i.e. not single inputs but vectors/lists of inputs)?

  • x and y --> var_list
  • new names of x and y (e.g. x_sum, y_sum) --> var_name_list
  • group-by arguments a, b --> by_var_list
  • order-by arguments a, b --> order_var_list
  • temp2 should work on all pre-defined parameters. I was also thinking about using an apply function, but again struggled to pass a list of variables.

I have played around with variations of get(), as.name(), eval() and quote(), but as soon as I pass more than one variable, they no longer work. I hope the question is clear; otherwise I am happy to adjust it where you deem necessary. A function call would look as follows:

fn_agg(dt, var_list, var_name_list, by_var_list, order_var_list)

Solution

Here's an option using mget, as commented:

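fn_agg <- function(DT, var_list, var_name_list, by_var_list, order_var_list) {

  # Sum the columns named in var_list within each group and rename the results
  temp <- DT[, setNames(lapply(.SD, sum, na.rm = TRUE), var_name_list), 
             by = by_var_list, .SDcols = var_list]

  # Sort by reference on the columns named in order_var_list
  setorderv(temp, order_var_list)

  # Names of the new difference and relative-difference columns
  cols1 <- paste0(var_name_list, "_del")
  cols2 <- paste0(cols1, "_rel")

  # Absolute change versus the previous row
  temp[, (cols1) := lapply(mget(var_name_list), function(x) {
    x - shift(x, n = 1, type = "lag")
  })]

  # Relative change versus the previous row
  temp[, (cols2) := lapply(mget(var_name_list), function(x) {
    xshift <- shift(x, n = 1, type = "lag")
    (x - xshift) / xshift
  })]

  # Trailing [] so the result prints after := was used inside the function
  temp[]
}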

fn_agg(dt, 
       var_list = c("x", "y"), 
       var_name_list = c("x_sum", "y_sum"), 
       by_var_list = c("a", "b"), 
       order_var_list = c("a", "b"))

#   a b x_sum y_sum x_sum_del y_sum_del x_sum_del_rel y_sum_del_rel
#1: a e   254   358        NA        NA            NA            NA
#2: b f   246   116        -8      -242  -0.031496063    -0.6759777
#3: c g   272   242        26       126   0.105691057     1.0862069
#4: d h   273   194         1       -48   0.003676471    -0.1983471
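
The building blocks used above can also be tried in isolation. The following is only a minimal sketch on a small made-up table; the names toy, sum_cols, new_names and group_cols are assumptions for illustration and do not come from the question. It shows that by and .SDcols both accept plain character vectors, and that mget() returns the named columns as a list.

```r
library(data.table)

# Small deterministic table (made up for illustration)
toy <- data.table(g = c("a", "a", "b", "b"),
                  x = c(1, 2, 3, 4),
                  y = c(10, 20, 30, 40))

sum_cols   <- c("x", "y")          # columns to aggregate
new_names  <- c("x_sum", "y_sum")  # names for the aggregated columns
group_cols <- "g"                  # grouping column(s)

# .SDcols restricts .SD to sum_cols; by takes the character vector directly
agg <- toy[, setNames(lapply(.SD, sum, na.rm = TRUE), new_names),
           by = group_cols, .SDcols = sum_cols]

# mget() fetches the columns named in new_names as a list
agg[, paste0(new_names, "_del") := lapply(mget(new_names), function(v) {
  v - shift(v, n = 1, type = "lag")
})]

print(agg)
```

Because every input is an ordinary character vector, the same calls work unchanged whether one or several column names are supplied.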

Instead of mget, you could also make use of data.table's .SDcols argument, as in:

temp[, (cols1) := lapply(.SD, function(x) {
    x - shift(x, n = 1, type = "lag")
  }), .SDcols = var_name_list]

Also, there are probably ways to improve the function by avoiding the duplicated computation of shift(x, n = 1, type = "lag"), but I only wanted to demonstrate a way to use data.table in functions.
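
One possible way to avoid the repeated shift() call is sketched below. This is an untested-in-production variant under assumptions, not part of the original answer: the function name fn_agg_once, the helper variables vals, prev and delta, and the demo table dt_demo are all hypothetical. The lagged values are computed once per column and reused for both the absolute and the relative difference.

```r
library(data.table)

# Hypothetical variant of fn_agg: shift() is evaluated once per summed column
fn_agg_once <- function(DT, var_list, var_name_list, by_var_list, order_var_list) {
  temp <- DT[, setNames(lapply(.SD, sum, na.rm = TRUE), var_name_list),
             by = by_var_list, .SDcols = var_list]
  setorderv(temp, order_var_list)

  cols_del <- paste0(var_name_list, "_del")
  cols_rel <- paste0(cols_del, "_rel")

  # Pull the summed columns out as a plain list
  vals <- as.list(temp[, var_name_list, with = FALSE])
  # Previous-row values, computed a single time
  prev <- lapply(vals, shift, n = 1, type = "lag")
  # Absolute change, reused below for the relative change
  delta <- Map(`-`, vals, prev)

  # Note: this assumes temp has no columns named delta or prev
  temp[, (cols_del) := delta]
  temp[, (cols_rel) := Map(`/`, delta, prev)]

  temp[]
}

# Small deterministic input (made up) so the result is reproducible
dt_demo <- data.table(a = c("a", "a", "b", "b", "c"),
                      b = c("e", "e", "f", "f", "g"),
                      x = c(1, 2, 3, 4, 10),
                      y = c(2, 2, 8, 8, 4))

res <- fn_agg_once(dt_demo,
                   var_list = c("x", "y"),
                   var_name_list = c("x_sum", "y_sum"),
                   by_var_list = c("a", "b"),
                   order_var_list = c("a", "b"))
print(res)
```

The column order matches the original function, since all _del columns are assigned before the _del_rel columns.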
