Add new columns to a data.table containing many variables

Views: 126 | Published: 2017/3/12 11:26:45 | Tags: r, data.table


Problem description

I want to add many new columns simultaneously to a data.table based on by-group computations. A working example of my data looks something like this:

         Time Stock x1 x2 x3
1: 2014-08-22     A 15 27 34
2: 2014-08-23     A 39 44 29
3: 2014-08-24     A 20 50  5
4: 2014-08-22     B 42 22 43
5: 2014-08-23     B 44 45 12
6: 2014-08-24     B  3 21  2

Now I would like to scale and sum several of the variables within each Stock to get output like:

         Time Stock x1 x2 x3   x2_scale   x3_scale x2_sum x3_sum
1: 2014-08-22     A 15 27 34 -1.1175975  0.7310560    121     68
2: 2014-08-23     A 39 44 29  0.3073393  0.4085313    121     68
3: 2014-08-24     A 20 50  5  0.8102582 -1.1395873    121     68
4: 2014-08-22     B 42 22 43 -0.5401315  1.1226726     88     57
5: 2014-08-23     B 44 45 12  1.1539172 -0.3274462     88     57
6: 2014-08-24     B  3 21  2 -0.6137858 -0.7952265     88     57

A brute-force implementation of my problem would be:

library(data.table)

set.seed(123)
# Three consecutive dates, repeated once per stock (6 rows in total)
d <- data.table(Time  = rep(seq(Sys.Date(), by = "day", length.out = 3), times = 2),
                Stock = rep(LETTERS[1:2], each = 3),
                x1 = sample(1:50, 6),
                x2 = sample(1:50, 6),
                x3 = sample(1:50, 6))

# One := call per new column, each computed within Stock
d[, x2_scale := scale(x2), by = Stock]
d[, x3_scale := scale(x3), by = Stock]
d[, x2_sum := sum(x2), by = Stock]
d[, x3_sum := sum(x3), by = Stock]

Other posts describing a similar issue ("Add multiple columns to R data.table in one function call?" and "Assign multiple columns using := in data.table, by group") suggest the following solution:

  d[, c("x2_scale","x3_scale"):=list(scale(x2),scale(x3)), by=Stock]
  d[, c("x2_sum","x3_sum"):=list(sum(x2),sum(x3)), by=Stock]

But this gets very messy with a lot of variables, and it also brings up an error message with scale (but not with sum, since sum returns a single value per group whereas scale returns a one-column matrix).
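As a minimal sketch of why scale behaves differently from sum: scale() returns a matrix (with extra attributes) rather than a plain vector, which is presumably what the error message is about. The vector x below is just the x2 values for Stock A from the example table, typed in by hand for illustration.

```r
x <- c(27, 44, 50)   # x2 values for Stock A, copied from the example table

s <- scale(x)
is.matrix(s)         # scale() returns a matrix, here 3 rows by 1 column
dim(s)

v <- s[, 1]          # taking the first column gives a plain numeric vector
is.matrix(v)

# The values equal centering by the mean and dividing by the sample sd
all.equal(v, (x - mean(x)) / sd(x))
```

Extracting the column with `[, 1]` is the same trick the answer below relies on.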

Is there a more efficient way to achieve the required result (keeping in mind that my actual data set is quite large)?

Recommended answer

I think that with a small modification to your last snippet you can easily do both for as many variables as you want:

vars <- c("x2", "x3") # <- Choose the variable you want to operate on

d[, paste0(vars, "_", "scale") := lapply(.SD, function(x) scale(x)[, 1]), .SDcols = vars, by = Stock]
d[, paste0(vars, "_", "sum") := lapply(.SD, sum), .SDcols = vars, by = Stock]

##          Time Stock x1 x2 x3   x2_scale   x3_scale x2_sum x3_sum
## 1: 2014-08-22     A 13 14 32 -1.1338934  1.1323092     87     44
## 2: 2014-08-23     A 25 39  9  0.7559289 -0.3701780     87     44
## 3: 2014-08-24     A 18 34  3  0.3779645 -0.7621312     87     44
## 4: 2014-08-22     B 44  8  6 -0.4730162 -0.7258662     59     32
## 5: 2014-08-23     B 49  3 18 -0.6757374  1.1406469     59     32
## 6: 2014-08-24     B 15 48  8  1.1487535 -0.4147807     59     32
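If a single assignment is preferred, the two calls can probably be merged by concatenating the two lapply results with c(). The following is only a sketch and not part of the original answer: the data are the Stock, x2 and x3 columns typed in from the question's example table (stored as doubles), and the name new_cols is made up for illustration.

```r
library(data.table)

# Stock, x2 and x3 from the question's example table (assumed, typed by hand)
d <- data.table(Stock = rep(c("A", "B"), each = 3),
                x2 = c(27, 44, 50, 22, 45, 21),
                x3 = c(34, 29, 5, 43, 12, 2))

vars <- c("x2", "x3")
# Hypothetical helper name: all target columns, scale columns first
new_cols <- c(paste0(vars, "_scale"), paste0(vars, "_sum"))

# c() joins the two lists, so the right-hand side lines up with new_cols
d[, (new_cols) := c(lapply(.SD, function(x) scale(x)[, 1]),
                    lapply(.SD, sum)),
  by = Stock, .SDcols = vars]

d
```

The per-group sums are recycled across the rows of each group, so every row of Stock A carries the same x2_sum.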

For simple functions (ones that don't need special treatment the way scale does) you could easily do something like:

vars <- c("x2", "x3")                   # <- Define the variables you want to operate on
funs <- c("min", "max", "mean", "sum")  # <- Define your functions (by name)
for (i in funs) {
  # match.fun() turns the function name into the function itself
  d[, paste0(vars, "_", i) := lapply(.SD, match.fun(i)), .SDcols = vars, by = Stock]
}
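One possible variation, sketched here under assumptions and not taken from the original answer, is to loop over a named list of functions instead of a character vector. That way a wrapper around scale can sit next to the simple functions, and the list names supply the column suffixes. The data are again typed in from the question's example table.

```r
library(data.table)

# Stock, x2 and x3 from the question's example table (assumed, typed by hand)
d <- data.table(Stock = rep(c("A", "B"), each = 3),
                x2 = c(27, 44, 50, 22, 45, 21),
                x3 = c(34, 29, 5, 43, 12, 2))

vars <- c("x2", "x3")

# Named list: the name becomes the column suffix, the value is the function
funs <- list(min   = min,
             max   = max,
             mean  = mean,
             sum   = sum,
             scale = function(x) scale(x)[, 1])  # wrapper returning a plain vector

for (nm in names(funs)) {
  # .SDcols = vars keeps the columns created in earlier iterations out of .SD
  d[, paste0(vars, "_", nm) := lapply(.SD, funs[[nm]]), by = Stock, .SDcols = vars]
}

names(d)
```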
