Calculate the difference between consecutive, grouped columns in a data.table

Views: 98 Published: 2017/3/12 12:19:29 r data.table

Problem description

My data is structured as follows:

DT <- data.table(Id=c(1,2,3,4,5), Va1=c(3,13,NA,NA,NA), Va2=c(4,40,NA,NA,4), Va3=c(5,34,NA,7,84),
Va4=c(2,23,NA,63,9), Vb1=c(8,45,1,7,0), Vb2=c(0,35,0,7,6), Vb3=c(63,0,0,0,5), Vc1=c(2,5,0,0,4))
>DT
   Id Va1 Va2 Va3 Va4 Vb1 Vb2 Vb3 Vc1
1:  1   3   4   5   2   8   0  63   2
2:  2  13  40  34  23  45  35   0   5
3:  3  NA  NA  NA  NA   1   0   0   0
4:  4  NA  NA   7  63   7   7   0   0
5:  5  NA   4  84   9   0   6   5   4

Additionally, I have a reference list that gives the column indices of each column group:

reference <- list(g.1=c(2,3,4,5), g.2=c(6,7,8), g.3=c(9))

Columns 2, 3, 4, and 5 (variables Va1, Va2, Va3, and Va4) belong to one group of variables. Columns 6, 7, and 8 (variables Vb1, Vb2, and Vb3) belong to a second group. Column 9 (variable Vc1) belongs to a third group.
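To make the grouping concrete, the index vectors in reference can be resolved to column names. This is only an illustrative sketch: group_names is a hypothetical name that does not appear in the original question.

```r
library(data.table)

DT <- data.table(Id=c(1,2,3,4,5), Va1=c(3,13,NA,NA,NA), Va2=c(4,40,NA,NA,4), Va3=c(5,34,NA,7,84),
                 Va4=c(2,23,NA,63,9), Vb1=c(8,45,1,7,0), Vb2=c(0,35,0,7,6), Vb3=c(63,0,0,0,5), Vc1=c(2,5,0,0,4))
reference <- list(g.1=c(2,3,4,5), g.2=c(6,7,8), g.3=c(9))

# group_names is a hypothetical helper: one character vector of column names per group
group_names <- lapply(reference, function(idx) names(DT)[idx])
```

With the sample data, group_names$g.1 holds the four Va columns, group_names$g.2 the three Vb columns, and group_names$g.3 only Vc1.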

What I need to do is calculate the difference between consecutive columns within each column group.

That is, I need the difference between Va2 and Va1, between Va3 and Va2, and so on, but not between Vb1 and Va4.

The output should be:

   Id Va1 Va2 Va3 Va4 Vb1 Vb2 Vb3 Vc1 D[Va1:Va2] D[Va2:Va3] D[Va3:Va4] D[Vb1:Vb2] D[Vb2:Vb3]
1:  1   3   4   5   2   8   0  63   2          1          1         -3         -8         63
2:  2  13  40  34  23  45  35   0   5         27         -6        -11        -10        -35
3:  3  NA  NA  NA  NA   1   0   0   0         NA         NA         NA         -1          0
4:  4  NA  NA   7  63   7   7   0   0         NA         NA         56          0         -7
5:  5  NA   4  84   9   0   6   5   4         NA         80        -75          6         -1

Currently I am using the following loop:

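  # Iterate over all groups except the last (g.3 has a single column, so no pairs)
  for (i in 1:(length(reference) - 1)) {
    # Column indices of group i, without the group's last column
    tmp <- as.list(reference[[i]])
    tmp <- tmp[-length(tmp)]
    # Pair each index with its right-hand neighbour: c(next, current)
    tmp <- mapply(c, lapply(tmp, FUN = function(x) x + 1), tmp, SIMPLIFY = FALSE)
    for (j in seq_along(tmp)) {
      # Append (next column - current column) as a new column
      DT <- cbind(DT, delta = DT[, tmp[[j]][1], with = FALSE] - DT[, tmp[[j]][2], with = FALSE])
    }
  }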

However, my real data.table has 300-500 columns and more than 1,000,000 rows.

How can I make this more efficient?
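One possible direction, offered as a minimal sketch rather than a benchmarked answer: resolve the column names once, then add each difference column by reference with data.table's set(), which avoids the copy of the whole table that cbind makes on every iteration. The names col_names, left, right, and new_name are introduced here for illustration, and the D[left:right] naming simply mirrors the expected output shown in the question.

```r
library(data.table)

DT <- data.table(Id=c(1,2,3,4,5), Va1=c(3,13,NA,NA,NA), Va2=c(4,40,NA,NA,4), Va3=c(5,34,NA,7,84),
                 Va4=c(2,23,NA,63,9), Vb1=c(8,45,1,7,0), Vb2=c(0,35,0,7,6), Vb3=c(63,0,0,0,5), Vc1=c(2,5,0,0,4))
reference <- list(g.1=c(2,3,4,5), g.2=c(6,7,8), g.3=c(9))

# Resolve names once, before any new columns are appended
col_names <- names(DT)

for (idx in reference) {
  # A group with a single column has no consecutive pairs
  if (length(idx) < 2L) next
  for (k in seq_len(length(idx) - 1L)) {
    left     <- col_names[idx[k]]
    right    <- col_names[idx[k + 1L]]
    new_name <- sprintf("D[%s:%s]", left, right)
    # set() adds the column by reference, so DT itself is not copied
    set(DT, j = new_name, value = DT[[right]] - DT[[left]])
  }
}
```

On the sample data this appends five columns, D[Va1:Va2] through D[Vb2:Vb3], and leaves Vc1 without a difference column because its group has only one member. Whether this is fast enough for 300-500 columns and over a million rows would need to be measured on the real data.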
