R: How to calculate lag for multiple columns by group for data table


Problem description

I would like to calculate the diff of variables in a data table, grouped by id. Here is some sample data. The data is recorded at a sample rate of 1 Hz, and I would like to estimate the first and second derivatives (speed, acceleration).

df <- read.table(text='x y id
                 1 2 1
                 2 4 1
                 3 5 1
                 1 8 2
                 5 2 2
                 6 3 2',header=TRUE)
library(data.table)  # required for data.table() and the `:=` / `by` syntax
dt <- data.table(df)

Expected output

# dx dy id
# NA NA  1
#  1  2  1
#  1  1  1
# NA NA  2
#  4 -6  2
#  1  1  2

Here is what I tried

dx_dt<-dt[, diff:=c(NA,diff(dt[,'x',with=FALSE])),by = id]

Output

Error in `[.data.frame`(dt, , `:=`(diff, c(NA, diff(dt[, "x", with = FALSE]))),  : 
  unused argument (by = id)

As pointed out by Akrun, the 'speed' terms (dx, dy) can be obtained using either data.table or dplyr. However, I don't understand the calculation well enough to extend it to the acceleration terms. So, how do I calculate the second-order difference terms?

dt[, c('dx', 'dy') := lapply(.SD, function(x) c(NA, diff(x))),
   by = id]

which produces

   x y id dx dy
1: 1 2  1 NA NA
2: 2 4  1  1  2
3: 3 5  1  1  1
4: 1 8  2 NA NA
5: 5 2  2  4 -6
6: 6 3  2  1  1

How can this be extended to get a second diff, that is, the diff of dx and dy?

   x y id dx dy  dx2  dy2
1: 1 2  1 NA NA   NA   NA
2: 2 4  1  1  2   NA   NA
3: 3 5  1  1  1    0   -1
4: 1 8  2 NA NA   NA   NA
5: 5 2  2  4 -6   NA   NA
6: 6 3  2  1  1   -3    7


Recommended answer

You can try

setnames(dt[, lapply(.SD, function(x) c(NA, diff(x))), by = id],
         2:3, c('dx', 'dy'))[]
#    id dx dy
# 1:  1 NA NA
# 2:  1  1  2
# 3:  1  1  1
# 4:  2 NA NA
# 5:  2  4 -6
# 6:  2  1  1

Another option is to use dplyr

library(dplyr)
# mutate_each() and funs() are deprecated in current dplyr; across() replaces them
df %>%
    group_by(id) %>%
    mutate(across(c(x, y), ~ c(NA, diff(.x)))) %>%
    rename(dx = x, dy = y)



Update

You can repeat the step twice

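# Naming the source columns keeps each step valid even if dt already has dx/dy
dt[, c('dx', 'dy') := lapply(.SD, function(x) c(NA, diff(x))),
   by = id, .SDcols = c('x', 'y')]
dt[, c('dx2', 'dy2') := lapply(.SD, function(x) c(NA, diff(x))),
   by = id, .SDcols = c('dx', 'dy')]
dt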
 #   x y id dx dy dx2 dy2
 #1: 1 2  1 NA NA  NA  NA
 #2: 2 4  1  1  2  NA  NA
 #3: 3 5  1  1  1   0  -1
 #4: 1 8  2 NA NA  NA  NA
 #5: 5 2  2  4 -6  NA  NA
 #6: 6 3  2  1  1  -3   7
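
As an alternative to chaining two first differences, base R's diff() takes a `differences` argument, so the second difference can be computed straight from x and y. The sketch below assumes the same sample data; `padded_diff` is a hypothetical helper name introduced here for illustration, not part of data.table or the original answer.

```r
library(data.table)

dt <- data.table(x  = c(1, 2, 3, 1, 5, 6),
                 y  = c(2, 4, 5, 8, 2, 3),
                 id = c(1, 1, 1, 2, 2, 2))

# Hypothetical helper: k-th order difference, left-padded with NA so the
# result has the same length as x (also when a group has k rows or fewer).
padded_diff <- function(x, k = 1) {
  c(rep(NA, min(k, length(x))), diff(x, differences = k))
}

# First and second differences, both computed from the original columns
dt[, c("dx", "dy")   := lapply(.SD, padded_diff, k = 1),
   by = id, .SDcols = c("x", "y")]
dt[, c("dx2", "dy2") := lapply(.SD, padded_diff, k = 2),
   by = id, .SDcols = c("x", "y")]
```

Because the data is sampled at 1 Hz, the time step is 1 second, so these differences can be read directly as per-second rates without further scaling.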

Or we can use the shift function from data.table

dt[, paste0("d", c("x", "y")) := .SD - shift(.SD), by = id
  ][, paste0("d", c("x2", "y2")) := .SD - shift(.SD) , by =  id, .SDcols = 4:5 ]
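
To make the shift-based version easier to follow, here is a column-by-column sketch on the same sample data. It relies on shift(v) returning v lagged by one position (NA-filled), so `v - shift(v)` matches `c(NA, diff(v))`. Writing the second difference as `v - 2 * shift(v) + shift(v, 2)` is an illustrative rewrite, not taken from the original answer.

```r
library(data.table)

dt <- data.table(x  = c(1, 2, 3, 1, 5, 6),
                 y  = c(2, 4, 5, 8, 2, 3),
                 id = c(1, 1, 1, 2, 2, 2))

# First difference: current value minus the previous value within each id
dt[, c("dx", "dy") := lapply(.SD, function(v) v - shift(v)),
   by = id, .SDcols = c("x", "y")]

# Second difference from the original columns:
# v[t] - 2 * v[t-1] + v[t-2], which equals the diff of the first difference
dt[, c("dx2", "dy2") := lapply(.SD, function(v) v - 2 * shift(v) + shift(v, 2)),
   by = id, .SDcols = c("x", "y")]
```

Naming the columns in .SDcols rather than using positions such as 4:5 keeps the call independent of column order.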
