Debugging: function to create multiple lags for multiple columns (dplyr)


Problem description

I want to create multiple lags of multiple variables, so I thought writing a function would be helpful. My code throws a warning ("Truncating vector to length 1") and produces wrong results:

library(dplyr)
time <- c(2000:2009, 2000:2009)
x <- c(1:10, 10:19)
id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2)
df <- data.frame(id, time, x)



three_lags <- function (data, column, group, ordervar) {
  data <- data %>% 
    group_by_(group) %>%
    mutate(a = lag(column, 1L, NA, order_by = ordervar),
            b = lag(column, 2L, NA, order_by = ordervar),
            c = lag(column, 3L, NA, order_by = ordervar)) 
  }

df_lags <- three_lags(data=df, column=x, group=id, ordervar=time) %>%
  arrange(id, time)

I also wondered whether there might be a more elegant solution using mutate_each, but I could not get that to work either. I could of course write out one line of code for each new lagged variable, but I'd like to avoid that.

Edit:

akrun's dplyr answer works, but takes a long time to compute for large data frames. The solution using data.table seems to be more efficient. So a dplyr (or other) solution that can also be applied to several columns and several lags is still to be found.

Edit 2:

For multiple columns and no groups (e.g. no "ID"), the following solution seems very well suited to me because of its simplicity. The code could of course be shortened, but step by step:

library(data.table)  # provides shift()
df <- arrange(df, time)

# df[, 1:24] selects the columns to be lagged by index ([, startcol:endcol]);
# n = 1:3 specifies the lags (lag 1, lag 2 and lag 3 in this case)
df.lag <- shift(df[, 1:24], n = 1:3, give.names = TRUE)

df.result <- bind_cols(df, df.lag)


Recommended answer

We can use shift from data.table, which accepts multiple values for 'n':

library(data.table)
setDT(df)[order(time), c("a", "b", "c") := shift(x, 1:3) , id][order(id, time)]
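
To make the behaviour of shift with several lags more concrete, here is a minimal sketch on a toy vector. The names `v` and `lags` are hypothetical and used only for illustration; the point is that a vector of lags returns a list with one element per lag, which is why the result can be assigned to several new columns at once.

```r
library(data.table)

# Toy input (hypothetical data, for illustration only)
v <- 1:5

# Passing a vector to n returns a list with one shifted copy per lag
lags <- shift(v, n = 1:3)

length(lags)  # one list element for each requested lag
lags[[1]]     # v lagged by one position, padded with NA at the start
lags[[3]]     # v lagged by three positions
```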

Suppose we need to do this on multiple columns:

df$y <- df$x
setDT(df)[order(time), paste0(rep(c("x", "y"), each =3), 
                c("a", "b", "c")) :=shift(.SD, 1:3), id, .SDcols = x:y]
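
The multi-column call above could be wrapped in a small helper so that the columns and lags are passed as arguments. This is only a sketch under assumptions: `add_lags_dt`, its parameter names, and the `_lag` naming scheme are hypothetical and not part of data.table or of the original answer. It sorts a copy of the data first instead of using `order()` inside the brackets.

```r
library(data.table)

# Example data from the question, plus a second column y
df <- data.frame(id = rep(c(1, 2), each = 10),
                 time = rep(2000:2009, 2),
                 x = c(1:10, 10:19))
df$y <- df$x

# Hypothetical helper: lag every column in `cols` by every value in `lags`,
# within the groups given by `group_col`, ordered by `order_col`
add_lags_dt <- function(data, cols, lags, group_col, order_col) {
  dt <- copy(as.data.table(data))          # work on a copy, leave the input untouched
  setorderv(dt, c(group_col, order_col))   # sort by group, then by time
  # shift(.SD, lags) returns all lags of the first column, then of the second, ...
  new_names <- paste0(rep(cols, each = length(lags)), "_lag", lags)
  dt[, (new_names) := shift(.SD, lags), by = group_col, .SDcols = cols]
  dt[]
}

res_dt <- add_lags_dt(df, cols = c("x", "y"), lags = 1:3,
                      group_col = "id", order_col = "time")
head(res_dt)
```

Because the column names are passed as character strings, the same helper can be reused for any number of columns and lags without quoting tricks.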






shift can also be used within dplyr:

library(dplyr)
df %>% 
  group_by(id) %>% 
  arrange(id, time) %>% 
  do(data.frame(., setNames(shift(.$x, 1:3), c("a", "b", "c"))))
#    id  time     x     a     b     c
#   <dbl> <int> <int> <int> <int> <int>
#1      1  2000     1    NA    NA    NA
#2      1  2001     2     1    NA    NA
#3      1  2002     3     2     1    NA
#4      1  2003     4     3     2     1
#5      1  2004     5     4     3     2
#6      1  2005     6     5     4     3
#7      1  2006     7     6     5     4
#8      1  2007     8     7     6     5
#9      1  2008     9     8     7     6
#10     1  2009    10     9     8     7
#11     2  2000    10    NA    NA    NA
#12     2  2001    11    10    NA    NA
#13     2  2002    12    11    10    NA
#14     2  2003    13    12    11    10
#15     2  2004    14    13    12    11
#16     2  2005    15    14    13    12
#17     2  2006    16    15    14    13
#18     2  2007    17    16    15    14
#19     2  2008    18    17    16    15
#20     2  2009    19    18    17    16
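
For readers who prefer to stay entirely within dplyr, the following is a sketch of one possible approach, assuming dplyr 1.0.0 or later (for `across()` and the `{{ }}` operator). The helper name `add_lags`, its arguments, and the `lag1`/`lag2`/`lag3` naming are hypothetical and were not part of the original answer. It passes column names unquoted, as the function in the question attempted to do.

```r
library(dplyr)

# Example data from the question, plus a second column y
df <- data.frame(id = rep(c(1, 2), each = 10),
                 time = rep(2000:2009, 2),
                 x = c(1:10, 10:19))
df$y <- df$x

# Hypothetical helper: lag the selected columns by each value in `lags`
add_lags <- function(data, cols, group, ordervar, lags = 1:3) {
  # Build a named list of functions, one per lag (lag1, lag2, ...)
  lag_funs <- setNames(
    lapply(lags, function(k) {
      force(k)                       # make sure each closure keeps its own k
      function(v) dplyr::lag(v, k)
    }),
    paste0("lag", lags)
  )
  data %>%
    arrange({{ group }}, {{ ordervar }}) %>%   # sort so lag() follows time order
    group_by({{ group }}) %>%
    mutate(across({{ cols }}, lag_funs, .names = "{.col}_{.fn}")) %>%
    ungroup()
}

res <- add_lags(df, cols = c(x, y), group = id, ordervar = time)
head(res)
```

Sorting before grouping avoids the `order_by` argument of `lag()`, which is where the function in the question received a bare column name it could not resolve.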
