dplyr lag of different group

Question

I am trying to use dplyr to mutate two columns: one holding the same-group lag of a variable, and one holding the lag from (one of) the other group(s). Sorry, in the first version of this question I messed up the order a bit by rearranging by date at the last second.

Here is a minimal code example. The final line attaches the other_lag values I expect as my desired result:

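library(tidyverse)

set.seed(2)
# 10 random dates, each with a random group label and a value, sorted by date
df <-
  data.frame(
    x = sample(seq(as.Date('2000/01/01'), as.Date('2015/01/01'), by = "day"), 10),
    group = sample(c("A", "B"), 10, replace = TRUE),
    value = sample(1:10, size = 10)
  ) %>% arrange(x)

# lag of value within the same group
df <- df %>%
  group_by(group) %>%
  mutate(own_lag = lag(value))

# desired result: the most recent earlier value from the other group
df %>% data.frame(other_lag = c(NA, 1, 2, 7, 7, 9, 10, 10, 8, 6))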

Thanks a lot!

Recommended answer

With data.table:

library(data.table)

# to create own lag: 
setDT(df)[, own_lag:=c(NA, head(value, -1)), by=group]

# to create other group lag (the same function also works in base R, without data.table; see N.B. below):
df[, other_lag:=sapply(1:.N, 
                       function(ind) {
                          gp_cur <- group[ind]
                          if(any(group[1:ind]!=gp_cur)) tail(value[1:ind][group[1:ind]!=gp_cur], 1) else NA
                       })]

df
 #            x group value own_lag other_lag
 #1: 2001-12-08     B     1      NA        NA
 #2: 2002-07-09     A     2      NA         1
 #3: 2002-10-10     B     7       1         2
 #4: 2007-01-04     A     5       2         7
 #5: 2008-03-27     A     9       5         7
 #6: 2008-08-06     B    10       7         9
 #7: 2010-07-15     A     4       9        10
 #8: 2012-06-27     A     8       4        10
 #9: 2014-02-21     B     6      10         8
#10: 2014-02-24     A     3       8         6

Explanation of how other_lag is determined: for each observation, look at the groups of all rows up to and including the current one. If any of them differs from the current row's group, take the most recent value that belongs to a different group; otherwise, put NA.

N.B.: other_lag can also be created without data.table:

df$other_lag <- with(df, sapply(1:nrow(df), 
                                function(ind) {
                                 gp_cur <- group[ind]
                                 if(any(group[1:ind]!=gp_cur)) tail(value[1:ind][group[1:ind]!=gp_cur], 1) else NA
                               }))
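
Since the question asked for dplyr, the same rule can also be sketched without a row-by-row sapply. This is not the method used in the answer above, only a possible alternative under stated assumptions: the rows are already sorted by date, and "other group" means any group different from the current row's. The idea is that the most recent row from a different group is always the row just before the current run of consecutive same-group rows. The helper columns prev_value and run_id are hypothetical names introduced for illustration, and the data below are typed in by hand from the output table above (the date column is omitted because it is not needed).

```r
library(dplyr)

# group and value columns copied from the output table above, in date order
df <- data.frame(
  group = c("B", "A", "B", "A", "A", "B", "A", "A", "B", "A"),
  value = c(1, 2, 7, 5, 9, 10, 4, 8, 6, 3)
)

res <- df %>%
  mutate(
    # value of the immediately preceding row, regardless of group
    prev_value = lag(value),
    # run_id increases by one each time the group changes from one row to the next
    run_id = cumsum(group != lag(group, default = first(group)))
  ) %>%
  group_by(run_id) %>%
  # within a run, every row shares the value that sat just before the run started
  mutate(other_lag = first(prev_value)) %>%
  ungroup() %>%
  select(-prev_value, -run_id)

res
```

For these ten rows, other_lag agrees with the values requested in the question. With more than two groups, this sketch returns the latest value from any other group, which may or may not be what is wanted.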
