不同组的dplyr滞后 [英] dplyr lag of different group
问题描述
我正在尝试使用dplyr来对包含变量的相同组滞后以及其他组(其中一个)的滞后的列进行突变.抱歉,在第一版中,我通过在最后一秒按日期重新排列有点弄乱了顺序.
I am trying to use dplyr to mutate both a column containing the samegroup lag of a variable as well as the lag of (one of) the other group(s). Sorry, in the first edition, I messed up the order a bit by rearranging by date at the last second.
这是我期望的结果:
这是一个最小的代码示例:
Here is a minimal code example:
library(tidyverse)
set.seed(2)
df <-
data.frame(
x = sample(seq(as.Date('2000/01/01'), as.Date('2015/01/01'), by="day"), 10),
group = sample(c("A","B"),10,replace = T),
value = sample(1:10,size=10)
) %>% arrange(x)
df <- df %>%
group_by(group) %>%
mutate(own_lag = lag(value))
df %>% data.frame(other_lag = c(NA,1,2,7,7,9,10,10,8,6))
非常感谢!
推荐答案
具有 data.table :
library(data.table)
# to create own lag:
setDT(df)[, own_lag:=c(NA, head(value, -1)), by=group]
# to create other group lag: (the function works actually outside of data.table, in base R, see N.B. below)
df[, other_lag:=sapply(1:.N,
function(ind) {
gp_cur <- group[ind]
if(any(group[1:ind]!=gp_cur)) tail(value[1:ind][group[1:ind]!=gp_cur], 1) else NA
})]
df
# x group value own_lag other_lag
#1: 2001-12-08 B 1 NA NA
#2: 2002-07-09 A 2 NA 1
#3: 2002-10-10 B 7 1 2
#4: 2007-01-04 A 5 2 7
#5: 2008-03-27 A 9 5 7
#6: 2008-08-06 B 10 7 9
#7: 2010-07-15 A 4 9 10
#8: 2012-06-27 A 8 4 10
#9: 2014-02-21 B 6 10 8
#10: 2014-02-24 A 3 8 6
other_lag确定的解释:这个想法是,对于每个观察值,要查看组值,如果有任何组值不同于当前值,则在当前值之前,然后取最后一个值,否则放NA.
Explanation of other_lag determination: The idea is, for each observation, to look at the group value, if there is any group value different from current one, previous to current one, then take the last value, else, put NA.
N.B.:无需数据即可创建 other_lag
.表:
N.B.: other_lag
can be created without the need of data.table:
df$other_lag <- with(df, sapply(1:nrow(df),
function(ind) {
gp_cur <- group[ind]
if(any(group[1:ind]!=gp_cur)) tail(value[1:ind][group[1:ind]!=gp_cur], 1) else NA
}))
这篇关于不同组的dplyr滞后的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!