Create lead and lag variables in R

Problem description

I need to create lead and lag variables in R, as shown below.

Suppose I have a data frame with details of customers' visits to a store:

CustomerID  Dateofvisit
1   1/2/2013
1   1/3/2013
1   1/7/2013
2   1/9/2013
2   1/14/2013
2   2/14/2013
3   1/4/2013
3   1/5/2013

As we can see, there are 3 customers with different visit dates. When I apply a lag function to the above (I wrote my own function), the result looks like this:

CustomerID  Dateofvisit  Laggeddate
1           1/2/2013     -
1           1/3/2013     1/2/2013
1           1/7/2013     1/3/2013
2           1/9/2013     1/7/2013
2           1/14/2013    1/9/2013
2           2/14/2013    1/14/2013
3           1/4/2013     2/14/2013
3           1/5/2013     1/4/2013

But I want to lag within each customer as well. So for the 4th row, the lagged date should be empty. Similarly, for the 3rd customer, the first row should be empty and the last row should show 1/4/2013. How do I do this?

The following is the code I use for lag/lead:

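# Shift a numeric vector forward (lead) or backward (lag), padding with NA.
shift <- function(x, shift_by) {
    stopifnot(is.numeric(shift_by))
    stopifnot(is.numeric(x))

    # Several offsets: return one shifted column per offset
    if (length(shift_by) > 1)
        return(sapply(shift_by, shift, x = x))

    out <- NULL
    abs_shift_by <- abs(shift_by)
    # Positive offset = lead (pad the end), negative offset = lag (pad the start)
    if (shift_by > 0)
        out <- c(tail(x, -abs_shift_by), rep(NA, abs_shift_by))
    else if (shift_by < 0)
        out <- c(rep(NA, abs_shift_by), head(x, -abs_shift_by))
    else
        out <- x
    out
}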

And this is how I lead/lag them:

# Generate a lead-by-1 variable
test$df_lead2 <- shift(test$x, 1)
# Generate a lag-by-1 variable
test$df_lag2 <- shift(test$x, -1)

My desired output is:

CustomerID  Dateofvisit  Laggeddate
1           1/2/2013     -
1           1/3/2013     1/2/2013
1           1/7/2013     1/3/2013
2           1/9/2013     -
2           1/14/2013    1/9/2013
2           2/14/2013    1/14/2013
3           1/4/2013     -
3           1/5/2013     1/4/2013

Solution

Is this what you want?

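library(plyr)
# Within each CustomerID group, shift Dateofvisit by one row.
# Assumes df is already sorted by visit date within each customer.
ddply(.data = df, .variables = .(CustomerID), mutate,
   lagdate = c(NA, head(Dateofvisit, -1)),    # previous visit; NA on a group's first row
   leaddate = c(tail(Dateofvisit, -1), NA))   # next visit; NA on a group's last row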
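
If you would rather not depend on plyr, the same per-customer shift can be sketched in base R with ave(). This is only an illustration, not part of the answer above: the data frame is rebuilt from the question's table, Dateofvisit is assumed to be stored as character strings, the rows are assumed to be already in visit order within each customer, and lag1 and lead1 are hypothetical helper names.

```r
# Rebuild the example data from the question (dates kept as character strings)
df <- data.frame(
  CustomerID  = c(1, 1, 1, 2, 2, 2, 3, 3),
  Dateofvisit = c("1/2/2013", "1/3/2013", "1/7/2013", "1/9/2013",
                  "1/14/2013", "2/14/2013", "1/4/2013", "1/5/2013"),
  stringsAsFactors = FALSE
)

# Hypothetical helpers: shift a vector by one position, padding with NA
lag1  <- function(x) c(NA, head(x, -1))
lead1 <- function(x) c(tail(x, -1), NA)

# ave() applies FUN separately within each CustomerID group and
# returns the results in the original row order
df$lagdate  <- ave(df$Dateofvisit, df$CustomerID, FUN = lag1)
df$leaddate <- ave(df$Dateofvisit, df$CustomerID, FUN = lead1)

print(df)
```

Each customer's first row gets NA in lagdate and each customer's last row gets NA in leaddate, which matches the desired output in the question. If you already use dplyr, lag() and lead() inside a group_by(CustomerID) pipeline are a common alternative.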
