Lags in R within specific subsets?

Question

Suppose I have the following data frame:

df <- data.frame("yearmonth"=c("2005-01","2005-02","2005-03","2005-01","2005-02","2005-03"),"state"=c(1,1,1,2,2,2),"county"=c(3,3,3,3,3,3),"unemp"=c(4.0,3.6,1.4,3.7,6.5,5.4))

I'm trying to create a lagged unemployment column within each unique state-county combination. I want to end up with this:

df2 <- data.frame("yearmonth"=c("2005-01","2005-02","2005-03","2005-01","2005-02","2005-03"),"state"=c(1,1,1,2,2,2),"county"=c(3,3,3,3,3,3),"unemp"=c(4.0,3.6,1.4,3.7,6.5,5.4),"unemp_lag"=c(NA,4.0,3.6,NA,3.7,6.5))

Now imagine the same situation with thousands of different county-state combinations spanning several years. I tried the lag function and the zoo.lag function, but I couldn't make either take the state-county codes into account. One possibility is a giant for loop, but I think there is too much data for that (R does not handle for loops well), and I am looking for a cleaner way to do it. Any ideas? Thanks!

Recommended answer

Just an old-style base R approach:

library(dplyr)  # lag() below is dplyr::lag; stats::lag would not shift the values
dsp <- split(df, list(df$state, df$county))
dsp <- lapply(dsp, function(x) transform(x, unemp_lag = lag(unemp)))
dsp <- unsplit(dsp, list(df$state, df$county))
dsp
  yearmonth state county unemp unemp_lag
1   2005-01     1      3   4.0        NA
2   2005-02     1      3   3.6       4.0
3   2005-03     1      3   1.4       3.6
4   2005-01     2      3   3.7        NA
5   2005-02     2      3   6.5       3.7
6   2005-03     2      3   5.4       6.5

Edit

The lag function I used in my solution is dplyr's lag (although I didn't realize it until BlondedDust's comment). Here is a genuinely pure base R solution (I hope):

dsp <- split(df, list(df$state, df$county) )
# shift unemp down one row within each group; the first row of each group gets NA
dsp <- lapply(dsp, function(x) transform(x, unemp_lag = c(NA, head(unemp, -1))))
dsp <- unsplit(dsp, list(df$state, df$county))
dsp
  yearmonth state county unemp unemp_lag
1   2005-01     1      3   4.0        NA
2   2005-02     1      3   3.6       4.0
3   2005-03     1      3   1.4       3.6
4   2005-01     2      3   3.7        NA
5   2005-02     2      3   6.5       3.7
6   2005-03     2      3   5.4       6.5
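
The split / lapply / unsplit steps above could probably be condensed with base R's ave(), which applies a function within groups and returns the values in the original row order. This is only a minimal sketch, not part of the original answer: lag1 is a hypothetical helper name, and the sort step assumes yearmonth strings of the form "YYYY-MM" order chronologically.

```r
df <- data.frame(
  yearmonth = c("2005-01", "2005-02", "2005-03", "2005-01", "2005-02", "2005-03"),
  state     = c(1, 1, 1, 2, 2, 2),
  county    = c(3, 3, 3, 3, 3, 3),
  unemp     = c(4.0, 3.6, 1.4, 3.7, 6.5, 5.4)
)

# Hypothetical helper: shift a vector down by one position, padding with NA.
lag1 <- function(x) c(NA, head(x, -1))

# Sort first so that "previous row" means "previous month" within each group.
df <- df[order(df$state, df$county, df$yearmonth), ]

# ave() applies lag1 separately within each state-county combination.
df$unemp_lag <- ave(df$unemp, df$state, df$county, FUN = lag1)
df
```

Because the lag is defined by row position, the sort matters: on unsorted data the "previous" value would simply be whichever row happens to come before within the group.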
