Lags in R within specific subsets?
Question
Suppose I have the following dataframe:
df <- data.frame("yearmonth"=c("2005-01","2005-02","2005-03","2005-01","2005-02","2005-03"),"state"=c(1,1,1,2,2,2),"county"=c(3,3,3,3,3,3),"unemp"=c(4.0,3.6,1.4,3.7,6.5,5.4))
I'm trying to create a lag for unemployment within each unique state-county combination. I want to end up with this:
df2 <- data.frame("yearmonth"=c("2005-01","2005-02","2005-03","2005-01","2005-02","2005-03"),"state"=c(1,1,1,2,2,2),"county"=c(3,3,3,3,3,3),"unemp"=c(4.0,3.6,1.4,3.7,6.5,5.4),"unemp_lag"=c(NA,4.0,3.6,NA,3.7,6.5))
Now imagine the same situation, but with thousands of different state-county combinations spanning several years. I tried the lag function and zoo's lag method, but I couldn't make either of them take the state-county codes into account. One possibility is a giant for loop, but I think this is too much data for that (R does not handle for loops well), and I am looking for a cleaner way to do it. Any ideas? Thanks!
Recommended answer
Just an old style base R approach:
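library(dplyr)  # lag() here is dplyr::lag; stats::lag() would not shift the values
# one data frame per state-county group; drop = TRUE skips combinations that never occur
dsp <- split(df, list(df$state, df$county), drop = TRUE)
# lag unemp by one row within each group (rows must already be in date order)
dsp <- lapply(dsp, function(x) transform(x, unemp_lag = dplyr::lag(unemp)))
# put the groups back together in the original row order
dsp <- unsplit(dsp, list(df$state, df$county), drop = TRUE)
dsp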
  yearmonth state county unemp unemp_lag
1   2005-01     1      3   4.0        NA
2   2005-02     1      3   3.6       4.0
3   2005-03     1      3   1.4       3.6
4   2005-01     2      3   3.7        NA
5   2005-02     2      3   6.5       3.7
6   2005-03     2      3   5.4       6.5
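A more compact, purely base R variant (not part of the original answer) uses ave(), which applies a function to unemp separately within each state-county group and returns the results already in the original row order, so no split()/unsplit() round trip is needed. This is a sketch that assumes, like the answer, one row per month per group with no gaps; it sorts by yearmonth first because the lag is by row position, not by calendar month.

```r
# Same data as in the question
df <- data.frame(
  yearmonth = c("2005-01", "2005-02", "2005-03", "2005-01", "2005-02", "2005-03"),
  state     = c(1, 1, 1, 2, 2, 2),
  county    = c(3, 3, 3, 3, 3, 3),
  unemp     = c(4.0, 3.6, 1.4, 3.7, 6.5, 5.4)
)

# The lag below is positional (previous row), so put rows in time order
# within each state-county group; "YYYY-MM" strings sort chronologically
df <- df[order(df$state, df$county, df$yearmonth), ]

# ave() splits unemp by state and county, applies FUN to each piece,
# and writes the results back in the original row positions
df$unemp_lag <- ave(df$unemp, df$state, df$county,
                    FUN = function(v) c(NA, head(v, -1)))
df
```

Because ave() only ever touches the unemp vector rather than whole data frames, it tends to stay lightweight even with thousands of groups.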
Edit
The lag function I used in my solution is dplyr's lag (I didn't realize it until BlondedDust's comment). Base R's stats::lag() does not shift the values of a plain vector, so that version only works with dplyr loaded. Here is a truly pure base R solution (I hope):
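# one data frame per state-county group; drop = TRUE skips combinations that never occur
dsp <- split(df, list(df$state, df$county), drop = TRUE)
# one-step lag by position: prepend NA and drop the last value of each group
dsp <- lapply(dsp, function(x) transform(x, unemp_lag = c(NA, head(unemp, -1))))
dsp <- unsplit(dsp, list(df$state, df$county), drop = TRUE)
dsp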
  yearmonth state county unemp unemp_lag
1   2005-01     1      3   4.0        NA
2   2005-02     1      3   3.6       4.0
3   2005-03     1      3   1.4       3.6
4   2005-01     2      3   3.7        NA
5   2005-02     2      3   6.5       3.7
6   2005-03     2      3   5.4       6.5
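To see why the first version depended on dplyr: base R's stats::lag() is meant for time series, and on a plain numeric vector it only shifts the time base (the "tsp" attribute) while leaving the values where they are. The short check below, using the three state 1 values as an illustrative vector, contrasts it with the positional lag used in the pure base R solution.

```r
x <- c(4.0, 3.6, 1.4)

# stats::lag() moves the start/end of the series back by one period
# but keeps the stored values in the same order
base_lag <- stats::lag(x)
print(tsp(base_lag))        # start, end, frequency after the shift
print(as.vector(base_lag))  # values unchanged: 4.0 3.6 1.4

# Positional one-step lag: prepend NA and drop the last value
manual_lag <- c(NA, head(x, -1))
print(manual_lag)           # NA 4.0 3.6
```

Inside transform(), the stats::lag() result would simply copy unemp into unemp_lag, which is why an explicit positional shift (or dplyr::lag) is needed.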