Create sequential counter that restarts on a condition within panel data groups
Problem description
I have a panel data set for which I would like to create a counter that increases with each step in the panel but restarts whenever some condition occurs. In my case, I'm using country-year data and want to count the years that pass between events. Here's a toy data set with the key features of my real one:
df <- data.frame(country = rep(c("A","B"), each=5), year=rep(2000:2004, times=2), event=c(0,0,1,0,0,1,0,0,1,0), stringsAsFactors=FALSE)
What I'm looking to do is to create a counter that is keyed to df$event within each country's series of observations. The clock starts at 1 when we start observing each country; it increases by 1 with the passage of each year; and it restarts at 1 whenever df$event == 1. The desired output is this:
country year event clock
1 A 2000 0 1
2 A 2001 0 2
3 A 2002 1 1
4 A 2003 0 2
5 A 2004 0 3
6 B 2000 1 1
7 B 2001 0 2
8 B 2002 0 3
9 B 2003 1 1
10 B 2004 0 2
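To make the rule above concrete, here is a minimal loop-based sketch in base R that applies it row by row. It is not from the original thread; it assumes the rows are already sorted by country and then by year, and the names clock and new_country are illustrative only. A loop like this is slow on large data, but it is a handy reference to check vectorized answers against.

```r
df <- data.frame(country = rep(c("A", "B"), each = 5),
                 year = rep(2000:2004, times = 2),
                 event = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
                 stringsAsFactors = FALSE)

# Walk the rows in order (assumes rows are sorted by country, then year)
clock <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  # TRUE on the first row and whenever the country changes
  new_country <- i == 1 || df$country[i] != df$country[i - 1]
  if (new_country || df$event[i] == 1) {
    clock[i] <- 1L                  # start or restart the clock
  } else {
    clock[i] <- clock[i - 1] + 1L   # one more year since the last restart
  }
}
df$clock <- clock
print(df)
```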
I have tried using getanID from splitstackshape and a few variations of if and ifelse, but have failed so far to get the desired result.
I'm already using dplyr in the scripts where I need to do this, so I would prefer a solution that uses it or base R, but I would be grateful for anything that works. My data sets are not massive, so speed is not critical, but efficiency is always a plus.
Answer
With dplyr, that would be:
library(dplyr)
df %>%
  group_by(country, idx = cumsum(event == 1L)) %>%
  mutate(counter = row_number()) %>%
  ungroup() %>%
  select(-idx)
#Source: local data frame [10 x 4]
#
# country year event counter
#1 A 2000 0 1
#2 A 2001 0 2
#3 A 2002 1 1
#4 A 2003 0 2
#5 A 2004 0 3
#6 B 2000 1 1
#7 B 2001 0 2
#8 B 2002 0 3
#9 B 2003 1 1
#10 B 2004 0 2
Or using data.table:
library(data.table)
setDT(df)[, counter := seq_len(.N), by = list(country, cumsum(event == 1L))]
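Since the question also asks for a base R option, the same grouping idea can be sketched with ave(): group by country plus the running event count, then number the rows within each group with seq_along. This is an illustrative addition rather than part of the original answer, and the helper name spell is hypothetical. Like the answers above, it assumes rows are sorted by year within each country, because cumsum() depends on row order.

```r
df <- data.frame(country = rep(c("A", "B"), each = 5),
                 year = rep(2000:2004, times = 2),
                 event = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
                 stringsAsFactors = FALSE)

# Same grouping key as the dplyr/data.table answers:
# a running count of events seen so far (combined with country below)
spell <- cumsum(df$event == 1L)

# Number the rows 1, 2, 3, ... within each country/spell group
df$counter <- ave(df$year, df$country, spell, FUN = seq_along)
print(df)
```

Because ave() returns a vector aligned with the input rows, the original row order is kept, so no re-sorting or join is needed afterwards.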
Edit: group_by(country, idx = cumsum(event == 1L)) groups by country and by a new grouping index, "idx". The event == 1L part creates a logical vector (TRUE/FALSE) telling us whether the value in the column "event" equals 1. Then cumsum(...) counts the TRUE values seen so far, giving 0 for the first 2 rows, 1 for the next 3, 2 for the next 3, and so on. We use this new column (together with country) to group the data as needed. You can see it if you remove the last two pipe parts from the dplyr code.
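To inspect the intermediate grouping index without dplyr, the same two steps can be run in base R on the toy data. This is a small illustrative sketch; the column names is_event and idx are chosen here for display only.

```r
df <- data.frame(country = rep(c("A", "B"), each = 5),
                 year = rep(2000:2004, times = 2),
                 event = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
                 stringsAsFactors = FALSE)

# Step 1: logical flag, TRUE wherever an event occurs
df$is_event <- df$event == 1L

# Step 2: running total of events; each new value marks a new spell
df$idx <- cumsum(df$is_event)

print(df$idx)
# [1] 0 0 1 1 1 2 2 2 3 3
print(df[, c("country", "year", "event", "idx")])
```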