Fill count/sum based on previous row count over time series
Question
I have counted events (in Group 1) over a time period for each group (in Group 2). I am looking to spread the Group 1 events into separate columns, using Group 2 and the timestamp as rows. Each cell will contain the count of events within a 4-day window ending on the current date.
See the example below: for each Group 2 value (I and II), I counted how many Group 1 events (A and L) happened within 4 days.
dates = as.Date(c("2011-10-09",
"2011-10-15",
"2011-10-16",
"2011-10-18",
"2011-10-21",
"2011-10-22",
"2011-10-24"))
group1=c("A",
"A",
"A",
"A",
"L",
"L",
"A")
group2=c("I",
"I",
"I",
"I",
"I",
"I",
"II")
df1 <- data.frame(dates, group1, group2)
Using dplyr pipes, I managed to produce the following table (see also the question "Count event types over time series by multiple conditions"):
library(dplyr)

df1 %>%
  group_by(group1, group2) %>%
  # for each event, count same-group events in the window (x - 4, x]
  mutate(count = sapply(dates, function(x) {
    sum(dates <= x & dates > (x - 4))
  }))
dates group1 group2 count
<date> <fctr> <fctr> <int>
1 2011-10-09 A I 1
2 2011-10-15 A I 1
3 2011-10-16 A I 2
4 2011-10-18 A I 3
5 2011-10-21 L I 1
6 2011-10-22 L I 2
7 2011-10-24 A II 1
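The counting rule inside the sapply() call can be pulled out into a small function to make the window boundaries explicit. Below is a minimal base-R sketch; count_in_window() is a hypothetical helper name (not part of dplyr), and it assumes, as the expression `dates > (x - 4)` implies, a half-open window (x - 4, x] covering the current date and the three days before it.

```r
# Hypothetical helper: count events whose date falls in the half-open
# window (x - width, x], i.e. the current date and the (width - 1) days before it.
count_in_window <- function(event_dates, x, width = 4) {
  sum(event_dates <= x & event_dates > (x - width))
}

# Dates of the group1 == "A", group2 == "I" events from df1
a_i <- as.Date(c("2011-10-09", "2011-10-15", "2011-10-16", "2011-10-18"))

# Apply the helper at each event date, as the sapply() call above does
counts <- sapply(a_i, function(x) count_in_window(a_i, x))
print(counts)
#> [1] 1 1 2 3

# An event exactly 4 days earlier falls outside the window:
# on 2011-10-22 the 2011-10-18 event is no longer counted.
print(count_in_window(a_i, as.Date("2011-10-22")))
#> [1] 0
```

Changing `>` to `>=` would turn this into a 5-day window, so the comparison operators are what define "the previous 4 days" here.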
Eventually, I want to obtain a table similar to the one below, with the counts of events A and L updated according to date (time period = current date minus 4 days) in both I and II (Group 2).
dates group1 group2 count (A) count (L)
1 2011-10-09 A I 1 0
2 2011-10-15 A I 1 0
3 2011-10-16 A I 2 0
4 2011-10-18 A I 3 0
5 2011-10-21 L I 0 1
6 2011-10-22 L I 0 2
7 2011-10-24 A II 1 0
In a larger dataset, not all events in Group 1 appear in every Group 2. How can I update these empty cells so that they either 1) carry forward the count from the previous row or 2) update the count based on the updated timestamp/time period?
Thanks!
Answer
While it is still a bit unclear what you want (see comments on the question), here are two potential approaches.
If all you want to do is spread the count column out (for some reason) and fill it with 0's (whether or not there was an event in the preceding 4 days), while still counting by the group2 breakdown (even though you are only labeling by group1) and leaving the event details in place (as in the example in your question), you can just create a column with the labels you want, then use spread to create the new columns. This
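library(tidyr)

df1 %>%
  group_by(group1, group2) %>%
  mutate(count = sapply(dates, function(x) {
    sum(dates <= x & dates > (x - 4))
  })) %>%
  ungroup() %>%
  # build the target column names, e.g. "Count (A)"
  mutate(toSpread = paste0("Count (", group1, ")")) %>%
  # one column per label; rows without that event get 0
  spread(toSpread, count, fill = 0)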
returns:
dates group1 group2 `Count (A)` `Count (L)`
* <date> <fctr> <fctr> <dbl> <dbl>
1 2011-10-09 A I 1 0
2 2011-10-15 A I 1 0
3 2011-10-16 A I 2 0
4 2011-10-18 A I 3 0
5 2011-10-21 L I 0 1
6 2011-10-22 L I 0 2
7 2011-10-24 A II 1 0
This matches the output you have shown in your question. However, if what you want is a count, on each day with an event, of how many of each group1's events have occurred, you will need to step back a bit further. For that, you need to generate a new data frame with the dates you want, with a row for each group. This is easy to get using complete from tidyr. Then, you can check each of those rows for events of that group in the 4-day window ending on that date.
df1 %>%
  select(dates, group1) %>%
  # one row per (event date, group1) combination
  complete(dates, group1) %>%
  mutate(count = sapply(1:n(), function(idx) {
    # events of this group1 in the window (dates[idx] - 4, dates[idx]]
    sum(df1$dates <= dates[idx] &
          df1$dates > (dates[idx] - 4) &
          df1$group1 == group1[idx])
  })) %>%
  mutate(group1 = paste0("Count (", group1, ")")) %>%
  spread(group1, count, fill = 0)
This returns:
# A tibble: 7 x 3
dates `Count (A)` `Count (L)`
* <date> <dbl> <dbl>
1 2011-10-09 1 0
2 2011-10-15 1 0
3 2011-10-16 2 0
4 2011-10-18 3 0
5 2011-10-21 1 1
6 2011-10-22 0 2
7 2011-10-24 1 2
Note that, if you want to include days on which there were no events, you can do so by passing the dates you want checked into complete. For example:
df1 %>%
  select(dates, group1) %>%
  # every day from the first to the last event date, for each group1
  complete(dates = full_seq(dates, 1), group1) %>%
  mutate(count = sapply(1:n(), function(idx) {
    sum(df1$dates <= dates[idx] &
          df1$dates > (dates[idx] - 4) &
          df1$group1 == group1[idx])
  })) %>%
  mutate(group1 = paste0("Count (", group1, ")")) %>%
  spread(group1, count, fill = 0)
This returns:
dates `Count (A)` `Count (L)`
* <date> <dbl> <dbl>
1 2011-10-09 1 0
2 2011-10-10 1 0
3 2011-10-11 1 0
4 2011-10-12 1 0
5 2011-10-13 0 0
6 2011-10-14 0 0
7 2011-10-15 1 0
8 2011-10-16 2 0
9 2011-10-17 2 0
10 2011-10-18 3 0
11 2011-10-19 2 0
12 2011-10-20 1 0
13 2011-10-21 1 1
14 2011-10-22 0 2
15 2011-10-23 0 2
16 2011-10-24 1 2
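To see what complete() and full_seq() contribute on their own, here is a minimal sketch (assuming tidyr is installed) that builds only the grid the counts are computed over: full_seq() fills in every day from the first to the last date, and complete() crosses those days with each group1 value. The data frame name events is just for illustration.

```r
library(tidyr)

# Same dates and group1 labels as df1 in the question
events <- data.frame(
  dates  = as.Date(c("2011-10-09", "2011-10-15", "2011-10-16", "2011-10-18",
                     "2011-10-21", "2011-10-22", "2011-10-24")),
  group1 = c("A", "A", "A", "A", "L", "L", "A")
)

# Every calendar day between the first and last event (2011-10-09 .. 2011-10-24)
all_days <- full_seq(events$dates, 1)
print(length(all_days))
#> [1] 16

# One row per (day, group1) pair: 16 days x 2 labels
grid <- complete(events, dates = full_seq(dates, 1), group1)
print(nrow(grid))
#> [1] 32
```

Each of those 32 rows is then given a windowed count by the sapply() step, which is why days with no events still show up in the output above.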
Based on the comments, I think I finally understand the goal. As above, I would start by creating a "long" data frame with the counts for each group1/group2 pair for every date:
fullDateCounts <-
  df1 %>%
  select(dates, group1, group2) %>%
  # every day in the range, crossed with every group1/group2 pair
  complete(dates = full_seq(dates, 1), group1, group2) %>%
  mutate(count = sapply(1:n(), function(idx) {
    # events of this group1/group2 pair in (dates[idx] - 4, dates[idx]]
    sum(df1$dates <= dates[idx] &
          df1$dates > (dates[idx] - 4) &
          df1$group1 == group1[idx] &
          df1$group2 == group2[idx])
  }))
The first rows of which are:
dates group1 group2 count
<date> <fctr> <fctr> <int>
1 2011-10-09 A I 1
2 2011-10-09 A II 0
3 2011-10-09 L I 0
4 2011-10-09 L II 0
5 2011-10-10 A I 1
6 2011-10-10 A II 0
7 2011-10-10 L I 0
8 2011-10-10 L II 0
9 2011-10-11 A I 1
10 2011-10-11 A II 0
# ... with 54 more rows
From there, if you really need to convert to a wide form, you can either use one row per date for each group2 value (or per group1 value, if you swap the column names):
fullDateCounts %>%
  mutate(group1 = paste0("Count (", group1, ")")) %>%
  spread(group1, count, fill = 0)
This returns:
dates group2 `Count (A)` `Count (L)`
* <date> <fctr> <dbl> <dbl>
1 2011-10-09 I 1 0
2 2011-10-09 II 0 0
3 2011-10-10 I 1 0
4 2011-10-10 II 0 0
5 2011-10-11 I 1 0
6 2011-10-11 II 0 0
7 2011-10-12 I 1 0
8 2011-10-12 II 0 0
9 2011-10-13 I 0 0
10 2011-10-13 II 0 0
# ... with 22 more rows
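As of tidyr 1.0.0, spread() is superseded by pivot_wider(). The sketch below redoes the one-row-per-date-and-group2 reshaping with pivot_wider(), assuming tidyr >= 1.1.0 (where values_fill accepts a single value); df1 and fullDateCounts are rebuilt so the block runs on its own.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  dates  = as.Date(c("2011-10-09", "2011-10-15", "2011-10-16", "2011-10-18",
                     "2011-10-21", "2011-10-22", "2011-10-24")),
  group1 = c("A", "A", "A", "A", "L", "L", "A"),
  group2 = c("I", "I", "I", "I", "I", "I", "II")
)

# Long form: one row per day x group1 x group2, counting events in (date - 4, date]
fullDateCounts <- df1 %>%
  select(dates, group1, group2) %>%
  complete(dates = full_seq(dates, 1), group1, group2) %>%
  mutate(count = sapply(seq_len(n()), function(idx) {
    sum(df1$dates <= dates[idx] &
          df1$dates > (dates[idx] - 4) &
          df1$group1 == group1[idx] &
          df1$group2 == group2[idx])
  }))

# pivot_wider() counterpart of spread(group1, count, fill = 0)
wide <- fullDateCounts %>%
  mutate(group1 = paste0("Count (", group1, ")")) %>%
  pivot_wider(names_from = group1, values_from = count, values_fill = 0)

print(wide, n = 6)
```

Because complete() has already produced every date/group1/group2 combination, values_fill only matters if that step is skipped; it is kept here to mirror fill = 0.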
Or, you can generate a column for each group1/group2 pair:
fullDateCounts %>%
  mutate(toSpread = paste0("Count (", group1, "-", group2, ")")) %>%
  select(-group1, -group2) %>%
  spread(toSpread, count, fill = 0)
This returns:
dates `Count (A-I)` `Count (A-II)` `Count (L-I)` `Count (L-II)`
* <date> <dbl> <dbl> <dbl> <dbl>
1 2011-10-09 1 0 0 0
2 2011-10-10 1 0 0 0
3 2011-10-11 1 0 0 0
4 2011-10-12 1 0 0 0
5 2011-10-13 0 0 0 0
6 2011-10-14 0 0 0 0
7 2011-10-15 1 0 0 0
8 2011-10-16 2 0 0 0
9 2011-10-17 2 0 0 0
10 2011-10-18 3 0 0 0
11 2011-10-19 2 0 0 0
12 2011-10-20 1 0 0 0
13 2011-10-21 1 0 1 0
14 2011-10-22 0 0 2 0
15 2011-10-23 0 0 2 0
16 2011-10-24 0 1 2 0
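Option 1 from the question (carrying the previous row's count forward instead of recomputing it) can also be sketched. This is a minimal sketch assuming dplyr >= 1.0 (for across()) and tidyr >= 1.0 (for pivot_wider()): the first approach is reshaped without a fill value so that missing counts stay NA, fill() copies the last known value down within each group2, and any remaining leading NAs become 0.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  dates  = as.Date(c("2011-10-09", "2011-10-15", "2011-10-16", "2011-10-18",
                     "2011-10-21", "2011-10-22", "2011-10-24")),
  group1 = c("A", "A", "A", "A", "L", "L", "A"),
  group2 = c("I", "I", "I", "I", "I", "I", "II")
)

carried <- df1 %>%
  group_by(group1, group2) %>%
  # windowed count per event, as in the first approach
  mutate(count = sapply(dates, function(x) sum(dates <= x & dates > (x - 4)))) %>%
  ungroup() %>%
  mutate(toSpread = paste0("Count (", group1, ")")) %>%
  # no values_fill: cells without that event stay NA
  pivot_wider(names_from = toSpread, values_from = count) %>%
  arrange(group2, dates) %>%
  group_by(group2) %>%
  # carry the last known count down within each group2
  fill(`Count (A)`, `Count (L)`, .direction = "down") %>%
  ungroup() %>%
  # rows before the first event of a type have nothing to carry
  mutate(across(starts_with("Count"), ~ coalesce(.x, 0L)))

print(carried)
```

Keep in mind that a carried-forward value is not a 4-day window count: on 2011-10-21 this reports 3 for A in group I, while the recomputed daily table above gives 1.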