Fill count/sum based on previous row count over time series

Question

I have counted events (Group 1) over a time period for each group (Group 2). I would like to spread the Group 1 events into separate columns, using Group 2 and the timestamp as rows. Each cell should contain the count of events within a time window (a 4-day window ending on the current date).

See the example below: for each Group 2 value (I and II), I counted how many of the Group 1 events A and L happened within 4 days.

dates = as.Date(c("2011-10-09",
   "2011-10-15",
   "2011-10-16", 
   "2011-10-18", 
   "2011-10-21", 
   "2011-10-22", 
   "2011-10-24")) 
group1=c("A",
    "A",
    "A", 
    "A", 
    "L", 
    "L", 
    "A")
group2=c("I",
    "I",
    "I", 
    "I", 
    "I", 
    "I", 
    "II")

df1 <- data.frame(dates, group1, group2) 

Using dplyr pipes, I managed to produce the following table (see also "Count event types over time series by multiple conditions"):

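library(dplyr)

# For each row, count the events in the same group1/group2 combination
# whose date falls in the 4-day window ending on that row's date.
df1 %>%
  group_by(group1, group2) %>%
  mutate(count = sapply(dates, function(x) {
    sum(dates <= x & dates > (x - 4))
  }))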


   dates group1 group2 count
  <date> <fctr> <fctr> <int>
1 2011-10-09      A      I     1
2 2011-10-15      A      I     1
3 2011-10-16      A      I     2
4 2011-10-18      A      I     3
5 2011-10-21      L      I     1
6 2011-10-22      L      I     2
7 2011-10-24      A     II     1

Eventually, I want to obtain a table similar to the one below, where the counts for events A and L are updated according to the dates (time window = current date - 4 days) for both I and II (Group 2).

         dates  group1 group2  count (A)   count (L)
     1 2011-10-09      A      I        1         0
     2 2011-10-15      A      I        1         0
     3 2011-10-16      A      I        2         0
     4 2011-10-18      A      I        3         0
     5 2011-10-21      L      I        0         1
     6 2011-10-22      L      I        0         2
     7 2011-10-24      A      II       1         0

In a larger dataset, not every Group 1 event appears in every Group 2. How can I update these empty cells so that they either 1) carry forward the count from the previous row, or 2) update the count based on the current timestamp/time window?

Thanks!

Recommended answer

It is still a bit unclear what you want (see the comments on the question), but here are two potential approaches.

If all you want is to spread the count column out and fill it with 0s (whether or not there was an event in the preceding 4 days), while still counting by the group2 breakdown (even though you only label by group1) and leaving the event details in place (as in the example in your question), you can create a column with the labels you want and then use spread to create the new columns. This code:

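library(dplyr)
library(tidyr)

df1 %>%
  group_by(group1, group2) %>%
  # Count events of the same group1/group2 in the 4-day window ending on each date.
  mutate(count = sapply(dates, function(x) {
    sum(dates <= x & dates > (x - 4))
  })) %>%
  ungroup() %>%
  # Build the column labels, then spread the counts into one column per event.
  mutate(toSpread = paste0("Count (", group1, ")")) %>%
  spread(toSpread, count, fill = 0)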

returns:

       dates group1 group2 `Count (A)` `Count (L)`
*     <date> <fctr> <fctr>       <dbl>       <dbl>
1 2011-10-09      A      I           1           0
2 2011-10-15      A      I           1           0
3 2011-10-16      A      I           2           0
4 2011-10-18      A      I           3           0
5 2011-10-21      L      I           0           1
6 2011-10-22      L      I           0           2
7 2011-10-24      A     II           1           0
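
For readers who want to see the mechanics without dplyr or tidyr, roughly the same layout can be sketched in base R. This is only an illustration under assumptions: the `count` vector, the `wide` data frame, and the loop are names introduced here and are not part of the original answer, and the data frame is rebuilt with character columns rather than factors.

```r
# Sample data from the question (character columns, not factors).
df1 <- data.frame(
  dates  = as.Date(c("2011-10-09", "2011-10-15", "2011-10-16", "2011-10-18",
                     "2011-10-21", "2011-10-22", "2011-10-24")),
  group1 = c("A", "A", "A", "A", "L", "L", "A"),
  group2 = c("I", "I", "I", "I", "I", "I", "II"),
  stringsAsFactors = FALSE
)

# For each row, count the events with the same group1/group2 whose date
# falls in the window (date - 4, date].
count <- vapply(seq_len(nrow(df1)), function(i) {
  sum(df1$dates <= df1$dates[i] &
        df1$dates > df1$dates[i] - 4 &
        df1$group1 == df1$group1[i] &
        df1$group2 == df1$group2[i])
}, integer(1))

# One column per event type; rows belonging to the other event type get 0.
wide <- df1
for (ev in sort(unique(df1$group1))) {
  wide[[paste0("Count (", ev, ")")]] <- ifelse(df1$group1 == ev, count, 0L)
}
print(wide)
```

The zeros here only mean "this row is about a different event type", which is the same reading as `fill = 0` in the spread call.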

That spread result matches the output shown in your question. However, if what you want is, for every day with an event, a count of how many events of each group1 type have occurred, you will need to step back a bit further. For that, you need to generate a new data frame with the dates you want, with a row for each group. This is easy to get using complete from tidyr. Then, for each of those rows, you can count the events of that group that occurred in the preceding four days:

df1 %>%
  select(dates, group1) %>%
  complete(dates, group1) %>%
  mutate(count = sapply(1:n()
                        , function(idx){
                          sum(df1$dates <= dates[idx] &
                                df1$dates > (dates[idx]-4) &
                                df1$group1 == group1[idx])
                        })) %>%
  mutate(group1 = paste0("Count (", group1, ")")) %>%
  spread(group1, count, fill = 0)

returns:

# A tibble: 7 x 3
       dates `Count (A)` `Count (L)`
*     <date>       <dbl>       <dbl>
1 2011-10-09           1           0
2 2011-10-15           1           0
3 2011-10-16           2           0
4 2011-10-18           3           0
5 2011-10-21           1           1
6 2011-10-22           0           2
7 2011-10-24           1           2

Note that, if you want to include days on which there were no events, you can do that by passing the dates you want checked into complete. For example:

df1 %>%
  select(dates, group1) %>%
  complete(dates = full_seq(dates, 1), group1) %>%
  mutate(count = sapply(1:n()
                        , function(idx){
                          sum(df1$dates <= dates[idx] &
                                df1$dates > (dates[idx]-4) &
                                df1$group1 == group1[idx])
                        })) %>%
  mutate(group1 = paste0("Count (", group1, ")")) %>%
  spread(group1, count, fill = 0)

returns:

        dates `Count (A)` `Count (L)`
 *     <date>       <dbl>       <dbl>
 1 2011-10-09           1           0
 2 2011-10-10           1           0
 3 2011-10-11           1           0
 4 2011-10-12           1           0
 5 2011-10-13           0           0
 6 2011-10-14           0           0
 7 2011-10-15           1           0
 8 2011-10-16           2           0
 9 2011-10-17           2           0
10 2011-10-18           3           0
11 2011-10-19           2           0
12 2011-10-20           1           0
13 2011-10-21           1           1
14 2011-10-22           0           2
15 2011-10-23           0           2
16 2011-10-24           1           2
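
The daily version above can also be sketched in base R, which makes the window logic explicit. This is a sketch under assumptions, not the answer's method: `rolling_counts`, its `width` argument, and `all_days` are hypothetical names introduced here, and the wide table is built directly instead of going through complete and spread.

```r
# Sample data from the question (character columns, not factors).
df1 <- data.frame(
  dates  = as.Date(c("2011-10-09", "2011-10-15", "2011-10-16", "2011-10-18",
                     "2011-10-21", "2011-10-22", "2011-10-24")),
  group1 = c("A", "A", "A", "A", "L", "L", "A"),
  group2 = c("I", "I", "I", "I", "I", "I", "II"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rolling event counts per event type on the given days.
# `width` is the window length in days; the window is (day - width, day].
rolling_counts <- function(df, days, width = 4) {
  events <- sort(unique(df$group1))
  counts <- do.call(cbind, lapply(events, function(ev) {
    vapply(seq_along(days), function(i) {
      sum(df$group1 == ev & df$dates <= days[i] & df$dates > days[i] - width)
    }, integer(1))
  }))
  colnames(counts) <- paste0("Count (", events, ")")
  data.frame(dates = days, counts, check.names = FALSE)
}

# Every calendar day between the first and last event.
all_days <- seq(min(df1$dates), max(df1$dates), by = "day")
daily <- rolling_counts(df1, all_days)
print(daily)
```

Passing `sort(unique(df1$dates))` instead of `all_days` restricts the rows to days that had at least one event, as in the earlier 7-row table.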

Based on the comments, I think I finally understand the goal. First, as above, I would create a "long" data frame with the counts for each group1/group2 pair on every date:

fullDateCounts <-
  df1 %>%
  select(dates, group1, group2) %>%
  complete(dates = full_seq(dates, 1), group1, group2) %>%
  mutate(count = sapply(1:n()
                        , function(idx){
                          sum(df1$dates <= dates[idx] &
                                df1$dates > (dates[idx]-4) &
                                df1$group1 == group1[idx] &
                                df1$group2 == group2[idx]
                              )
                        }))

The top of which is:

        dates group1 group2 count
       <date> <fctr> <fctr> <int>
 1 2011-10-09      A      I     1
 2 2011-10-09      A     II     0
 3 2011-10-09      L      I     0
 4 2011-10-09      L     II     0
 5 2011-10-10      A      I     1
 6 2011-10-10      A     II     0
 7 2011-10-10      L      I     0
 8 2011-10-10      L     II     0
 9 2011-10-11      A      I     1
10 2011-10-11      A     II     0
# ... with 54 more rows

From there, if you really need to convert to a wide form, you can either do so with a row for each group2 (or group1, if you switch the column names):

fullDateCounts %>%
  mutate(group1 = paste0("Count (", group1, ")")) %>%
  spread(group1, count, fill = 0)

returns:

        dates group2 `Count (A)` `Count (L)`
 *     <date> <fctr>       <dbl>       <dbl>
 1 2011-10-09      I           1           0
 2 2011-10-09     II           0           0
 3 2011-10-10      I           1           0
 4 2011-10-10     II           0           0
 5 2011-10-11      I           1           0
 6 2011-10-11     II           0           0
 7 2011-10-12      I           1           0
 8 2011-10-12     II           0           0
 9 2011-10-13      I           0           0
10 2011-10-13     II           0           0
# ... with 22 more rows

Or, you can generate a column for each group1/group2 pair:

fullDateCounts %>%
  mutate(toSpread = paste0("Count (", group1, "-", group2, ")")) %>%
  select(-group1, -group2) %>%
  spread(toSpread, count, fill = 0)

returns:

        dates `Count (A-I)` `Count (A-II)` `Count (L-I)` `Count (L-II)`
 *     <date>         <dbl>          <dbl>         <dbl>          <dbl>
 1 2011-10-09             1              0             0              0
 2 2011-10-10             1              0             0              0
 3 2011-10-11             1              0             0              0
 4 2011-10-12             1              0             0              0
 5 2011-10-13             0              0             0              0
 6 2011-10-14             0              0             0              0
 7 2011-10-15             1              0             0              0
 8 2011-10-16             2              0             0              0
 9 2011-10-17             2              0             0              0
10 2011-10-18             3              0             0              0
11 2011-10-19             2              0             0              0
12 2011-10-20             1              0             0              0
13 2011-10-21             1              0             1              0
14 2011-10-22             0              0             2              0
15 2011-10-23             0              0             2              0
16 2011-10-24             0              1             2              0
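
The long-then-wide steps can also be approximated in base R, which may help when checking the counts by hand. This is a minimal sketch under assumptions: `grid`, `pair`, and `wide` are names introduced here, `expand.grid` stands in for complete, and `tapply` stands in for spread, so the result is a matrix keyed by date strings rather than a tibble.

```r
# Sample data from the question (character columns, not factors).
df1 <- data.frame(
  dates  = as.Date(c("2011-10-09", "2011-10-15", "2011-10-16", "2011-10-18",
                     "2011-10-21", "2011-10-22", "2011-10-24")),
  group1 = c("A", "A", "A", "A", "L", "L", "A"),
  group2 = c("I", "I", "I", "I", "I", "I", "II"),
  stringsAsFactors = FALSE
)
all_days <- seq(min(df1$dates), max(df1$dates), by = "day")

# Every date x group1 x group2 combination (group2 varies fastest).
grid <- expand.grid(group2 = sort(unique(df1$group2)),
                    group1 = sort(unique(df1$group1)),
                    dates  = all_days,
                    stringsAsFactors = FALSE)
grid <- grid[, c("dates", "group1", "group2")]

# Count matching events in the window (date - 4, date] for each combination.
grid$count <- vapply(seq_len(nrow(grid)), function(i) {
  sum(df1$dates <= grid$dates[i] &
        df1$dates > grid$dates[i] - 4 &
        df1$group1 == grid$group1[i] &
        df1$group2 == grid$group2[i])
}, integer(1))

# Long -> wide: one row per date, one column per group1/group2 pair.
pair <- paste0("Count (", grid$group1, "-", grid$group2, ")")
wide <- tapply(grid$count,
               list(dates = as.character(grid$dates), pair = pair),
               sum)
print(wide)
```

Because every date/pair cell holds exactly one value in `grid`, the `sum` inside `tapply` only reshapes the data; it does not aggregate anything further.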
