R: Aggregating History By ID By Date


Problem description

I have a large data set with unique IDs for individuals as well as dates, and each individual can have multiple encounters.

Below is code that builds an example of how this data might look:

strDates <- c("09/09/16", "6/7/16", "5/6/16", "2/3/16", "2/1/16", "11/8/16",
              "6/8/16", "5/8/16", "2/3/16", "1/1/16")
Date <- as.Date(strDates, "%m/%d/%y")
ID <- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B")
Event <- c(1, 0, 1, 0, 1, 0, 1, 1, 1, 0)
sample_df <- data.frame(Date, ID, Event)

sample_df

         Date ID Event
1  2016-09-09  A     1
2  2016-06-07  A     0
3  2016-05-06  A     1
4  2016-02-03  A     0
5  2016-02-01  A     1
6  2016-11-08  B     0
7  2016-06-08  B     1
8  2016-05-08  B     1
9  2016-02-03  B     1
10 2016-01-01  B     0

I want to keep all attached information for each encounter, but also aggregate the following historical information by ID:

  1. Number of Previous Encounters
  2. Number of Previous Events

As an example, let's look at Row 2.

Row 2 is ID A, so I would reference Rows 3-5 (which occurred prior to the Row 2 encounter). Within this group of rows, we see that Rows 3 and 5 both had events.

Number of Previous Encounters for Row 2 = 3

Number of Previous Events for Row 2 = 2
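
The Row 2 counts can be checked directly from the definition. The following is only a minimal base R sketch for verification, not the approach being asked for; the names target and prior are illustrative and do not come from the original data set.

```r
# Rebuild the sample data from the question
strDates <- c("09/09/16", "6/7/16", "5/6/16", "2/3/16", "2/1/16",
              "11/8/16", "6/8/16", "5/8/16", "2/3/16", "1/1/16")
sample_df <- data.frame(
  Date  = as.Date(strDates, "%m/%d/%y"),
  ID    = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
  Event = c(1, 0, 1, 0, 1, 0, 1, 1, 1, 0)
)

# Pick Row 2, then keep rows with the same ID and a strictly earlier date
target <- sample_df[2, ]
prior  <- sample_df[sample_df$ID == target$ID & sample_df$Date < target$Date, ]

nrow(prior)       # number of previous encounters for Row 2: 3
sum(prior$Event)  # number of previous events for Row 2: 2
```

Repeating this filter for every row would work but scales poorly on a large data set, which is why a grouped, vectorised technique is preferable.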

Ideally, I would get the following output:

         Date ID Event PrevEnc PrevEvent
1  2016-09-09  A     1       4         2
2  2016-06-07  A     0       3         2
3  2016-05-06  A     1       2         1
4  2016-02-03  A     0       1         1
5  2016-02-01  A     1       0         0
6  2016-11-08  B     0       4         3
7  2016-06-08  B     1       3         2
8  2016-05-08  B     1       2         1
9  2016-02-03  B     1       1         0
10 2016-01-01  B     0       0         0

So far, I have tried working this problem in dplyr with mutate as well as summarise, but neither has let me restrict the aggregation to events that occurred previously for a specific ID. I have also tried some messy for-loops with if-then statements, but I am really just wondering whether a package or technique exists to simplify this process.

Thank you!

Solution

The biggest impediment is the current sort order: the rows are not in chronological order within each ID. Here, I store the original row index, use it later to restore the original order, and then remove it. Other than that, the basic idea is to sort by date within each ID, count up from 0 for the encounters, and use cumsum to count the events as they happen. To that end, lag shifts Event down by one row so that the current event is not counted.

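library(dplyr)

sample_df %>%
  mutate(origIndex = row_number()) %>%   # remember the original row order
  group_by(ID) %>%
  arrange(ID, Date) %>%                  # oldest encounter first within each ID
  # encounters: 0, 1, 2, ... per ID; events: running total excluding the current row
  mutate(PrevEncounters = 0:(n() - 1),
         PrevEvents = cumsum(lag(Event, default = 0))) %>%
  arrange(origIndex) %>%                 # restore the original row order
  select(-origIndex)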

This gives:

         Date     ID Event PrevEncounters PrevEvents
       <date> <fctr> <dbl>          <int>      <dbl>
1  2016-09-09      A     1              4          2
2  2016-06-07      A     0              3          2
3  2016-05-06      A     1              2          1
4  2016-02-03      A     0              1          1
5  2016-02-01      A     1              0          0
6  2016-11-08      B     0              4          3
7  2016-06-08      B     1              3          2
8  2016-05-08      B     1              2          1
9  2016-02-03      B     1              1          0
10 2016-01-01      B     0              0          0
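
For comparison, the same sort, count, and restore idea can be sketched in base R without dplyr. This is an alternative illustration rather than the answer's method: it uses ave for the per-ID calculations, and the names ord, sorted, and result are hypothetical. It assumes each ID has at most one encounter per date; with tied dates, one of the tied rows would be counted as prior to the other.

```r
# Rebuild the sample data from the question
strDates <- c("09/09/16", "6/7/16", "5/6/16", "2/3/16", "2/1/16",
              "11/8/16", "6/8/16", "5/8/16", "2/3/16", "1/1/16")
sample_df <- data.frame(
  Date  = as.Date(strDates, "%m/%d/%y"),
  ID    = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
  Event = c(1, 0, 1, 0, 1, 0, 1, 1, 1, 0)
)

# Sort by ID, then by date, so earlier encounters come first within each ID
ord    <- order(sample_df$ID, sample_df$Date)
sorted <- sample_df[ord, ]

# Per ID: position minus one gives the number of earlier encounters
sorted$PrevEnc <- ave(sorted$Event, sorted$ID,
                      FUN = function(x) seq_along(x) - 1)

# Per ID: running total of events minus the current event
sorted$PrevEvent <- ave(sorted$Event, sorted$ID,
                        FUN = function(x) cumsum(x) - x)

# order(ord) is the inverse permutation, which restores the original row order
result <- sorted[order(ord), ]
result
```

Subtracting the current value from cumsum plays the same role as lag in the dplyr version: both exclude the current encounter from its own history.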
