Record Consecutive Days by Group in R
Question
I have a data frame as follows:
DATE <- as.Date(c('2016-12-01', '2016-12-02', '2016-12-03', '2016-12-04', '2016-12-01', '2016-12-03', '2016-12-04', '2016-12-04' ))
Parent <- c('A','A','A','A','A','A','A','B')
Child <- c('ab', 'ab', 'ab', 'ab', 'ac','ac', 'ac','bd')
salary <- c(1000, 100, 4000, 2000,1000,3455,1234,600)
avg_child_salary <- c(500, 500, 500, 500, 300, 300, 300, 9000)
Callout <- c('HIGH', 'LOW', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'LOW')
employ.data <- data.frame(DATE, Parent, Child, avg_child_salary, salary, Callout)
employ.data
DATE Parent Child avg_child_salary salary Callout
1 2016-12-01 A ab 500 1000 HIGH
2 2016-12-02 A ab 500 100 LOW
3 2016-12-03 A ab 500 4000 HIGH
4 2016-12-04 A ab 500 2000 HIGH
5 2016-12-01 A ac 300 1000 HIGH
6 2016-12-03 A ac 300 3455 HIGH
7 2016-12-04 A ac 300 1234 HIGH
8 2016-12-04 B bd 9000 600 LOW
I have filtered the data down to just yesterday's rows (2016-12-04), as follows:
library(dplyr)  # provides filter()
# "yesterday" was 2016-12-04 when this was run
yesterday <- as.Date(Sys.Date()-1)
df2 <- filter(employ.data, DATE == yesterday)
df2
DATE Parent Child avg_child_salary salary Callout
4 2016-12-04 A ab 500 2000 HIGH
7 2016-12-04 A ac 300 1234 HIGH
8 2016-12-04 B bd 9000 600 LOW
My goal is to add a column next to Callout showing the number of consecutive days, counting back from 2016-12-04, that the callout has been HIGH or LOW for each Child, based on the employ.data data frame. This is the final output I need:
DATE Parent Child avg_child_salary salary Callout Consec. Days with Callout
4 2016-12-04 A ab 500 2000 HIGH 2
7 2016-12-04 A ac 300 1234 HIGH 2
8 2016-12-04 B bd 9000 600 LOW 1
Thanks!
Recommended answer
Here is another approach. It is quite messy, but I think it does what you want:
library(dplyr)
yesterday <- as.Date(Sys.Date()-1)
df2 <- employ.data %>% group_by(Child) %>%
  mutate(`Consec. Days with Callout` = cumsum(rev(cumprod(rev(
    (yesterday - DATE) == (which(DATE == yesterday) - row_number()) &
      Callout == Callout[DATE == yesterday]))))) %>%
  filter(DATE == yesterday)
##Source: local data frame [3 x 7]
##Groups: Child [3]
##
## DATE Parent Child avg_child_salary salary Callout Consec. Days with Callout
## <date> <fctr> <fctr> <dbl> <dbl> <fctr> <dbl>
##1 2016-12-04 A ab 500 2000 HIGH 2
##2 2016-12-04 A ac 300 1234 HIGH 2
##3 2016-12-04 B bd 9000 600 LOW 1
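Because yesterday is derived from Sys.Date(), the pipeline above only matches the sample rows when it is run on 2016-12-05. The following is a minimal sketch for reproducing the result on any day. It assumes the date is pinned to 2016-12-04, keeps only the three columns the calculation needs, and uses a shorter column name (consec_days); none of these choices are part of the original answer.

```r
library(dplyr)

# Trimmed copy of the sample data: only the columns the calculation needs.
employ.data <- data.frame(
  DATE = as.Date(c("2016-12-01", "2016-12-02", "2016-12-03", "2016-12-04",
                   "2016-12-01", "2016-12-03", "2016-12-04", "2016-12-04")),
  Child = c("ab", "ab", "ab", "ab", "ac", "ac", "ac", "bd"),
  Callout = c("HIGH", "LOW", "HIGH", "HIGH", "HIGH", "HIGH", "HIGH", "LOW")
)

# Assumption: pin "yesterday" instead of using Sys.Date() - 1.
yesterday <- as.Date("2016-12-04")

df2 <- employ.data %>%
  group_by(Child) %>%
  mutate(consec_days = cumsum(rev(cumprod(rev(
    as.numeric(yesterday - DATE) == (which(DATE == yesterday) - row_number()) &
      Callout == Callout[DATE == yesterday]
  ))))) %>%
  filter(DATE == yesterday) %>%
  ungroup()

print(df2)
```

This approach relies on each Child group being sorted by DATE and containing exactly one row for yesterday. A group with no row for that date would likely make the mutate step fail, so such groups may need to be dropped first.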
Notes:
(yesterday-DATE)==(which(DATE == yesterday)-row_number()) & Callout==Callout[DATE == yesterday]
computes a condition that is TRUE for a row if its Callout is the same as the Callout for yesterday and if its distance in rows from yesterday's row equals its distance in days from yesterday's date. This gives the Cond column shown below:
Source: local data frame [8 x 7]
Groups: Child [3]
DATE Parent Child avg_child_salary salary Callout Cond
<date> <fctr> <fctr> <dbl> <dbl> <fctr> <lgl>
1 2016-12-01 A ab 500 1000 HIGH TRUE
2 2016-12-02 A ab 500 100 LOW FALSE
3 2016-12-03 A ab 500 4000 HIGH TRUE
4 2016-12-04 A ab 500 2000 HIGH TRUE
5 2016-12-01 A ac 300 1000 HIGH FALSE
6 2016-12-03 A ac 300 3455 HIGH TRUE
7 2016-12-04 A ac 300 1234 HIGH TRUE
8 2016-12-04 B bd 9000 600 LOW TRUE
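To see why comparing the two distances detects a gap, here is a small base R sketch for the ac group alone, which has no row for 2016-12-02. The date is pinned to 2016-12-04 for reproducibility, and the variable names days_back, rows_back and gap_free are illustrative, not taken from the answer.

```r
dates <- as.Date(c("2016-12-01", "2016-12-03", "2016-12-04"))
yesterday <- as.Date("2016-12-04")  # pinned for reproducibility (assumption)

# Distance in calendar days from yesterday.
days_back <- as.numeric(yesterday - dates)

# Distance in rows from yesterday's row (rows assumed sorted by date).
rows_back <- which(dates == yesterday) - seq_along(dates)

# The two distances agree only for rows that have no missing day
# between them and yesterday.
gap_free <- days_back == rows_back

print(data.frame(dates, days_back, rows_back, gap_free))
```

The row for 2016-12-01 is three days back but only two rows back, so it fails the test. This is why row 5 has Cond equal to FALSE even though its Callout is HIGH.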
Given this, we want to count backwards the number of consecutive TRUE values from the row that is yesterday (grouped by Child). To do this, we reverse the vector using rev, apply cumprod, which switches from 1 to 0 as soon as it encounters a FALSE, reverse the vector back using rev, and finally apply cumsum to accumulate the consecutive days. This gives the following, where the value of the Consec. Days with Callout column in yesterday's row is the number of consecutive days, ending yesterday, with the same Callout as yesterday:
Source: local data frame [8 x 7]
Groups: Child [3]
DATE Parent Child avg_child_salary salary Callout Consec. Days with Callout
<date> <fctr> <fctr> <dbl> <dbl> <fctr> <dbl>
1 2016-12-01 A ab 500 1000 HIGH 0
2 2016-12-02 A ab 500 100 LOW 0
3 2016-12-03 A ab 500 4000 HIGH 1
4 2016-12-04 A ab 500 2000 HIGH 2
5 2016-12-01 A ac 300 1000 HIGH 0
6 2016-12-03 A ac 300 3455 HIGH 1
7 2016-12-04 A ac 300 1234 HIGH 2
8 2016-12-04 B bd 9000 600 LOW 1
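The rev, cumprod, rev, cumsum chain can be traced step by step on the Cond values of the ab group. This is a base R sketch; the helper trailing_run is a hypothetical convenience function, not part of the answer, and it returns only the value that ends up in yesterday's row.

```r
cond_ab <- c(TRUE, FALSE, TRUE, TRUE)  # Cond for Child "ab", oldest row first

step1 <- rev(cond_ab)    # TRUE TRUE FALSE TRUE  (yesterday's row first)
step2 <- cumprod(step1)  # 1 1 0 0  (drops to 0 at the first FALSE)
step3 <- rev(step2)      # 0 0 1 1  (back in the original row order)
step4 <- cumsum(step3)   # 0 0 1 2  (running count; last value is the run length)

# Hypothetical helper: length of the run of TRUE values at the end of x.
trailing_run <- function(x) {
  sum(cumprod(rev(x)))
}

print(step4)
print(trailing_run(cond_ab))
```

If only the count for yesterday's row is needed, summing the cumprod result is enough. The cumsum form used in the answer is convenient inside mutate because it returns one value per row.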
Finally, apply the filter, as you did, to generate the final result.