Record Consecutive Days by Group in R


Question

I have a data frame as follows:

DATE <- as.Date(c('2016-12-01', '2016-12-02', '2016-12-03', '2016-12-04', '2016-12-01', '2016-12-03', '2016-12-04', '2016-12-04' ))
Parent <- c('A','A','A','A','A','A','A','B')
Child <- c('ab', 'ab', 'ab', 'ab', 'ac','ac', 'ac','bd')
salary <- c(1000, 100, 4000, 2000,1000,3455,1234,600)
avg_child_salary <- c(500, 500, 500, 500, 300, 300, 300, 9000)
Callout <- c('HIGH', 'LOW', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'LOW')
employ.data <- data.frame(DATE, Parent, Child, avg_child_salary, salary, Callout)

employ.data

        DATE Parent Child avg_child_salary salary Callout
1 2016-12-01      A    ab              500   1000    HIGH
2 2016-12-02      A    ab              500    100     LOW
3 2016-12-03      A    ab              500   4000    HIGH
4 2016-12-04      A    ab              500   2000    HIGH
5 2016-12-01      A    ac              300   1000    HIGH
6 2016-12-03      A    ac              300   3455    HIGH
7 2016-12-04      A    ac              300   1234    HIGH
8 2016-12-04      B    bd             9000    600     LOW

I have filtered the data down to just yesterday's rows (2016-12-04) as follows:

library(dplyr)  # provides filter()
yesterday <- Sys.Date() - 1  # was 2016-12-04 when this was written
df2 <- filter(employ.data, DATE == yesterday)
df2

            DATE Parent Child avg_child_salary salary Callout
    4 2016-12-04      A    ab              500   2000    HIGH
    7 2016-12-04      A    ac              300   1234    HIGH
    8 2016-12-04      B    bd             9000    600     LOW

My goal is to add a column next to Callout showing the number of consecutive days, counting back from 2016-12-04, that the callout has been HIGH or LOW for each Child, based on the employ.data data frame. This is the final output I need:

            DATE Parent Child avg_child_salary salary Callout   Consec. Days with Callout
    4 2016-12-04      A    ab              500   2000    HIGH                           2
    7 2016-12-04      A    ac              300   1234    HIGH                           2
    8 2016-12-04      B    bd             9000    600     LOW                           1

Thanks!

Recommended answer

Here is another approach. It is quite messy, but I think it does what you want:

library(dplyr)
yesterday <- as.Date(Sys.Date()-1)
df2 <- employ.data %>% group_by(Child) %>%
  mutate(`Consec. Days with Callout`=cumsum(rev(cumprod(rev((yesterday-DATE)==(which(DATE == yesterday)-row_number()) & Callout==Callout[DATE == yesterday]))))) %>%
  filter(DATE == yesterday)
##Source: local data frame [3 x 7]
##Groups: Child [3]
##
##        DATE Parent  Child avg_child_salary salary Callout Consec. Days with Callout
##      <date> <fctr> <fctr>            <dbl>  <dbl>  <fctr>                     <dbl>
##1 2016-12-04      A     ab              500   2000    HIGH                         2
##2 2016-12-04      A     ac              300   1234    HIGH                         2
##3 2016-12-04      B     bd             9000    600     LOW                         1
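
Because `Sys.Date() - 1` no longer equals 2016-12-04, the pipeline above returns no rows if run on the sample data today. The following is a minimal reproducible sketch of the same idea under a few assumptions that are not in the original answer: the reference date is pinned to 2016-12-04, the new column is given the syntactic name `consec_days`, an `arrange()` step makes the date ordering explicit, and `as.numeric()` is applied to the date difference so the comparison is between plain numbers. It still assumes every Child has a row for the reference date.

```r
library(dplyr)

# Sample data from the question
employ.data <- data.frame(
  DATE = as.Date(c("2016-12-01", "2016-12-02", "2016-12-03", "2016-12-04",
                   "2016-12-01", "2016-12-03", "2016-12-04", "2016-12-04")),
  Parent = c("A", "A", "A", "A", "A", "A", "A", "B"),
  Child = c("ab", "ab", "ab", "ab", "ac", "ac", "ac", "bd"),
  avg_child_salary = c(500, 500, 500, 500, 300, 300, 300, 9000),
  salary = c(1000, 100, 4000, 2000, 1000, 3455, 1234, 600),
  Callout = c("HIGH", "LOW", "HIGH", "HIGH", "HIGH", "HIGH", "HIGH", "LOW")
)

# Assumption: pin the reference date so the example is reproducible
yesterday <- as.Date("2016-12-04")

result <- employ.data %>%
  arrange(Child, DATE) %>%   # the row-distance comparison needs date order
  group_by(Child) %>%
  mutate(consec_days = cumsum(rev(cumprod(rev(
    # distance in days equals distance in rows, and same Callout as yesterday
    as.numeric(yesterday - DATE) == (which(DATE == yesterday) - row_number()) &
      Callout == Callout[DATE == yesterday]
  ))))) %>%
  filter(DATE == yesterday) %>%
  ungroup()

result$consec_days
```

On the sample data this should give 2, 2 and 1 for Child ab, ac and bd, matching the output requested in the question.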

Notes:


  1. The expression (yesterday-DATE)==(which(DATE == yesterday)-row_number()) & Callout==Callout[DATE == yesterday] computes a condition that is TRUE for a row when two things hold: its Callout is the same as yesterday's Callout, and its distance in rows from yesterday's row equals its distance in days from yesterday's date (so no day is missing in between). This relies on the rows being sorted by DATE within each Child. It gives the Cond column shown below:

Source: local data frame [8 x 7]
Groups: Child [3]

        DATE Parent  Child avg_child_salary salary Callout  Cond
      <date> <fctr> <fctr>            <dbl>  <dbl>  <fctr> <lgl>
1 2016-12-01      A     ab              500   1000    HIGH  TRUE
2 2016-12-02      A     ab              500    100     LOW FALSE
3 2016-12-03      A     ab              500   4000    HIGH  TRUE
4 2016-12-04      A     ab              500   2000    HIGH  TRUE
5 2016-12-01      A     ac              300   1000    HIGH FALSE
6 2016-12-03      A     ac              300   3455    HIGH  TRUE
7 2016-12-04      A     ac              300   1234    HIGH  TRUE
8 2016-12-04      B     bd             9000    600     LOW  TRUE


  • Given this, we want to count backwards the number of consecutive TRUE values starting from yesterday's row (grouped by Child). To do this, we reverse the vector using rev, apply cumprod, which switches from 1 to 0 as soon as it encounters a FALSE and stays at 0 afterwards, reverse the vector back using rev, and finally apply cumsum to accumulate the consecutive days. This gives the following, where the value of the Consec. Days with Callout column in yesterday's row is the number of consecutive days, ending yesterday, with the same Callout as yesterday:

    Source: local data frame [8 x 7]
    Groups: Child [3]
    
            DATE Parent  Child avg_child_salary salary Callout Consec. Days with Callout
          <date> <fctr> <fctr>            <dbl>  <dbl>  <fctr>                     <dbl>
    1 2016-12-01      A     ab              500   1000    HIGH                         0
    2 2016-12-02      A     ab              500    100     LOW                         0
    3 2016-12-03      A     ab              500   4000    HIGH                         1
    4 2016-12-04      A     ab              500   2000    HIGH                         2
    5 2016-12-01      A     ac              300   1000    HIGH                         0
    6 2016-12-03      A     ac              300   3455    HIGH                         1
    7 2016-12-04      A     ac              300   1234    HIGH                         2
    8 2016-12-04      B     bd             9000    600     LOW                         1
    


  • Finally, apply the same filter as in the question to keep only yesterday's rows, which gives the final result.
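
The rev / cumprod / rev / cumsum chain described in the notes can be traced one step at a time in base R. This is only an illustrative sketch: the input vector is the Cond column for Child ab copied from the table above, and the intermediate names (`step1`, `step2`, `step3`, `consec`) are made up for this example.

```r
# Cond for Child "ab" in date order (2016-12-01 .. 2016-12-04)
cond <- c(TRUE, FALSE, TRUE, TRUE)

step1 <- rev(cond)        # TRUE TRUE FALSE TRUE: yesterday's row comes first
step2 <- cumprod(step1)   # 1 1 0 0: drops to 0 at the first FALSE and stays there
step3 <- rev(step2)       # 0 0 1 1: back in date order
consec <- cumsum(step3)   # 0 0 1 2: running count within the final streak

consec
```

The last element of `consec` is the value kept by the final filter, here 2 for Child ab. The earlier TRUE on 2016-12-01 does not count because the FALSE on 2016-12-02 breaks the streak.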
