Assign event number based on date of occurrence in an R data frame


Problem description

How can I assign an event number to rows based on their date of occurrence, subject to the following conditions?

  1. If the event occurs on at least 3 consecutive days, assign an event number (e1, e2, and so on) and add it as a new column to the original data frame.
  2. If the occurrence does not span 3 consecutive days, assign NA in that column instead.

How can I achieve this for the time series dts? The output data frame should look like dts_output (built manually).


    dts<-structure(list(Date = structure(c(16442, 16443, 16444, 16445, 
     16484, 16485, 16486, 16487, 16488, 16489, 16490, 16491, 16492, 
    16493, 16499, 16500, 16511, 16512, 16513), class = "Date"), cct = c(11, 
     11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
     11, 11)), row.names = c(NA, -19L), class = c("tbl_df", "tbl", 
     "data.frame"))

    dts

#Expected output

    dts_output<-structure(list(Date = structure(c(16442, 16443, 16444, 16445, 
           16484, 16485, 16486, 16487, 16488, 16489, 16490, 16491, 16492, 
           16493, 16499, 16500, 16511, 16512, 16513), class = "Date"), cct = c(11, 
           11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
           11, 11), event = c("e1", "e1", "e1", "e1", "e2", "e2", "e2", 
           "e2", "e2", "e2", "e2", "e2", "e2", "e2", NA, NA, "e3", "e3", 
           "e3")), row.names = c(NA, -19L), spec = structure(list(cols = list(
           Date = structure(list(), class = c("collector_character", 
           "collector")), cct = structure(list(), class = c("collector_double", 
           "collector")), event = structure(list(), class = c("collector_character", 
           "collector"))), default = structure(list(), class = c("collector_guess", 
           "collector")), skip = 1L), class = "col_spec"), class = c("spec_tbl_df", 
           "tbl_df", "tbl", "data.frame"))
    dts_output

Recommended answer

This may be a roundabout approach, but it does the job:

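library(dplyr)
library(tidyr)

# Gap in days between each date and the previous one (0 for the first row)
dts$Var <- c(0, diff(dts$Date))
# Rows where a new run of consecutive dates starts (gap is not exactly 1 day)
i <- which(dts$Var != 1)
# Blank out rows that continue a run, then number the run starts 1, 2, 3, ...
dts$Var <- ifelse(dts$Var == 1, NA, dts$Var)
dts$Var[i] <- seq_along(i)

# Fill the run number down, then keep a group id only for runs of 3+ days
input1 <- dts %>%
  fill(Var) %>%
  group_by(Var) %>%
  mutate(Var2 = ifelse(n() >= 3, cur_group_id(), NA))

# Map each surviving group id to a label e1, e2, ...
add <- data.frame(Var2 = unique(na.omit(input1$Var2)), stringsAsFactors = FALSE)
add$Group <- paste0('e', 1:nrow(add))

# Join the labels back and drop the helper columns
# (Var is kept in the result because it is still the grouping variable)
input2 <- input1 %>%
  left_join(add, by = "Var2") %>%
  select(-c(Var, Var2))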

Output:

# A tibble: 19 x 4
# Groups:   Var [4]
     Var Date         cct Group
   <dbl> <date>     <dbl> <chr>
 1     1 2015-01-07    11 e1   
 2     1 2015-01-08    11 e1   
 3     1 2015-01-09    11 e1   
 4     1 2015-01-10    11 e1   
 5     2 2015-02-18    11 e2   
 6     2 2015-02-19    11 e2   
 7     2 2015-02-20    11 e2   
 8     2 2015-02-21    11 e2   
 9     2 2015-02-22    11 e2   
10     2 2015-02-23    11 e2   
11     2 2015-02-24    11 e2   
12     2 2015-02-25    11 e2   
13     2 2015-02-26    11 e2   
14     2 2015-02-27    11 e2   
15     3 2015-03-05    11 NA   
16     3 2015-03-06    11 NA   
17     4 2015-03-17    11 e3   
18     4 2015-03-18    11 e3   
19     4 2015-03-19    11 e3 
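
For comparison, the same run-labelling idea can be sketched in base R without dplyr or tidyr. This is only an illustrative alternative, not the answer's method: it rebuilds the question's dts as a plain data.frame so the block runs on its own, names the new column event to match dts_output, and introduces min_days as a hypothetical parameter for the 3-day threshold.

```r
# Same dates and values as `dts` in the question, rebuilt as a plain data.frame
dts <- data.frame(
  Date = as.Date(c(16442:16445, 16484:16493, 16499:16500, 16511:16513),
                 origin = "1970-01-01"),
  cct = 11
)

# A new run starts whenever the gap to the previous date is not exactly 1 day;
# the cumulative sum of those starts gives every row a run id
run_id <- cumsum(c(TRUE, diff(as.numeric(dts$Date)) != 1))

# Length of the run that each row belongs to
run_len <- ave(seq_along(run_id), run_id, FUN = length)

# min_days is a hypothetical name for the threshold from the question
min_days <- 3
keep <- run_len >= min_days

# Number the surviving runs in order of appearance; all other rows stay NA
event <- rep(NA_character_, nrow(dts))
event[keep] <- paste0("e", match(run_id[keep], unique(run_id[keep])))
dts$event <- event

dts
```

Because the run id is computed once with cumsum, no fill step or join is needed, and no helper column is left in the result.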
