Create start and endtime columns based on multiple conditions in R (dplyr, lubridate)


Problem description

I have a dataset, df:

 Read      Box       ID      Time
 T         out               10/1/2019 9:00:01 AM
 T         out               10/1/2019 9:00:02 AM
 T         out               10/1/2019 9:00:03 AM
 T         out               10/1/2019 9:02:59 AM
 T         out               10/1/2019 9:03:00 AM
 F                           10/1/2019 9:05:00 AM
 T         out               10/1/2019 9:06:00 AM
 T         out               10/1/2019 9:06:02 AM
 T         in                10/1/2019 9:07:00 AM
 T         in                10/1/2019 9:07:02 AM
 T         out               10/1/2019 9:07:04 AM
 T         out               10/1/2019 9:07:05 AM
 T         out               10/1/2019 9:07:06 AM
           hello             10/1/2019 9:07:08 AM

Based on certain conditions in this dataset, I would like to create a starttime column and an endtime column. A starttime should be generated at the first row where all of the following hold: Read == "T", Box == "out" and ID == "". For this dataset, the first starttime is 10/1/2019 9:00:01 AM, since that is where the conditions are first met. As soon as any one of these conditions is no longer true, an endtime should be created from the last row that still satisfied them. So the first endtime occurs right before row 6, at 10/1/2019 9:03:00 AM. My ultimate goal is to then create a duration column from these.

This is my desired output:

  starttime                    endtime                     duration

  10/1/2019 9:00:01 AM         10/1/2019 9:03:00 AM        179 secs
  10/1/2019 9:06:00 AM         10/1/2019 9:06:02 AM        2 secs
  10/1/2019 9:07:04 AM         10/1/2019 9:07:06 AM        2 secs

dput:

  structure(list(Read = structure(c(3L, 3L, 3L, 3L, 3L, 2L, 3L, 
  3L, 3L, 3L, 4L, 4L, 3L, 1L), .Label = c("", "F", "T", "T "), class = "factor"), 
  Box = structure(c(3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 2L, 2L, 
  3L, 3L, 3L, 1L), .Label = c("", "in", "out"), class = "factor"), 
  ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
  1L, 1L, 1L, 2L), .Label = c("", "hello"), class = "factor"), 
  Time = structure(1:14, .Label = c("10/1/2019 9:00:01 AM", 
 "10/1/2019 9:00:02 AM", "10/1/2019 9:00:03 AM", "10/1/2019 9:02:59 AM", 
 "10/1/2019 9:03:00 AM", "10/1/2019 9:05:00 AM", "10/1/2019 9:06:00 AM", 
 "10/1/2019 9:06:02 AM", "10/1/2019 9:07:00 AM", "10/1/2019 9:07:02 AM", 
 "10/1/2019 9:07:04 AM", "10/1/2019 9:07:05 AM", "10/1/2019 9:07:06 AM", 
 "10/1/2019 9:07:08 AM"), class = "factor")), class = "data.frame", row.names = c(NA, 
 -14L))

I think that, overall, I would have to write a loop. I believe my thought process is correct, but I am unsure how to formulate the code. This is what I am trying:

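 library(dplyr)

 # Parse the Time strings (month/day/year hour:minute:second AM/PM) into date-times
 df2 <- mutate(df, Time = lubridate::mdy_hms(Time))

 for (i in 2:nrow(df2)) {
   if (df2$Read[[i]] == "T") {
     # remaining conditions and start/end logic would go here
   }
 }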

I think this may be a start (just placing my conditions within the loop), but I am not sure how to complete it.

Any suggestions are appreciated.

Recommended answer

You can do this without a loop. I am using dplyr here because pipes make it easy to chain several operations.

We first convert the Time column to POSIXct class and create a cond column that holds a logical value for the conditions we want to check. We then create a grp column from the cumulative sum of the negated cond column, so that every row failing the condition starts a new group number. Finally, we keep only the rows that satisfy the condition and, for each group, take the first and last value of Time along with the difference between them.
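To see why a cumulative sum over the failing rows separates the runs, it may help to look at the intermediate values on a tiny example. The sketch below uses base R only; the toy data frame is made up for illustration and is not the asker's data.

```r
# Toy data (hypothetical): two runs of matching rows, separated by one failing row
toy <- data.frame(
  Read = c("T", "T", "F", "T", "T", "T"),
  Box  = c("out", "out", "", "out", "out", "in"),
  ID   = c("", "", "", "", "", ""),
  stringsAsFactors = FALSE
)

# TRUE where all three conditions hold
cond <- toy$Read == "T" & toy$Box == "out" & toy$ID == ""

# The counter increases on every failing row, so consecutive matching
# rows share one value and every break moves on to a new value
grp <- cumsum(!cond)

print(data.frame(toy, cond, grp))
```

Here rows 1 and 2 get grp 0 and rows 4 and 5 get grp 1. The failing rows 3 and 6 also carry a group number, but they are dropped once only the rows with cond equal to TRUE are kept.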

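library(dplyr)

df %>%
  # Parse Time, flag the rows that meet all three conditions, and
  # number the runs: grp increases on every row that fails the condition
  mutate(Time = lubridate::mdy_hms(Time), 
         cond = Read == "T" & Box == "out" & ID == "", 
         grp = cumsum(!cond)) %>%
  # Keep only the matching rows; each remaining grp value is one run
  filter(cond) %>%
  group_by(grp) %>%
  # First and last timestamp of each run, and the time between them
  summarise(starttime = first(Time), 
            endtime = last(Time), 
            duration = difftime(endtime, starttime, units = "secs")) %>%
  select(-grp)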

# A tibble: 3 x 3
#  starttime           endtime             duration
#  <dttm>              <dttm>              <drtn>  
#1 2019-10-01 09:00:01 2019-10-01 09:03:00 179 secs
#2 2019-10-01 09:06:00 2019-10-01 09:06:02   2 secs
#3 2019-10-01 09:07:04 2019-10-01 09:07:06   2 secs
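
As an alternative that needs no packages, the same runs can be found with base R's rle(). This is only a sketch under assumptions, not the answer's method: the toy data frame is hypothetical, and its times are written in ISO format so that as.POSIXct() can parse them without a format string.

```r
# Toy data (hypothetical); Time is already a date-time column
toy <- data.frame(
  Read = c("T", "T", "F", "T", "T"),
  Box  = c("out", "out", "", "out", "out"),
  ID   = "",
  Time = as.POSIXct(c("2019-10-01 09:00:01", "2019-10-01 09:03:00",
                      "2019-10-01 09:05:00", "2019-10-01 09:06:00",
                      "2019-10-01 09:06:02"), tz = "UTC"),
  stringsAsFactors = FALSE
)

cond <- toy$Read == "T" & toy$Box == "out" & toy$ID == ""

# Run-length encode cond: every run of TRUE values is one interval
runs   <- rle(cond)
ends   <- cumsum(runs$lengths)       # last row index of each run
starts <- ends - runs$lengths + 1    # first row index of each run

# Keep only the runs in which the condition holds
keep <- runs$values
res <- data.frame(
  starttime = toy$Time[starts[keep]],
  endtime   = toy$Time[ends[keep]]
)
res$duration <- difftime(res$endtime, res$starttime, units = "secs")
print(res)
```

On this toy input the result has two rows, with durations of 179 and 2 seconds.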

Data

I have cleaned up your data a bit and used this as df.

df <- structure(list(Read = c("T", "T", "T", "T", "T", "F", "T", "T", 
"T", "T", "T", "T", "T", ""), Box = c("out", "out", "out", "out", 
"out", "", "out", "out", "in", "in", "out", "out", "out", "hello"
), ID = c("", "", "", "", "", "", "", "", "", "", "", "", "", 
""), Time = c("10/1/2019 9:00:01 AM", "10/1/2019 9:00:02 AM", 
"10/1/2019 9:00:03 AM", "10/1/2019 9:02:59 AM", "10/1/2019 9:03:00 AM", 
"10/1/2019 9:05:00 AM", "10/1/2019 9:06:00 AM", "10/1/2019 9:06:02 AM", 
"10/1/2019 9:07:00 AM", "10/1/2019 9:07:02 AM", "10/1/2019 9:07:04 AM", 
"10/1/2019 9:07:05 AM", "10/1/2019 9:07:06 AM", "10/1/2019 9:07:08 AM"
)), row.names = c(NA, -14L), class = "data.frame")
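
The answer does not say how the data was cleaned. One plausible approach, assuming the only problems are factor columns and stray whitespace (the dput in the question has a "T " level with a trailing space), is to convert every column to character and trim it. The raw data frame below is a hypothetical stand-in for the original data.

```r
# Hypothetical messy input resembling the question's dput():
# factor columns, with a trailing space in one Read value
raw <- data.frame(
  Read = factor(c("T", "T ", "F", "")),
  Box  = factor(c("out", "out", "", "")),
  ID   = factor(c("", "", "", "hello"))
)

# Convert every column to character and strip surrounding whitespace;
# assigning into clean[] keeps the data frame structure
clean <- raw
clean[] <- lapply(raw, function(x) trimws(as.character(x)))

str(clean)
print(clean$Read == "T")
```

After this step the comparison Read == "T" also matches the row that originally held "T ".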
