R: how can I split one row of a time period into multiple rows based on day and time
Problem description
I am trying to split rows in an Excel file based on day and time. The data come from a study in which participants wear a tracking watch. Each row of the data set starts when a participant puts on the watch (variable: 'Wear Time Start') and ends when they take the device off (variable: 'Wear Time End').
I need to calculate how many hours each participant wore the device on each day (NOT per wear period, which is what one row currently holds).
Data set before the split:
ID WearStart WearEnd
1 01 2018-05-14 09:00:00 2018-05-14 20:00:00
2 01 2018-05-14 21:30:00 2018-05-15 02:00:00
3 01 2018-05-15 07:00:00 2018-05-16 22:30:00
4 01 2018-05-16 23:00:00 2018-05-16 23:40:00
5 01 2018-05-17 01:00:00 2018-05-19 15:00:00
6 02 ...
Some explanation about the data set before the split: 'WearStart' and 'WearEnd' are of type POSIXlt.
Desired output after the split:
ID WearStart WearEnd Interval
1 01 2018-05-14 09:00:00 2018-05-14 20:00:00 11
2 01 2018-05-14 21:30:00 2018-05-15 00:00:00 2.5
3 01 2018-05-15 00:00:00 2018-05-15 02:00:00 2
4 01 2018-05-15 07:00:00 2018-05-16 00:00:00 17
5 01 2018-05-16 00:00:00 2018-05-16 22:30:00 22.5
6 01 2018-05-16 23:00:00 2018-05-16 23:40:00 0.67
7 01 2018-05-17 01:00:00 2018-05-18 00:00:00 23
8 01 2018-05-18 00:00:00 2018-05-19 00:00:00 24
9 01 2018-05-19 00:00:00 2018-05-19 15:00:00 15
Then I need to accumulate the hours per day:
ID Wear_Day Total_Hours
1 01 2018-05-14 13.5
2 01 2018-05-15 19
3 01 2018-05-16 23.17
4 01 2018-05-17 23
5 01 2018-05-18 24
6 01 2018-05-19 15
Recommended answer
I reworked the entire answer. Please review the code; I am fairly sure this is what you want.
Short summary
The problem is that you need to split rows that start and end on different dates, and you need to do this recursively. So I split the data frame into a list of 1-row data frames. For each one I check whether start and end fall on the same day. If not, I turn it into a 2-row data frame with adjusted start and end times: the first row ends just before midnight and the second row starts at midnight. That is then split again into a list of 1-row data frames, and so on. In the end there is a nested list of 1-row data frames in which start and end are on the same day, and this list is recursively bound back together.
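To make the split step concrete before the full code, here is a minimal base-R sketch on a single interval from the question. It is only an illustration under stated assumptions: the timestamps are treated as UTC, the names `start`, `end`, `next_midnight` and `pieces` are made up for this example, and the cut is placed exactly at midnight, whereas the code in this answer ends the first piece one millisecond earlier.

```r
# One interval from the question that crosses midnight (UTC is an assumption)
start <- as.POSIXct("2018-05-14 21:30:00", tz = "UTC")
end   <- as.POSIXct("2018-05-15 02:00:00", tz = "UTC")

# Midnight following the start: truncate to the day, then add 24 hours.
# Adding 86400 seconds is only safe here because UTC has no daylight saving shifts.
next_midnight <- as.POSIXct(trunc(start, units = "days")) + 24 * 60 * 60

# The original interval becomes two pieces that each stay within one calendar day
pieces <- data.frame(
  WearStart = c(start, next_midnight),
  WearEnd   = c(next_midnight, end)
)

# Length of each piece in hours
pieces$Interval <- as.numeric(difftime(pieces$WearEnd, pieces$WearStart, units = "hours"))
print(pieces)
```

The two pieces correspond to rows 2 and 3 of the desired output in the question. An interval that spans several midnights needs this step repeated, which is why the answer recurses.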
# Load Packages ---------------------------------------------------------------------------------------------------
library(tidyverse)
library(lubridate)

# Example data from the question (timestamps are still strings here)
df <- tribble(
  ~ID, ~WearStart,            ~WearEnd,
  01,  "2018-05-14 09:00:00", "2018-05-14 20:00:00",
  01,  "2018-05-14 21:30:00", "2018-05-15 02:00:00",
  01,  "2018-05-15 07:00:00", "2018-05-16 22:30:00",
  01,  "2018-05-16 23:00:00", "2018-05-16 23:40:00",
  01,  "2018-05-17 01:00:00", "2018-05-19 15:00:00"
)

# Parse both Wear* columns into POSIXct date-times (ymd_hms defaults to UTC)
df <- df %>% mutate(across(starts_with("Wear"), ymd_hms))
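# Helper Functions ------------------------------------------------------------------------------------------------
# TRUE if a single-row interval ends on a different calendar day than it starts
endsOnOtherDay <- function(df){
  as_date(df$WearStart) != as_date(df$WearEnd)
}

# Split one row at the first midnight after WearStart.
# The first piece ends 1 ms before midnight so that it still counts as the start day.
split1rowInto2Days <- function(df){
  df1 <- df
  df2 <- df
  df1$WearEnd   <- as_date(df1$WearStart) + days(1) - milliseconds(1)
  df2$WearStart <- as_date(df2$WearStart) + days(1)
  rbind(df1, df2)
}

# Recursively split until every row starts and ends on the same day
splitDates <- function(df){
  # Several rows: process each row on its own, then bind the results together
  if (nrow(df) > 1){
    return(df %>%
             split(f = seq_len(nrow(df))) %>%
             lapply(splitDates) %>%
             reduce(rbind))
  }
  # One row that crosses midnight: split it and recurse on the two pieces
  if (df %>% endsOnOtherDay()){
    return(df %>%
             split1rowInto2Days() %>%
             splitDates())
  }
  # One row within a single day: nothing left to do
  df
}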
# The actual Calculation ------------------------------------------------------------------------------------------
df %>%
  splitDates() %>%
  # Duration of each single-day piece in hours, plus the day it belongs to
  mutate(wearDuration = difftime(WearEnd, WearStart, units = "hours"),
         wearDay = as_date(WearStart)) %>%
  # Total wear time per participant and day
  group_by(ID, wearDay) %>%
  summarise(wearDuration_perDay = sum(wearDuration))
ID wearDay wearDuration_perDay
<dbl> <date> <drtn>
1 1 2018-05-14 13.50000 hours
2 1 2018-05-15 19.00000 hours
3 1 2018-05-16 23.16667 hours
4 1 2018-05-17 23.00000 hours
5 1 2018-05-18 24.00000 hours
6 1 2018-05-19 15.00000 hours
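For comparison, the same daily totals can also be obtained without recursion by cutting each interval at every midnight it contains. The following base-R sketch is an alternative technique, not the approach of this answer; `split_at_midnights`, `wear`, `pieces` and `daily` are hypothetical names introduced for illustration, and the timestamps are assumed to be UTC.

```r
# Example data from the question (UTC is an assumption)
wear <- data.frame(
  ID = "01",
  WearStart = as.POSIXct(c("2018-05-14 09:00:00", "2018-05-14 21:30:00",
                           "2018-05-15 07:00:00", "2018-05-16 23:00:00",
                           "2018-05-17 01:00:00"), tz = "UTC"),
  WearEnd = as.POSIXct(c("2018-05-14 20:00:00", "2018-05-15 02:00:00",
                         "2018-05-16 22:30:00", "2018-05-16 23:40:00",
                         "2018-05-19 15:00:00"), tz = "UTC")
)

# Hypothetical helper: split one interval at every midnight strictly inside it
split_at_midnights <- function(id, start, end) {
  first_midnight <- as.POSIXct(trunc(start, units = "days")) + 86400
  cuts <- if (first_midnight < end) {
    seq(first_midnight, end, by = "day")
  } else {
    start[0]  # zero-length POSIXct: the interval stays within one day
  }
  cuts <- cuts[cuts < end]  # drop a cut that coincides with the end itself
  data.frame(ID = id, WearStart = c(start, cuts), WearEnd = c(cuts, end))
}

# Apply the helper row by row and stack the pieces
pieces <- do.call(rbind, lapply(seq_len(nrow(wear)), function(i) {
  split_at_midnights(wear$ID[i], wear$WearStart[i], wear$WearEnd[i])
}))

# Each piece belongs to the day on which it starts
pieces$Wear_Day <- as.Date(pieces$WearStart, tz = "UTC")
pieces$Hours <- as.numeric(difftime(pieces$WearEnd, pieces$WearStart, units = "hours"))

# Sum the hours per participant and day
daily <- aggregate(Hours ~ ID + Wear_Day, data = pieces, FUN = sum)
print(daily)
```

Because the cuts sit exactly at midnight, no millisecond is lost per split, and the per-day totals should agree with the result above. With local time zones that observe daylight saving time, the fixed 86400-second step would need to be revisited.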