检查df2中的datetime列的值是否在R中的df1的datateime值之内 [英] Check if values of datetime column in df2 is within datateime values of df1 in R

查看:118
本文介绍了检查df2中的datetime列的值是否在R中的df1的datateime值之内的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有2个这样的数据框:

I have a 2 dataframes like this:

df1

ID <- c("A","B","A","C","C","B","B","A")
StartDatetime <- c("2015-09-29 00:00:13", "2015-09-29 05:55:50", "2015-09-29 11:45:14", "2015-09-29 15:24:00",
                               "2015-09-29 17:24:12", "2015-09-29 21:34:31", "2015-09-29 22:22:22", "2015-09-29 23:38:22") 
EndDatetime <- c("2015-09-29 00:13:56", "2015-09-29 06:13:50", "2015-09-29 12:23:14", "2015-09-29 15:58:00",
                             "2015-09-29 17:58:17", "2015-09-29 22:06:31", "2015-09-29 22:52:28", "2015-09-29 23:55:22") 
MEASUREMENT <- c("Length","Length","Width","Length","Width","Height","Length","Height") 
df1 <- data.frame(ID,StartDatetime,EndDatetime,MEASUREMENT)

df2

ID <- c("A","B","A","C","C","B","B")
MStart <- c("09/29/2015 00:02:13", "09/29/2015 05:56:50", "09/30/2015 11:55:14", "09/29/2015 15:33:00",
                    "09/29/2015 17:28:12", "09/29/2015 21:30:31", "09/29/2015 22:26:22") 
MEnd <- c("09/29/2015 00:11:12", "09/29/2015 06:55:50", "09/30/2015 11:54:14", "09/29/2015 15:47:00",
                  "09/29/2015 17:44:12", "09/29/2015 22:02:31", "09/29/2015 22:44:22") 
Measurement <- c("Length","Length","Width","Length","Width","Height","Length") 
df2 <- data.frame(ID,MStart,MEnd,Measurement)

我正在尝试解决一个有趣的问题,即检查ID是否具有MStart&每次测量df2中的MEnd都在df1中的日期时间范围内。逻辑重新调整

I am trying to solve an interesting problem of checking if the ID's having the MStart & MEnd in df2 are within the date time ranges in df1 for every measurement. The logic retuns

TRUE if (MStart & MEnd) **is within** (StartDatetime & EndDatetime)
FALSE if (MStart & MEnd) **is not within** (StartDatetime & EndDatetime)

我的期望输出将是df3,其中包含df1中的所有列,并添加具有True或False值的列。

My Desired output will be df3 with all the columns in df1 and also add a column with True or False values.

df3

  ID       StartDatetime         EndDatetime MEASUREMENT True_False
1  A 2015-09-29 00:00:13 2015-09-29 00:13:56      Length       TRUE
2  B 2015-09-29 05:55:50 2015-09-29 06:13:50      Length      FALSE
3  A 2015-09-29 11:55:14 2015-09-29 12:23:14       Width      FALSE
4  C 2015-09-29 15:24:00 2015-09-29 15:58:00      Length       TRUE
5  C 2015-09-29 17:24:12 2015-09-29 17:58:17       Width       TRUE
6  B 2015-09-29 21:34:31 2015-09-29 22:06:31      Height      FALSE
7  B 2015-09-29 22:22:22 2015-09-29 22:52:28      Length       TRUE
8  A 2015-09-29 23:38:22 2015-09-29 23:55:22      Height      FALSE

当尝试转换df2的日期格式并且无法前进时,我陷入了此错误。

I am stuck at this error when trying to convert the date format of df2 and not able to move forward.

**df2$MStart <- as.POSIXct(df2$MStart,"%Y-%m-%d %H:%M:%S")**

Error in as.POSIXlt.character(as.character(x), ...) : 
  character string is not in a standard unambiguous format

请指导我如何解决此问题。我正在尝试使用dplyr或data.table解决此问题,但不知道使用日期时间的逻辑。

Please guide me on how I could solve this problem. I am trying to use dplyr or data.table for solving this but not knowing the logic for working with date times.

编辑
我刚刚进行了编辑并删除了df2的最后一行,因此现在只有7行。我也想解决这种情况,因为我的较大数据集在df1中包含更多行,而在df2中包含更少行,因此也想从df1中返回那些带有FALSE的不匹配行。

EDIT I just made an edit and removed the last row of df2 and hence it has only 7 rows now. I want to address this case as well since my bigger dataset contains more rows in df1 and fewer rows in df2 and hence would like to return those unmatched rows with FALSE from df1 as well.

推荐答案

library(dplyr)
library(lubridate)

df1_fix = 
  df1 %>%
  mutate(StartDatetime = ymd_hms(StartDatetime),
         EndDatetime = ymd_hms(EndDatetime)) %>%
  rename(Measurement = MEASUREMENT)


df2_fix = 
  df2 %>%
  mutate(MStart  = mdy_hms(MStart),
         MEnd = mdy_hms(MEnd))

initial_join = 
  df1_fix %>%
  full_join(df2_fix)

initial_join %>%
  filter(StartDatetime < EndDatetime,
         MStart < MEnd) %>%
  mutate(test = 
           int_overlaps(
             interval(StartDatetime, EndDatetime),
               interval(MStart, MEnd) ) ) %>%
  right_join(initial_join) %>%
  mutate(test = test %in% TRUE)

这篇关于检查df2中的datetime列的值是否在R中的df1的datateime值之内的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆