Subset dataframe based on POSIXct date and time greater than datetime using dplyr


Question

I am not sure what is going wrong when I select date times stored as POSIXct. I have read several comments on subsetting a dataframe based on as.Date, and I can get that to work without an issue. I have also read many posts suggesting that filtering on POSIXct values should work, but for some reason I cannot get it to work.

Example dataframe:

library(lubridate)
library(dplyr)

date_test <- seq(ymd_hms('2016-07-01 00:00:00'),ymd_hms('2016-08-01 00:00:00'), by = '15 min')
date_test <- data.frame(date_test)
date_test$datetime <- date_test$date_test
date_test <- select(date_test, -date_test)

I checked that the column is POSIXct and then tried several ways to subset the dataframe to date times greater than 2016-07-01 01:15:00. However, the output never shows the date times earlier than 2016-07-01 01:15:00 being removed. I am sorry if this has been asked somewhere and I could not find it, but I have looked and tried to get this to work. I am using UTC as the timezone to avoid daylight saving time issues, so that should not be the problem here, unless the filter requires it.

class(date_test$datetime)

date_test <- date_test %>% filter(datetime > '2016-07-01 01:15:00')

date_test <- date_test %>% 
  filter(datetime > as.POSIXct("2016-07-01 00:15"))

date_test <- subset(date_test, datetime > as.POSIXct('2016-07-01 01:15:00')) 

Now if I filter using:

date_test <- date_test %>% 
  filter(datetime > as.POSIXct("2016-07-10 01:15:00"))

the output is very strange: it starts a day earlier and at the wrong time.

2016-07-09 13:30:00
2016-07-09 13:45:00
2016-07-09 14:00:00
2016-07-09 14:15:00
2016-07-09 14:30:00

If it helps, I am using macOS Sierra with RStudio version 1.0.143, R "You Stupid Darkness", dplyr 0.5 and lubridate 1.6.

Recommended answer

ymd_hms returns POSIXct times in the "UTC" timezone by default, whereas as.POSIXct uses the system timezone (e.g. Australia for me). A cutoff built with as.POSIXct is therefore parsed as local time and is a different instant from the same wall-clock time in UTC, which shifts where the filter cuts. You need to use ymd_hms consistently, or pass tz = "UTC" to as.POSIXct as per Dave's suggestion in the comments.
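To make the mismatch concrete, here is a minimal base R sketch. It assumes a system timezone of "Pacific/Auckland" purely for illustration; the question does not state the asker's timezone, although a UTC+12 offset would be consistent with the filtered output starting at 13:30 on 9 July. The variable names are hypothetical.

```r
# Same wall-clock string, parsed in two different timezones.
# "Pacific/Auckland" is an assumed stand-in for a non-UTC system timezone.
cutoff_utc   <- as.POSIXct("2016-07-10 01:15:00", tz = "UTC")
cutoff_local <- as.POSIXct("2016-07-10 01:15:00", tz = "Pacific/Auckland")

# The two values are different instants; in July the gap is the
# UTC+12 offset of New Zealand standard time.
gap_hours <- as.numeric(difftime(cutoff_utc, cutoff_local, units = "hours"))
print(gap_hours)

# Printed in UTC, the locally parsed cutoff falls on the previous day,
# so a filter against a UTC column starts keeping rows from there.
print(format(cutoff_local, tz = "UTC", usetz = TRUE))
```

Under that assumption, `datetime > cutoff_local` keeps every UTC row after 13:15 on 9 July, which is why the result looks like it is a day behind.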

For example, both of these work:

date_test <- seq(ymd_hms('2016-07-01 00:30:00'),ymd_hms('2016-07-01 01:30:00'), by = '15 min')
date_test <- data.frame(datetime=date_test)
date_test

#             datetime
#1 2016-07-01 00:30:00
#2 2016-07-01 00:45:00
#3 2016-07-01 01:00:00
#4 2016-07-01 01:15:00
#5 2016-07-01 01:30:00

date_test %>% 
  filter(datetime > as.POSIXct("2016-07-01 01:00:00", tz="UTC"))

date_test %>% 
  filter(datetime > ymd_hms("2016-07-01 01:00:00"))

#             datetime
#1 2016-07-01 01:15:00
#2 2016-07-01 01:30:00
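A possible variation, offered as a sketch rather than as part of the original answer: instead of hard-coding "UTC", read the timezone off the column with lubridate's tz() and reuse it when building the cutoff. The names col_tz, cutoff and kept are hypothetical.

```r
library(lubridate)
library(dplyr)

# Rebuild the small example frame from the answer (UTC via ymd_hms).
date_test <- data.frame(
  datetime = seq(ymd_hms("2016-07-01 00:30:00"),
                 ymd_hms("2016-07-01 01:30:00"), by = "15 min")
)

# Take the timezone from the column itself so the cutoff always matches it.
col_tz <- tz(date_test$datetime)
cutoff <- as.POSIXct("2016-07-01 01:00:00", tz = col_tz)

# Keep only the rows strictly after the cutoff.
kept <- date_test %>% filter(datetime > cutoff)
print(kept)
```

This keeps the comparison correct even if the column is later stored in a timezone other than UTC, at the cost of one extra line.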
