How to filter a dataset by the time stamp

Published: 2021/5/2 20:54:30. Tags: r, dplyr, subset


Question


I'm working with some bird GPS tracking data, and I would like to exclude points based on the time stamp.

Some background information: the GPS loggers track each bird for just over 24 hours, starting in the evening and continuing through the night and the following day. What I would like to do is exclude points taken after 9:30 pm on the day AFTER deployment (i.e., remove points from the very end of the track). As an R novice, I'm struggling because the deployment dates differ for each bird, so I can't simply use subset() with a single fixed date and time.

An example of my dataframe (df):

```
BirdID    x             y           Datetime
15K12     492719.9      5634805     2015-06-23 18:25:00
15K12     492491.5      5635018     2015-06-23 18:27:00
15K70     455979.1      5653581     2015-06-24 19:54:00
15K70     456040.9      5653668     2015-06-24 19:59:00
```
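To experiment with the solution below, the four example rows can be rebuilt directly in R. This is a minimal sketch; treating the timestamps as UTC is an assumption, since the question does not say which time zone the loggers record in.

```r
# Rebuild the four example rows from the question.
# Time zone "UTC" is an assumption; change it to match your loggers.
df <- data.frame(
  BirdID   = c("15K12", "15K12", "15K70", "15K70"),
  x        = c(492719.9, 492491.5, 455979.1, 456040.9),
  y        = c(5634805, 5635018, 5653581, 5653668),
  Datetime = as.POSIXct(c("2015-06-23 18:25:00", "2015-06-23 18:27:00",
                          "2015-06-24 19:54:00", "2015-06-24 19:59:00"),
                        tz = "UTC"),
  stringsAsFactors = FALSE
)
str(df)
```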

So, pretending these points represent the start of the GPS track for each animal, I would like to remove points after 9:30 pm on June 24 for bird 15K12, and after 9:30 pm on June 25 for bird 15K70.
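The cutoff rule described here (21:30 on the calendar day after each bird's first fix) can be written as a small base R function. This is only a sketch: the name `cutoff_for` is hypothetical, and the example timestamps are assumed to be UTC.

```r
# Hypothetical helper: 21:30 on the calendar day after a bird's first fix,
# computed in the time zone stored on the timestamp ("" = session's local zone).
cutoff_for <- function(first_fix) {
  tz <- attr(first_fix, "tzone")[1]
  if (is.null(tz) || is.na(tz)) tz <- ""
  day <- as.Date(first_fix, tz = tz)          # calendar date of the first fix
  as.POSIXct(paste(day + 1, "21:30:00"), tz = tz)
}

# The two birds from the question (timestamps assumed to be UTC)
format(cutoff_for(as.POSIXct("2015-06-23 18:25:00", tz = "UTC")), "%Y-%m-%d %H:%M")  # "2015-06-24 21:30"
format(cutoff_for(as.POSIXct("2015-06-24 19:54:00", tz = "UTC")), "%Y-%m-%d %H:%M")  # "2015-06-25 21:30"
```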

Any ideas?

Solution

First, check whether df$Datetime is already a date-time (POSIXct) variable:

```r
class(df$Datetime)
```

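If it is not (for example, because it was read in as text), you can convert it with lubridate's ymd_hms():

```r
library(lubridate)
# ymd_hms() parses "2015-06-23 18:25:00"-style strings; the result is UTC by default
df$Datetime <- ymd_hms(df$Datetime)
```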
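If you would rather not depend on lubridate for this step, base R's as.POSIXct() with an explicit format string does the same conversion. A minimal sketch; the format string assumes timestamps laid out exactly like the example data, and UTC is an assumed time zone.

```r
# Example: a Datetime column that was read in as plain text
df <- data.frame(BirdID   = c("15K12", "15K70"),
                 Datetime = c("2015-06-23 18:25:00", "2015-06-24 19:54:00"),
                 stringsAsFactors = FALSE)
class(df$Datetime)  # "character", so it needs converting

# Base R alternative to ymd_hms(): explicit format, time zone assumed to be UTC
df$Datetime <- as.POSIXct(df$Datetime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(df$Datetime)  # "POSIXct" "POSIXt"
```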

Use mutate() inside group_by(BirdID) to create a new column, newdate, that holds each bird's cutoff: 21:30:00 on the day after the calendar date of that bird's earliest observation. Because the data are grouped, min(Datetime) is computed separately for every bird, so each bird gets its own cutoff regardless of when it was deployed.

Then filter() compares Datetime with newdate and keeps only the observations recorded earlier than that bird's cutoff.

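```r
library(dplyr)
library(lubridate)

df %>%
  group_by(BirdID) %>%
  # cutoff = 21:30 on the day after each bird's first fix, in the same tz as Datetime
  mutate(newdate = floor_date(min(Datetime), unit = "day") + days(1) + hours(21) + minutes(30)) %>%
  filter(Datetime < newdate)
```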
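The same per-bird logic can also be expressed in base R, which may suit you if you want to stay close to subset()-style indexing. In this sketch ave() repeats each bird's earliest timestamp on every row of that bird; the late-evening rows in the example data are invented purely to show points on either side of the cutoff, and UTC is assumed.

```r
# Example data: two birds; the later rows are made up to straddle each cutoff
df <- data.frame(
  BirdID   = c("15K12", "15K12", "15K12", "15K70", "15K70", "15K70"),
  Datetime = as.POSIXct(c("2015-06-23 18:25:00", "2015-06-24 21:00:00",
                          "2015-06-24 22:10:00", "2015-06-24 19:54:00",
                          "2015-06-25 21:29:00", "2015-06-25 23:05:00"),
                        tz = "UTC"),
  stringsAsFactors = FALSE
)

# Earliest fix of each bird, repeated on every row belonging to that bird
first_fix <- ave(as.numeric(df$Datetime), df$BirdID, FUN = min)
first_fix <- as.POSIXct(first_fix, origin = "1970-01-01", tz = "UTC")

# Cutoff = 21:30 on the calendar day after the first fix
cutoff <- as.POSIXct(paste(as.Date(first_fix, tz = "UTC") + 1, "21:30:00"), tz = "UTC")

# Keep only the points recorded before each bird's own cutoff
df_filtered <- df[df$Datetime < cutoff, ]
df_filtered
```

Here the 22:10 point for 15K12 and the 23:05 point for 15K70 are dropped, while everything before 21:30 on each bird's second day is kept.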

Here is a reproducible example:

```r
library(dplyr)
library(lubridate)

set.seed(1)

# Create a data frame of 1000 simulated observations
BirdID <- paste(rep(floor(runif(250, 1, 20)), 4),
                rep("k", 1000), rep(floor(runif(250, 1, 40)), 4), sep = "")
x <- rnorm(1000, mean = 47000, sd = 2000)
y <- rnorm(1000, mean = 5650000, sd = 300000)
Datetime <- as.POSIXct(rnorm(1000, mean = as.numeric(as.POSIXct("2015-06-23 18:25:00", tz = "GMT")), sd = 99999),
                       tz = "GMT", origin = "1970-01-01")
df <- data.frame(BirdID, x, y, Datetime, stringsAsFactors = FALSE)

# Keep each bird's points up to 21:30 on the day after its first fix
df_filtered <- df %>%
  group_by(BirdID) %>%
  mutate(newdate = floor_date(min(Datetime), unit = "day") + days(1) + hours(21) + minutes(30)) %>%
  filter(Datetime < newdate)
```

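This keeps, for every bird, only the points recorded before 21:30 on the day after its first fix, whichever day that bird was deployed.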
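To confirm the filter rather than take it on trust, you can compare each bird's latest retained fix with its cutoff and count how many points were dropped per bird. The sketch below simulates three birds so that it runs on its own: 15K99 is an invented ID, the 30-hour tracks are assumed, and the filtering step is the base R equivalent of the dplyr pipeline.

```r
set.seed(1)

# Simulated tracks: three birds, 50 fixes each over ~30 hours from an evening start
birds  <- c("15K12", "15K70", "15K99")          # 15K99 is a made-up bird
starts <- as.POSIXct(c("2015-06-23 18:25:00", "2015-06-24 19:54:00",
                       "2015-06-26 20:10:00"), tz = "UTC")
df <- do.call(rbind, lapply(seq_along(birds), function(i) {
  data.frame(BirdID   = birds[i],
             Datetime = starts[i] + sort(runif(50, 0, 30 * 3600)),
             stringsAsFactors = FALSE)
}))

# Per-bird cutoff: 21:30 on the day after the first fix
first_fix <- as.POSIXct(ave(as.numeric(df$Datetime), df$BirdID, FUN = min),
                        origin = "1970-01-01", tz = "UTC")
cutoff <- as.POSIXct(paste(as.Date(first_fix, tz = "UTC") + 1, "21:30:00"), tz = "UTC")
df_filtered <- df[df$Datetime < cutoff, ]

# Sanity check: every bird's last retained fix falls before its cutoff
last_kept   <- tapply(as.numeric(df_filtered$Datetime), df_filtered$BirdID, max)
bird_cutoff <- tapply(as.numeric(cutoff), df$BirdID, max)
stopifnot(all(last_kept < bird_cutoff[names(last_kept)]))

# Number of points removed from the end of each track
table(df$BirdID) - table(df_filtered$BirdID)
```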
