Regular time between rows with R

Problem description

I have this sample:

data <- structure(list(timestamp_pretty = structure(c(1L, 3L, 4L, 5L, 
6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 17L, 18L, 20L, 2L, 11L, 15L, 
16L, 19L), .Label = c("01/06/2014 00:04:00", "01/06/2014 00:04:01", 
"01/06/2014 00:07:10", "01/06/2014 00:10:10", "01/06/2014 00:13:11", 
"01/06/2014 00:19:20", "01/06/2014 00:20:02", "01/06/2014 00:22:20", 
"01/06/2014 00:25:30", "01/06/2014 01:11:11", "01/06/2014 01:16:03", 
"01/06/2014 01:17:12", "01/06/2014 01:20:41", "01/06/2014 01:26:51", 
"01/06/2014 01:28:03", "01/06/2014 01:43:03", "01/06/2014 01:45:20", 
"01/06/2014 02:12:01", "01/06/2014 02:13:05", "01/06/2014 02:18:01"
), class = "factor"), mmsi = c(205477000L, 205477000L, 205477000L, 
205477000L, 205477000L, 205477000L, 205477000L, 205477000L, 205477000L, 
205477000L, 205477000L, 205477000L, 205477000L, 205477000L, 205477000L, 
205482000L, 205482000L, 205482000L, 205482000L, 205482000L)), .Names = c("timestamp_pretty", 
"mmsi"), row.names = c(8L, 9L, 17L, 16L, 4L, 12L, 3L, 14L, 10L, 
7L, 13L, 19L, 6L, 15L, 1L, 11L, 18L, 20L, 2L, 5L), class = "data.frame")

The column diff_time_seconds_timestamp_pretty is the time difference between consecutive rows of the same mmsi.
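That column is not part of the sample above. As a sketch, it could be derived in base R as follows. The month/day/year reading of the timestamps matches the format strings used in the answer; the UTC time zone and giving the first signal of each mmsi a difference of 0 are assumptions.

```r
# Rebuild the question's sample as a plain data frame
# (character timestamps instead of a factor; same row order as the dput)
data <- data.frame(
  timestamp_pretty = paste("01/06/2014", c(
    "00:04:00", "00:07:10", "00:10:10", "00:13:11", "00:19:20",
    "00:20:02", "00:22:20", "00:25:30", "01:11:11", "01:17:12",
    "01:20:41", "01:26:51", "01:45:20", "02:12:01", "02:18:01",
    "00:04:01", "01:16:03", "01:28:03", "01:43:03", "02:13:05")),
  mmsi = rep(c(205477000L, 205482000L), times = c(15, 5)),
  stringsAsFactors = FALSE
)

# Parse as month/day/year; UTC is an assumption that avoids DST gaps
data$time <- as.POSIXct(data$timestamp_pretty,
                        format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Order by vessel, then time, so differences are between consecutive signals
data <- data[order(data$mmsi, data$time), ]

# Seconds since the previous signal of the same mmsi;
# the first signal of each mmsi gets 0 (an assumption)
data$diff_time_seconds_timestamp_pretty <- ave(
  as.numeric(data$time), data$mmsi,
  FUN = function(t) c(0, diff(t))
)
```

With this column in place, the dplyr cumsum attempt further down runs as written.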

I would like to remove some signals (rows) and keep only one signal every XXX seconds (for example, 180 seconds) for each mmsi.

My thought was to use the zoo package, which seems to be designed for this, but I could not manage it, so I am now trying to:

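  1. Add the cumulative sum of diff_time_seconds_timestamp_pretty for each mmsi.
  2. Remove signals that are less than 180 seconds apart from each other.
  3. Keep signals that are at least 180 seconds apart.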
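Steps 2 and 3 describe a greedy rule: keep the first signal of each mmsi, then keep a later signal only if it is at least 180 seconds after the last signal that was kept. A running cumulative sum cannot express this on its own, because the reference point has to reset every time a signal is kept. Fixed 180-second buckets, as in the answer below, also do not guarantee the spacing: 00:20:02 and 00:22:20 for 205477000 fall in adjacent buckets yet are only 138 seconds apart. A minimal base-R sketch of the greedy rule follows; the helper name keep_spaced, the UTC time zone, and the month/day/year reading of the timestamps are assumptions.

```r
# Rebuild the question's sample (character timestamps, same row order as the dput)
data <- data.frame(
  timestamp_pretty = paste("01/06/2014", c(
    "00:04:00", "00:07:10", "00:10:10", "00:13:11", "00:19:20",
    "00:20:02", "00:22:20", "00:25:30", "01:11:11", "01:17:12",
    "01:20:41", "01:26:51", "01:45:20", "02:12:01", "02:18:01",
    "00:04:01", "01:16:03", "01:28:03", "01:43:03", "02:13:05")),
  mmsi = rep(c(205477000L, 205482000L), times = c(15, 5)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: given sorted times in seconds, keep the first signal and
# then every signal at least `gap` seconds after the last *kept* one
keep_spaced <- function(t, gap = 180) {
  keep <- logical(length(t))
  last_kept <- -Inf
  for (i in seq_along(t)) {
    if (t[i] - last_kept >= gap) {
      keep[i] <- TRUE
      last_kept <- t[i]
    }
  }
  keep
}

# Seconds since the epoch, parsed as month/day/year in UTC (an assumption)
data$secs <- as.numeric(as.POSIXct(data$timestamp_pretty,
                                   format = "%m/%d/%Y %H:%M:%S", tz = "UTC"))

# Sort within each mmsi, then apply the greedy rule separately per mmsi
data <- data[order(data$mmsi, data$secs), ]
keep <- unsplit(lapply(split(data$secs, data$mmsi), keep_spaced), data$mmsi)
thinned <- data[keep, c("timestamp_pretty", "mmsi")]
```

On this sample only 00:20:02 (42 seconds after 00:19:20) is dropped; every other signal is already at least 180 seconds after the previously kept one.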

I tried:

library(dplyr)
test <- data %>% 
  group_by(mmsi) %>%
  mutate(cum.sum=cumsum(diff_time_seconds_timestamp_pretty))

but it seems I am still far from what I want.

Any help is welcome!

Recommended answer

Here are two approaches that divide the date-times into 180-second intervals and then keep only the last data point in each.

1) chron/zoo. Convert the timestamps to chron and use trunc.times from that package to truncate them to 180-second (i.e. 3-minute) boundaries. Then read them into zoo, aggregating rows with equal date/times using tail so that only the last one is retained:

library(chron)
library(zoo)
# return the chron date time at start of 180 sec interval each point is in
to180ch <- function(x) trunc(as.chron(as.character(x), "%m/%d/%Y %H:%M:%S"), "00:03:00")
read.zoo(data, FUN = to180ch, aggregate = function(x) tail(x, 1))

The result is the following zoo object that uses chron date/times:

(01/06/14 00:03:00) (01/06/14 00:06:00) (01/06/14 00:09:00) (01/06/14 00:12:00) 
          205482000           205477000           205477000           205477000 
(01/06/14 00:18:00) (01/06/14 00:21:00) (01/06/14 00:24:00) (01/06/14 01:09:00) 
          205477000           205477000           205477000           205477000 
(01/06/14 01:15:00) (01/06/14 01:18:00) (01/06/14 01:24:00) (01/06/14 01:27:00) 
          205482000           205477000           205477000           205482000 
(01/06/14 01:42:00) (01/06/14 01:45:00) (01/06/14 02:12:00) (01/06/14 02:18:00) 
          205482000           205477000           205482000           205477000 

If you would rather just subset the data frame down to one row per 180-second interval, try this:

subset(data, !duplicated(to180ch(timestamp_pretty), fromLast = TRUE))

2) No packages. Convert to POSIXct and then to numeric, perform the truncation, and convert back to POSIXct. Finally, aggregate using tail:

# return the POSIXct date time at start of 180 sec interval each point is in
to180ct <- function(x) {
    p <- as.POSIXct(as.character(x), format = "%m/%d/%Y %H:%M:%S")
    as.POSIXct(180 * as.numeric(p) %/% 180, origin = "1970-01-01")
}
aggregate(data[2], list(timestamp = to180ct(data[[1]])), tail, 1)

giving this data frame with a POSIXct timestamp:

             timestamp      mmsi
1  2014-01-06 00:03:00 205482000
2  2014-01-06 00:06:00 205477000
3  2014-01-06 00:09:00 205477000
4  2014-01-06 00:12:00 205477000
5  2014-01-06 00:18:00 205477000
6  2014-01-06 00:21:00 205477000
7  2014-01-06 00:24:00 205477000
8  2014-01-06 01:09:00 205477000
9  2014-01-06 01:15:00 205482000
10 2014-01-06 01:18:00 205477000
11 2014-01-06 01:24:00 205477000
12 2014-01-06 01:27:00 205482000
13 2014-01-06 01:42:00 205482000
14 2014-01-06 01:45:00 205477000
15 2014-01-06 02:12:00 205482000
16 2014-01-06 02:18:00 205477000
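The truncation in to180ct relies on R's operator precedence: %/% binds more tightly than *, so 180 * as.numeric(p) %/% 180 means 180 * (as.numeric(p) %/% 180), i.e. integer-divide to get the interval index, then scale back to seconds. A small check on one timestamp from the sample, with tz = "UTC" added as an assumption (the answer uses the local time zone):

```r
# Parse one timestamp (month/day/year, UTC assumed) and take epoch seconds
p <- as.POSIXct("01/06/2014 00:04:01", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
s <- as.numeric(p)

# Parsed as 180 * (s %/% 180): 241 s past midnight lies in interval 1 (180 to 359 s)
bucket_start <- as.POSIXct(180 * s %/% 180, origin = "1970-01-01", tz = "UTC")
format(bucket_start, "%Y-%m-%d %H:%M:%S")
# "2014-01-06 00:03:00"
```

This matches row 1 of the output above, where the 00:04:01 signal lands in the 00:03:00 interval.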

As in (1), if you just want to subset the data frame, replace to180ch in the subset line of (1) with to180ct:

subset(data, !duplicated(to180ct(timestamp_pretty), fromLast = TRUE))
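Neither approach groups by mmsi, although the question asks for the thinning per mmsi. In the 01:15:00 interval, for example, the 205477000 signal at 01:17:12 is dropped in favor of the 205482000 row at 01:16:03, because "last" here means last in row order. A possible per-mmsi variant of the subset one-liner, sketched under the assumption of UTC timestamps and with the question's sample rebuilt inline, adds mmsi to the key passed to duplicated():

```r
# Rebuild the question's sample (character timestamps, same row order as the dput)
data <- data.frame(
  timestamp_pretty = paste("01/06/2014", c(
    "00:04:00", "00:07:10", "00:10:10", "00:13:11", "00:19:20",
    "00:20:02", "00:22:20", "00:25:30", "01:11:11", "01:17:12",
    "01:20:41", "01:26:51", "01:45:20", "02:12:01", "02:18:01",
    "00:04:01", "01:16:03", "01:28:03", "01:43:03", "02:13:05")),
  mmsi = rep(c(205477000L, 205482000L), times = c(15, 5)),
  stringsAsFactors = FALSE
)

# Same idea as the answer's to180ct, with UTC assumed so results do not depend on locale
to180ct <- function(x) {
  p <- as.POSIXct(as.character(x), format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
  as.POSIXct(180 * as.numeric(p) %/% 180, origin = "1970-01-01", tz = "UTC")
}

# Key on (mmsi, interval) so each vessel keeps its own last row per interval;
# rows are time-ordered within each mmsi here, so "last" is also the latest signal
key <- paste(data$mmsi, as.numeric(to180ct(data$timestamp_pretty)))
per_mmsi <- data[!duplicated(key, fromLast = TRUE), ]
```

On this sample the per-mmsi version keeps 19 rows (14 for 205477000 and 5 for 205482000) instead of 16.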