Regular time between rows with R
Question
I have this sample:
data <- structure(list(timestamp_pretty = structure(c(1L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 17L, 18L, 20L, 2L, 11L, 15L,
16L, 19L), .Label = c("01/06/2014 00:04:00", "01/06/2014 00:04:01",
"01/06/2014 00:07:10", "01/06/2014 00:10:10", "01/06/2014 00:13:11",
"01/06/2014 00:19:20", "01/06/2014 00:20:02", "01/06/2014 00:22:20",
"01/06/2014 00:25:30", "01/06/2014 01:11:11", "01/06/2014 01:16:03",
"01/06/2014 01:17:12", "01/06/2014 01:20:41", "01/06/2014 01:26:51",
"01/06/2014 01:28:03", "01/06/2014 01:43:03", "01/06/2014 01:45:20",
"01/06/2014 02:12:01", "01/06/2014 02:13:05", "01/06/2014 02:18:01"
), class = "factor"), mmsi = c(205477000L, 205477000L, 205477000L,
205477000L, 205477000L, 205477000L, 205477000L, 205477000L, 205477000L,
205477000L, 205477000L, 205477000L, 205477000L, 205477000L, 205477000L,
205482000L, 205482000L, 205482000L, 205482000L, 205482000L)), .Names = c("timestamp_pretty",
"mmsi"), row.names = c(8L, 9L, 17L, 16L, 4L, 12L, 3L, 14L, 10L,
7L, 13L, 19L, 6L, 15L, 1L, 11L, 18L, 20L, 2L, 5L), class = "data.frame")
The column diff_time_seconds_timestamp_pretty is the time difference, in seconds, between consecutive rows for each mmsi.
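The sample above contains only timestamp_pretty and mmsi, so the difference column has to be derived. A minimal base-R sketch of one way to do that, assuming the timestamps are month/day/year, that rows are already in time order within each mmsi, and that the first row of each mmsi gets 0 (all three are assumptions, as is tz = "UTC"):

```r
# A few rows taken from the sample, already in time order within each mmsi.
df <- data.frame(
  timestamp_pretty = c("01/06/2014 00:04:00", "01/06/2014 00:07:10",
                       "01/06/2014 00:10:10", "01/06/2014 00:04:01",
                       "01/06/2014 01:16:03"),
  mmsi = c(205477000L, 205477000L, 205477000L, 205482000L, 205482000L)
)

# Parse as month/day/year (assumption) and convert to seconds since the epoch.
secs <- as.numeric(as.POSIXct(df$timestamp_pretty,
                              format = "%m/%d/%Y %H:%M:%S", tz = "UTC"))

# Seconds since the previous row of the same mmsi; 0 for the first row of
# each mmsi (an assumption, chosen so that cumsum() works later).
df$diff_time_seconds_timestamp_pretty <-
  ave(secs, df$mmsi, FUN = function(s) c(0, diff(s)))

df
```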
I would like to remove some signals (rows) and keep only one signal every XXX seconds (for example, 180 seconds) for each mmsi.
My first thought was to use the zoo package, which seems to be made for this, but I could not get it to work, so I am now trying to:
- add the cumulative sum of diff_time_seconds_timestamp_pretty for each mmsi
- remove the signals that are less than 180 seconds apart
- keep the signals that are at least 180 seconds apart
I tried:
library(dplyr)
test <- data %>%
group_by(mmsi) %>%
mutate(cum.sum=cumsum(diff_time_seconds_timestamp_pretty))
but it seems that I am still far from what I want.
Any help is welcome!
Recommended answer
Here are two approaches that divide the date/times into 180-second intervals and then keep only the last data point in each interval.
1) chron/zoo. Convert the time stamps to chron and use trunc.times from that package to truncate them to 180-second (i.e. 3-minute) intervals. Then read them into zoo, aggregating on equal date/times with the tail function so that only the last value is retained:
library(chron)
library(zoo)
# return the chron date time at start of 180 sec interval each point is in
to180ch <- function(x) trunc(as.chron(as.character(x), "%m/%d/%Y %H:%M:%S"), "00:03:00")
read.zoo(data, FUN = to180ch, aggregate = function(x) tail(x, 1))
The result is the following zoo object, indexed by chron date/times:
(01/06/14 00:03:00) (01/06/14 00:06:00) (01/06/14 00:09:00) (01/06/14 00:12:00)
          205482000           205477000           205477000           205477000
(01/06/14 00:18:00) (01/06/14 00:21:00) (01/06/14 00:24:00) (01/06/14 01:09:00)
          205477000           205477000           205477000           205477000
(01/06/14 01:15:00) (01/06/14 01:18:00) (01/06/14 01:24:00) (01/06/14 01:27:00)
          205482000           205477000           205477000           205482000
(01/06/14 01:42:00) (01/06/14 01:45:00) (01/06/14 02:12:00) (01/06/14 02:18:00)
          205482000           205477000           205482000           205477000
If you would rather just subset the data frame down to one row per 180-second interval, try this:
subset(data, !duplicated(to180ch(timestamp_pretty), fromLast = TRUE))
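Note that the line above keeps one row per 180-second interval across the whole data frame, so rows from different mmsi values compete for the same interval (in the zoo output above, the 00:03:00 interval retains only mmsi 205482000). If one row per interval per mmsi is wanted, one option is to include mmsi in the duplicated() key. A minimal base-R sketch, where bin180 is a hypothetical helper standing in for to180ch so that the example runs without chron, and tz = "UTC" is an assumption:

```r
# A few rows taken from the question's sample, covering both mmsi values.
df <- data.frame(
  timestamp_pretty = c("01/06/2014 00:04:00", "01/06/2014 00:19:20",
                       "01/06/2014 00:20:02", "01/06/2014 00:04:01"),
  mmsi = c(205477000L, 205477000L, 205477000L, 205482000L)
)

# Hypothetical helper: start of the 180-second interval, as epoch seconds.
# tz = "UTC" is an assumption made here for reproducibility.
bin180 <- function(x) {
  p <- as.POSIXct(as.character(x), format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
  180 * (as.numeric(p) %/% 180)
}

# Key on (mmsi, interval) so each mmsi keeps its own last row per interval.
key <- data.frame(mmsi = df$mmsi, bin = bin180(df$timestamp_pretty))
per_mmsi <- df[!duplicated(key, fromLast = TRUE), ]

# For comparison: keying on the interval alone, as in the line above.
overall <- df[!duplicated(key$bin, fromLast = TRUE), ]
```

With these four rows, per_mmsi keeps the 00:04:00 row of mmsi 205477000, while overall drops it in favour of the 00:04:01 row of the other mmsi.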
2) No packages. Convert to POSIXct and then to numeric, perform the truncation, and convert back to POSIXct. Finally, aggregate using tail:
# return the POSIXct date time at start of 180 sec interval each point is in
to180ct <- function(x) {
  p <- as.POSIXct(as.character(x), format = "%m/%d/%Y %H:%M:%S")
  as.POSIXct(180 * (as.numeric(p) %/% 180), origin = "1970-01-01")
}
aggregate(data[2], list(timestamp = to180ct(data[[1]])), tail, 1)
giving this data frame with a POSIXct timestamp column:
             timestamp      mmsi
1  2014-01-06 00:03:00 205482000
2  2014-01-06 00:06:00 205477000
3  2014-01-06 00:09:00 205477000
4  2014-01-06 00:12:00 205477000
5  2014-01-06 00:18:00 205477000
6  2014-01-06 00:21:00 205477000
7  2014-01-06 00:24:00 205477000
8  2014-01-06 01:09:00 205477000
9  2014-01-06 01:15:00 205482000
10 2014-01-06 01:18:00 205477000
11 2014-01-06 01:24:00 205477000
12 2014-01-06 01:27:00 205482000
13 2014-01-06 01:42:00 205482000
14 2014-01-06 01:45:00 205477000
15 2014-01-06 02:12:00 205482000
16 2014-01-06 02:18:00 205477000
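To make the truncation step concrete, here is a small worked example on a single timestamp from the sample. It is only a sketch: tz = "UTC" is an assumption added so the numbers are reproducible (the answer's to180ct uses the session's local time zone). In R, %/% binds more tightly than *, so 180 * x %/% 180 means 180 * (x %/% 180).

```r
# One timestamp from the sample, parsed in UTC (assumption) as month/day/year.
p <- as.POSIXct("01/06/2014 00:07:10", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

secs   <- as.numeric(p)          # seconds since 1970-01-01 00:00:00 UTC
start  <- 180 * (secs %/% 180)   # round down to a multiple of 180 seconds
offset <- secs - start           # seconds elapsed inside the 3-minute interval

# Convert the interval start back to a date/time.
bin_start <- as.POSIXct(start, origin = "1970-01-01", tz = "UTC")
format(bin_start, "%Y-%m-%d %H:%M:%S")
```

Here 00:07:10 falls 70 seconds into the interval that starts at 00:06:00, which matches row 2 of the table above.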
As in (1), if all that is wanted is to subset the data frame, replace to180ch with to180ct in the subset line from (1), like this:
subset(data, !duplicated(to180ct(timestamp_pretty), fromLast = TRUE))
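Binning into fixed intervals does not guarantee that the retained rows are at least 180 seconds apart: in the sample, 00:20:02 and 00:22:20 land in adjacent intervals and are both kept, although they are only 138 seconds apart. If a minimum spacing per mmsi is what is needed, a different technique (a greedy filter, not the binning used in this answer) could be sketched as follows. The helper keep_spaced is hypothetical, and tz = "UTC" and the month/day/year format are assumptions:

```r
# A few rows taken from the question's sample.
df <- data.frame(
  timestamp_pretty = c("01/06/2014 00:19:20", "01/06/2014 00:20:02",
                       "01/06/2014 00:22:20", "01/06/2014 00:25:30",
                       "01/06/2014 00:04:01", "01/06/2014 01:16:03"),
  mmsi = c(rep(205477000L, 4), rep(205482000L, 2))
)

# Hypothetical helper: TRUE for each time that is at least `gap` seconds after
# the previously kept time. Assumes `secs` is sorted in increasing order.
keep_spaced <- function(secs, gap = 180) {
  keep <- logical(length(secs))
  last <- -Inf
  for (i in seq_along(secs)) {
    if (secs[i] - last >= gap) {
      keep[i] <- TRUE
      last <- secs[i]
    }
  }
  keep
}

df$secs <- as.numeric(as.POSIXct(df$timestamp_pretty,
                                 format = "%m/%d/%Y %H:%M:%S", tz = "UTC"))
df <- df[order(df$mmsi, df$secs), ]

# Apply the filter within each mmsi; ave() returns 0/1, so convert back.
keep <- as.logical(ave(df$secs, df$mmsi, FUN = keep_spaced))
thinned <- df[keep, c("timestamp_pretty", "mmsi")]
thinned
```

The first row of each mmsi is always kept, and each later row is kept only if it is at least 180 seconds after the last kept row, so here only the 00:20:02 row is dropped.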