R: how to resample a datetime variable at the millisecond level?

Question

I have a data frame like the following:

library(dplyr)
library(lubridate)
time <- c('2013-01-03 22:04:21.549', '2013-01-03 22:04:21.549', '2013-01-03 22:04:21.559', '2013-01-03 22:04:23.559')
value <- c(1, 2, 3, 4)

# tibble() replaces the deprecated data_frame()
data <- tibble(time, value)
# parse the strings into POSIXct date-times; fractional seconds are kept
data <- data %>% mutate(time = ymd_hms(time))

# A tibble: 4 × 2
                     time value
                   <dttm> <dbl>
1 2013-01-03 22:04:21.549     1
2 2013-01-03 22:04:21.549     2
3 2013-01-03 22:04:21.559     3
4 2013-01-03 22:04:23.559     4

I would like to resample this dataframe every 200 milliseconds.

That is, take the average of value every 200 milliseconds.

I know I can use lubridate::floor_date(time, '1 second') for second-level precision, but not for milliseconds.

In the example above, rows 1, 2, and 3 should be grouped together, while row 4 should be on its own (note that it is 2 seconds apart from the others).

Any ideas? Thanks!

Recommended answer

The fact that your comment to the xts solution asked for this to be "plugged back in" to the dataframe made me think you either wanted a merged result or a grouped-by-time column. That's what the ave function does in base R. There's probably a dplyr equivalent, but I'm more of a base-R-guy:

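 # numeric seconds since the epoch, fractional part kept
 arg <- as.numeric(data$time)
 # ceiling() is the base R function (there is no ceil()); min/max avoid hard-coded row indices
 data$ms200mn <- ave(data$value,
                     cut(arg, breaks = seq(floor(min(arg)), ceiling(max(arg)), by = 0.2)),
                     FUN = mean)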
>  data
# A tibble: 4 × 3
                 time value ms200mn
               <dttm> <dbl>   <dbl>
1 2013-01-03 22:04:21     1       2
2 2013-01-03 22:04:21     2       2
3 2013-01-03 22:04:21     3       2
4 2013-01-03 22:04:23     4       4
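
The answer mentions that there is probably a dplyr equivalent. A minimal sketch of one possibility follows; it is not taken from the original answer. The helper column name bin is made up for illustration, and the bins are computed as floor(seconds / 0.2), so each bin is closed on the left, [a, b), whereas cut() by default uses (a, b]. For the example data both approaches give the same groups.

```r
library(dplyr)
library(lubridate)

data <- tibble(
  time  = ymd_hms(c("2013-01-03 22:04:21.549", "2013-01-03 22:04:21.549",
                    "2013-01-03 22:04:21.559", "2013-01-03 22:04:23.559")),
  value = c(1, 2, 3, 4)
)

result <- data %>%
  # bin: hypothetical helper column, index of the 200 ms interval since the epoch
  group_by(bin = floor(as.numeric(time) / 0.2)) %>%
  # mean of value within each 200 ms interval, repeated on every row of the group
  mutate(ms200mn = mean(value)) %>%
  ungroup() %>%
  select(-bin)

result$ms200mn
```

Using mutate() keeps all four rows, like ave() does; replacing it with summarise() would instead return one row per 200 ms interval.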

This isn't properly called "sampling" (or resampling); it is aggregation. There is no 'msec' option for the seq.POSIXt function and fractional seconds are not allowed, so I needed to convert to numeric seconds.
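
To illustrate the aggregation point: if one row per 200 ms interval is wanted rather than a group mean repeated on every row, base R's aggregate() can be used. This is only a sketch under assumptions that are not in the original answer: the timestamps are parsed as UTC, the column name bin_start is made up, and intervals are closed on the left.

```r
# as.POSIXct() keeps the fractional seconds when parsing these strings
time  <- as.POSIXct(c("2013-01-03 22:04:21.549", "2013-01-03 22:04:21.549",
                      "2013-01-03 22:04:21.559", "2013-01-03 22:04:23.559"),
                    tz = "UTC")
value <- c(1, 2, 3, 4)

secs <- as.numeric(time)                 # numeric seconds since the epoch
bin_start <- floor(secs / 0.2) * 0.2     # start of each 200 ms interval (hypothetical name)

# one row per interval, with the mean of value in that interval
agg <- aggregate(value ~ bin_start,
                 data = data.frame(bin_start, value),
                 FUN = mean)

# convert the interval start back to a date-time for display
agg$bin_start <- as.POSIXct(agg$bin_start, origin = "1970-01-01", tz = "UTC")
agg
```

Only intervals that contain at least one observation appear in the result, so the two seconds of empty intervals between rows 3 and 4 are not filled in.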

Explanation:

arg <- as.numeric(data$time)
cut(arg, breaks = seq(floor(min(arg)), ceiling(max(arg)), by = 0.2))

It is "classifying" or "categorizing" items into groups defined by a sequence of breaks starting below the first item and ending above the last item. The arg value needed to be created because (for reasons that I don't understand) raw 'datetime' variables cannot be used by the seq function.
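
To make the classification step concrete, here is a small sketch that uses only the seconds part of the example timestamps (21.549, 21.549, 21.559, 23.559) instead of full epoch seconds; that simplification is an assumption made to keep the numbers readable.

```r
secs  <- c(21.549, 21.549, 21.559, 23.559)  # seconds part of the example times
value <- c(1, 2, 3, 4)

# breaks every 0.2 s, from just below the first item to just above the last
breaks <- seq(floor(min(secs)), ceiling(max(secs)), by = 0.2)

# cut() returns a factor: one level per interval (a, b]
groups <- cut(secs, breaks = breaks)

as.integer(groups)              # interval index for each observation
ave(value, groups, FUN = mean)  # group mean repeated on each row: 2 2 2 4
```

The first three observations fall into the same interval and the fourth into a later one, which is the grouping the question asks for.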
