R: how to resample a datetime variable at the millisecond level?
Problem description
I have a dataframe like the following:
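library(dplyr)
library(lubridate)
time <- c('2013-01-03 22:04:21.549', '2013-01-03 22:04:21.549', '2013-01-03 22:04:21.559', '2013-01-03 22:04:23.559')
value <- c(1, 2, 3, 4)
# tibble() replaces the deprecated data_frame()
data <- tibble(time, value)
# parse the character timestamps (including fractional seconds) into date-times
data <- data %>% mutate(time = ymd_hms(time))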
# A tibble: 4 × 2
time value
<dttm> <dbl>
1 2013-01-03 22:04:21.549 1
2 2013-01-03 22:04:21.549 2
3 2013-01-03 22:04:21.559 3
4 2013-01-03 22:04:23.559 4
I would like to resample this dataframe every 200 milliseconds. That is, take the average of value every 200 milliseconds.
I know I can use lubridate::floor_date(time, '1 second') down to second precision, but not for milliseconds.
In the example above, rows 1, 2, and 3 should be grouped together while row 4 should be alone (note it is 2 seconds apart from the others).
Any ideas? Thanks!
Recommended answer
The fact that your comment on the xts solution asked for this to be "plugged back in" to the dataframe made me think you wanted either a merged result or a grouped-by-time column. That's what the ave function does in base R. There's probably a dplyr equivalent, but I'm more of a base-R guy:
# Bin the numeric timestamps into 200 ms intervals and average value within each bin
data$ms200mn <- ave(data$value,
                    cut(arg <- as.numeric(data$time),
                        breaks = seq(floor(arg[1]), ceiling(arg[4]), by = 0.2)),
                    FUN = mean)
> data
# A tibble: 4 × 3
time value ms200mn
<dttm> <dbl> <dbl>
1 2013-01-03 22:04:21 1 2
2 2013-01-03 22:04:21 2 2
3 2013-01-03 22:04:21 3 2
4 2013-01-03 22:04:23 4 4
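For the dplyr equivalent mentioned above, one possible sketch is shown below. It is not part of the original answer. The names width, bin and result are illustrative, and the bins are taken as multiples of 0.2 seconds counted from the epoch, closed on the left. cut() uses intervals closed on the right, so a timestamp falling exactly on a break could land in a different bin than with the base R version.

```r
library(dplyr)
library(lubridate)

data <- tibble(
  time  = ymd_hms(c("2013-01-03 22:04:21.549", "2013-01-03 22:04:21.549",
                    "2013-01-03 22:04:21.559", "2013-01-03 22:04:23.559")),
  value = c(1, 2, 3, 4)
)

width <- 0.2  # assumed bin width in seconds (200 ms)

result <- data %>%
  # bin index: number of whole 200 ms intervals since the epoch
  mutate(bin = floor(as.numeric(time) / width)) %>%
  group_by(bin) %>%
  # per-bin mean, repeated on every row of the bin (like ave)
  mutate(ms200mn = mean(value)) %>%
  ungroup()

print(result)
```

Replacing the second mutate() with summarise() would return one row per bin instead of keeping the original rows.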
This isn't really properly called "sampling" (or resampling), but is rather aggregation. There is no 'msec' option for the seq.POSIXt function, and its character by units do not accept fractional seconds, so I converted to numeric seconds.
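To make the aggregation reading concrete, here is a minimal base R sketch that collapses the data to one row per 200 ms bin rather than attaching the mean to every row. It is an illustration rather than the answer's method: the names width, secs, bin_start and agg are assumptions, and bins are aligned to multiples of 0.2 seconds since the epoch.

```r
time  <- as.POSIXct(c("2013-01-03 22:04:21.549", "2013-01-03 22:04:21.549",
                      "2013-01-03 22:04:21.559", "2013-01-03 22:04:23.559"),
                    tz = "UTC")
value <- c(1, 2, 3, 4)

width <- 0.2                                # assumed bin width in seconds
secs  <- as.numeric(time)                   # seconds since the epoch, with fractions
bin_start <- floor(secs / width) * width    # numeric start of each row's bin

# one row per bin, holding the mean of value within that bin
agg <- aggregate(value ~ bin_start,
                 data = data.frame(bin_start, value),
                 FUN = mean)

# convert the bin starts back to date-times for display
agg$bin_start <- as.POSIXct(agg$bin_start, origin = "1970-01-01", tz = "UTC")
print(agg)
```

Setting options(digits.secs = 3) before printing should display the fractional seconds of the bin starts.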
Explanation:
cut(arg <- as.numeric(data$time), breaks = seq(floor(arg[1]), ceiling(arg[4]), by = 0.2))
It is "classifying" or "categorizing" items into groups defined by a sequence of breaks starting below the first item and ending above the last item. The arg value needed to be created because (for reasons that I don't understand) raw 'datetime' variables cannot be used by the seq function.
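To see what the cut step produces on its own, the grouping can be inspected separately, as in the sketch below. The variable names breaks and grp are illustrative, and the timestamps are rebuilt with base as.POSIXct so the snippet runs without extra packages.

```r
time <- as.POSIXct(c("2013-01-03 22:04:21.549", "2013-01-03 22:04:21.549",
                     "2013-01-03 22:04:21.559", "2013-01-03 22:04:23.559"),
                   tz = "UTC")

arg    <- as.numeric(time)                                  # numeric seconds
breaks <- seq(floor(arg[1]), ceiling(arg[4]), by = 0.2)     # 200 ms break points
grp    <- cut(arg, breaks = breaks)                         # factor: one level per interval

# integer code of the interval each row falls into
print(as.integer(grp))

# number of rows in each non-empty interval
print(table(droplevels(grp)))
```

Rows that share an integer code fall in the same 200 ms interval, which is the grouping that ave then averages over.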