Efficient way to Fill Time-Series per group

Views: 97 Published: 2020/10/15 19:05:36 Tags: r data.table tidyverse

Problem description

I am looking for a way to fill a time-series data set by time, per group. The very inefficient approach I was using was to split the data set by group and apply a custom time-series fill function (create a sequence between min and max, then merge) to every element of that list. Needless to say, this operation would not get past the splitting step.

My data set looks like

    source                 grp cnt
 1:     83 2017-06-06 13:00:00   1
 2:     83 2017-06-06 23:00:00   1
 3:     83 2017-06-07 03:00:00   1
 4:     83 2017-06-07 07:00:00   2
 5:     83 2017-06-07 13:00:00   1
 6:     83 2017-06-07 19:00:00   1
 7:     83 2017-06-08 00:00:00   1
 8:     83 2017-06-08 14:00:00   1
 9:     83 2017-06-08 15:00:00   1
10:     83 2017-06-08 20:00:00   1
11:    137 2017-06-04 02:00:00   1
12:    137 2017-06-04 05:00:00   1
13:    137 2017-06-04 23:00:00   1
...

My attempt was to use tidyverse methods, utilising the complete function, i.e.

library(tidyverse)

d1 %>% 
 group_by(source) %>% 
 complete(source, grp = seq(min(grp), max(grp), by = 'hour'))

However, after about 40-45 seconds, a progress bar appeared (apparently a neat feature of some tidyverse functions; I suspect complete in this case), which estimated 9 hours to completion. My dataset is very big and this is not the lightest operation, so I am looking for something really efficient.

Data

#dput(d1)
structure(list(source = c("83", "83", "83", "83", "83", "83", 
"83", "83", "83", "83", "137", "137", "137", "137", "137", "137", 
"137", "137", "137", "137", "137", "137", "137", "137"), grp = structure(c(1496743200, 
1496779200, 1496793600, 1496808000, 1496829600, 1496851200, 1496869200, 
1496919600, 1496923200, 1496941200, 1496530800, 1496541600, 1496606400, 
1496617200, 1496649600, 1496696400, 1496808000, 1496844000, 1496876400, 
1496962800, 1497880800, 1497888000, 1497978000, 1497996000), class = c("POSIXct", 
"POSIXt"), tzone = ""), cnt = c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
)), .Names = c("source", "grp", "cnt"), row.names = c(NA, -24L
), class = "data.frame")
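For illustration only, here is a minimal sketch of one pattern that is often used for this kind of per-group fill with data.table (one of the tags on this question): build the hourly grid per source, then join the observations onto it. It is not taken from the original post. The small d1 below is a hypothetical stand-in with the same column layout as the data above, the UTC time zone is an assumption, and filling missing hours with 0 is also an assumption (leave them as NA if that suits the analysis better).

```r
library(data.table)

# Hypothetical stand-in for d1: same columns as in the question, made-up rows.
d1 <- data.table(
  source = c("83", "83", "137", "137"),
  grp = as.POSIXct(c("2017-06-06 13:00:00", "2017-06-06 16:00:00",
                     "2017-06-04 02:00:00", "2017-06-04 04:00:00"),
                   tz = "UTC"),  # time zone is an assumption
  cnt = c(1L, 2L, 1L, 1L)
)

# Full hourly grid per source: one row for every hour between that
# source's first and last timestamp.
grid <- d1[, .(grp = seq(min(grp), max(grp), by = "hour")), by = source]

# Join the observations onto the grid: every grid row is kept, and cnt is
# NA wherever no observation exists for that source and hour.
filled <- d1[grid, on = .(source, grp)]

# Assumed fill value: treat hours without an observation as a count of 0.
filled[is.na(cnt), cnt := 0L]

print(filled)
```

With the real data, a plain data.frame such as the dput output above would first need converting, for example with setDT(d1). Whether this is fast enough on a very large data set depends on how many hours the grid spans per source, so it is worth timing on a subset first.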
