data.table: calculate statistics of row times within a moving time window

Problem description

library(data.table)
library(lubridate)
df <- data.table(col1 = c('A', 'A', 'A', 'B', 'B', 'B'), col2 = c("2015-03-06 01:37:57", "2015-03-06 01:39:57", "2015-03-06 01:45:28", "2015-03-06 02:31:44", "2015-03-06 03:55:45", "2015-03-06 04:01:40"))

For each row, I want to calculate the standard deviation of the times (col2) of the rows that have the same value of col1 and whose time falls within the 10 minutes before this row's time (inclusive).

I am using the following approach:

df$col2 <- as_datetime(df$col2)
gap <- 10L
df[, feat1 := .SD[.(col1 = col1, t1 = col2 - gap * 60L, t2 = col2)
                   , on = .(col1, col2 >= t1, col2 <= t2)
                   , .(sd_time = sd(as.numeric(col2))), by = .EACHI]$sd_time][]

As a result, I see only NA values instead of values in seconds.

For example, for the third row (col1 = "A" and col2 = "2015-03-06 01:45:28") I calculated the value manually as follows:

v <- c("2015-03-06 01:37:57", "2015-03-06 01:39:57", "2015-03-06 01:45:28")
v <- as_datetime(v)
sd(v)  # 233.5815


Recommended answer

A data.table solution:

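# Work on plain seconds so that time differences are numeric
df[, col3 := as.numeric(col2)]
df[, feat1 := {
  # offsets (in seconds) of every row relative to the current group's time
  d <- df$col3 - col3
  # sd over rows with the same col1 inside the past `gap` minutes (inclusive)
  sd(df$col3[col1 == df$col1 & d <= 0 & d >= -gap * 60L])
},
by = list(col3, col1)]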

Another way is to loop over all combinations of col1 and col2 with mapply:

df[, col3 := as.numeric(col2)]

df[, feat1 := mapply(Date = col3, ID = col1, function(Date, ID) {
  DateVect <- df[col1 == ID, col3]
  d <- DateVect - Date
  sd(DateVect[d <= 0 & d >= -gap * 60L])
})][]
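
The non-equi join from the question can probably be kept as well. A likely reason it returns only NA is that, inside `j`, a join column such as `col2` is masked by the value coming from `i`, so `sd()` sees a single value per group. data.table's documentation describes the `x.` prefix for reaching the columns of `x` in that situation. The following is a minimal sketch under that assumption, not a verified fix. The names `win` and `res` are introduced here for illustration, and base `as.POSIXct` with `tz = "UTC"` stands in for `as_datetime`.

```r
library(data.table)

# Sample data from the question; base as.POSIXct is used instead of
# lubridate::as_datetime (an assumption, to keep the sketch dependency-light)
df <- data.table(
  col1 = c("A", "A", "A", "B", "B", "B"),
  col2 = as.POSIXct(c("2015-03-06 01:37:57", "2015-03-06 01:39:57",
                      "2015-03-06 01:45:28", "2015-03-06 02:31:44",
                      "2015-03-06 03:55:45", "2015-03-06 04:01:40"),
                    tz = "UTC")
)
gap <- 10L  # window length in minutes

# One window per row: [col2 - gap minutes, col2] (hypothetical helper table)
win <- df[, .(col1, t1 = col2 - gap * 60L, t2 = col2)]

# Non-equi join; x.col2 refers to the matched rows' own times in df,
# whereas a bare col2 would be masked by the join value taken from `win`
res <- df[win,
          on = .(col1, col2 >= t1, col2 <= t2),
          .(sd_time = sd(as.numeric(x.col2))),
          by = .EACHI]

# by = .EACHI yields one result row per row of `win`, in the same order
df[, feat1 := res$sd_time][]
```

Rows whose window contains only the row itself still get NA, because `sd()` of a single value is undefined.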
