data.table: count rows within a moving time window


Problem description

library(data.table)
library(lubridate)  # provides as_datetime(), used below
df <- data.table(col1 = c('B', 'A', 'A', 'B', 'B', 'B'), col2 = c("2015-03-06 01:37:57", "2015-03-06 01:39:57", "2015-03-06 01:45:28", "2015-03-06 02:31:44", "2015-03-06 03:55:45", "2015-03-06 04:01:40"))

For each row, I want to count the rows that have the same value of 'col1' and a time within the 10 minutes up to and including that row's time.

I ran the following code:

df$col2 <- as_datetime(df$col2)
window = 10L
(counts = setDT(df)[.(t1=col2-window*60L, t2=col2), on=.((col2>=t1) & (col2<=t2)), 
                     .(counts=.N), by=col1]$counts)

df[, counts := counts]

and got the following error:

Error in `[.data.table`(setDT(df), .(t1 = col2 - window * 60L, t2 = col2), : Column(s) [(col2] not found in x

This is the result I want:

col1    col2              counts
B   2015-03-06 01:37:57     1
A   2015-03-06 01:39:57     1
A   2015-03-06 01:45:28     2
B   2015-03-06 02:31:44     1
B   2015-03-06 03:55:45     1
B   2015-03-06 04:01:40     2


Recommended answer

A possible solution:

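# `window` is the window length in minutes from the question (10L); col2 must already be POSIXct
df[.(col1 = col1, t1 = col2 - window * 60L, t2 = col2)
   , on = .(col1, col2 >= t1, col2 <= t2)
   , .(counts = .N), by = .EACHI][, (2) := NULL][]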

which gives:


   col1                col2 counts
1:    B 2015-03-06 01:37:57      1
2:    A 2015-03-06 01:39:57      1
3:    A 2015-03-06 01:45:28      2
4:    B 2015-03-06 02:31:44      1
5:    B 2015-03-06 03:55:45      1
6:    B 2015-03-06 04:01:40      2


A couple of notes about your approach:

  • You don't need setDT because you already constructed df with data.table(...).
  • Your on-statement isn't specified correctly: you need to separate the join conditions with a comma (,) and not with &. For example: on = .(col1, col2 >= t1, col2 <= t2)
  • Use by = .EACHI to get the result for each row.

Another approach:

df[, counts := .SD[.(col1 = col1, t1 = col2 - window * 60L, t2 = col2)
                   , on = .(col1, col2 >= t1, col2 <= t2)
                   , .N, by = .EACHI]$N][]

which gives the same result.
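To check the non-equi join on the sample data, here is a minimal self-contained sketch. It departs from the thread in a few ways, all of them assumptions for illustration: it parses the timestamps with base R's as.POSIXct (UTC assumed) instead of lubridate, it builds the lookup table explicitly under the hypothetical name bounds, and it compares the join against a brute-force count stored in brute.

```r
library(data.table)

df <- data.table(
  col1 = c("B", "A", "A", "B", "B", "B"),
  col2 = as.POSIXct(
    c("2015-03-06 01:37:57", "2015-03-06 01:39:57", "2015-03-06 01:45:28",
      "2015-03-06 02:31:44", "2015-03-06 03:55:45", "2015-03-06 04:01:40"),
    tz = "UTC"  # assumption: the time zone is not stated in the question
  )
)
window <- 10L  # window length in minutes

# One lookup row per original row: the group key plus the window bounds.
bounds <- df[, .(col1, t1 = col2 - window * 60, t2 = col2)]

# Non-equi join: for each row of `bounds`, count the rows of `df` with the
# same col1 whose col2 lies in [t1, t2]. by = .EACHI keeps one result per
# row of `bounds`, in the same order as `df`.
res <- df[bounds, on = .(col1, col2 >= t1, col2 <= t2), .N, by = .EACHI]
df[, counts := res$N]

# Brute-force cross-check: apply the same condition row by row.
brute <- sapply(seq_len(nrow(df)), function(i) {
  sum(df$col1 == df$col1[i] &
        df$col2 >= df$col2[i] - window * 60 &
        df$col2 <= df$col2[i])
})
stopifnot(identical(as.integer(brute), df$counts))

print(df)
```

The brute-force loop is quadratic in the number of rows, so it is only meant as a sanity check on small inputs; the join is the part intended for real data.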
