Count the number of values in a window of a data.table per group

Question

I'm trying to add a new column to a data.table, where the value in each row depends on the other values in the same column. To be more precise, if a row holds the value v, I would like to know how many other values in the same column (and the same group) lie between v - 30 and v. In the example below, X is the group and Y holds the values.

That is, given:

library(data.table)
DT <- data.table(
  X = c(1, 2, 2, 1, 1, 2,  1, 2, 2, 1, 1, 1),
  Y = c(100, 101, 133, 134, 150, 156,  190, 200, 201, 230, 233, 234),
  Z = c(1, 2, 3, 4, 5, 6,  7, 8, 9, 10, 11, 12))

I would like to get a new column with the values:

N <- c(0, 0, 0, 0, 1, 1,  0, 0, 1, 0, 1, 2)
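To make the target concrete, the expected column can be reproduced with a brute-force check per group. This is only a reference sketch, not a proposed solution: it assumes the window is `Y - 30 <= value < Y` (an interpretation that matches the values above), and it is quadratic in the group size.

```r
library(data.table)

DT <- data.table(
  X = c(1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1),
  Y = c(100, 101, 133, 134, 150, 156, 190, 200, 201, 230, 233, 234),
  Z = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
)

# For every Y in a group, count that group's values in [Y - 30, Y).
# Assumption: the lower bound is inclusive and the row itself (or a tie) is not counted.
DT[, N := vapply(Y, function(y) sum(Y >= y - 30 & Y < y), integer(1)), by = X]
```

Because `Y` inside `j` is already restricted to the current group, no explicit indexing by row number is needed.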

I've tried the following, but it doesn't give a result I can use:

DT[,list(Y,num=cumsum(Y[-.I]>DT[.I,Y]-30),Z),by=.(X)]
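One likely reason the attempt misbehaves is that `.I` holds row numbers in the whole table, while `Y` inside a `by =` call is only the current group's subset, so `Y[-.I]` and `DT[.I, Y]` do not line up within a group. A small sketch of the difference (the column names `global` and `local` are made up for illustration):

```r
library(data.table)

DT <- data.table(
  X = c(1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1),
  Y = c(100, 101, 133, 134, 150, 156, 190, 200, 201, 230, 233, 234)
)

# .I is the row position in DT; seq_len(.N) is the position within the group.
idx <- DT[, .(global = .I, local = seq_len(.N)), by = X]

# For X == 1, global is 1, 4, 5, 7, 10, 11, 12 while local is 1 to 7.
idx[X == 1]
```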

Any ideas on how to do this?

Recommended answer

This can probably be achieved with a rolling join (?), but here is a foverlaps alternative for now:

DT[, `:=`(indx = .I, Y2 = Y - 30L, N = 0L)] # add a row index, the lower bound Y - 30, and a zero count
setkey(DT, X, Y2, Y) # sort by group and by the interval [Y2, Y], as foverlaps requires
res <- foverlaps(DT, DT)[Y2 > i.Y2, .N, keyby = indx] # overlap-join DT with itself; per row, count overlapping rows with a smaller Y
setorder(DT, indx) # go back to the original row order
DT[res$indx, N := res$N][, c("indx", "Y2") := NULL] # write the counts back and drop the helper columns
DT
#     X   Y  Z N
#  1: 1 100  1 0
#  2: 2 101  2 0
#  3: 2 133  3 0
#  4: 1 134  4 0
#  5: 1 150  5 1
#  6: 2 156  6 1
#  7: 1 190  7 0
#  8: 2 200  8 0
#  9: 2 201  9 1
# 10: 1 230 10 0
# 11: 1 233 11 1
# 12: 1 234 12 2

Alternatively, use the which = TRUE option of foverlaps to make the overlap merge smaller:

# as above
DT[, `:=`(indx = .I, Y2 = Y - 30L, N = 0L)]
setkey(DT, X, Y2, Y)

# using which=TRUE:
res <- foverlaps(DT, DT, which=TRUE)[xid > yid, .N, by=xid]
DT[res$xid, N := res$N]
setorder(DT, indx)
DT[, c("Y2","indx") := NULL]
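For comparison, the same counts can also be sketched with a non-equi self-join instead of foverlaps. This is not part of the answer above: it assumes data.table 1.9.8 or later (where non-equi joins in `on =` are available), the helper column `lo` is a made-up name, and the window is taken as `Y - 30 <= value < Y`.

```r
library(data.table)

DT <- data.table(
  X = c(1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1),
  Y = c(100, 101, 133, 134, 150, 156, 190, 200, 201, 230, 233, 234),
  Z = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
)

DT[, lo := Y - 30]  # hypothetical helper: lower bound of each row's window

# Self-join: for each row of the inner i (DT), match rows of x (DT) in the same
# group whose Y lies in [lo, Y). by = .EACHI yields one count per i row,
# in the order of i, so the result lines up with DT's rows.
DT[, N := DT[DT, on = .(X, Y >= lo, Y < Y), .N, by = .EACHI]$N]

DT[, lo := NULL]    # drop the helper column
```

Unlike the keyed foverlaps approach, this leaves the row order untouched, so no index column or re-sorting is needed.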
