R data.table count rows until value is reached
Question
I would like to add a new column to a data.table that shows, for each row, how many rows down one has to go until a value lower than the current value (of Temp) is reached.
library(data.table)
set.seed(123)
DT <- data.table( Temp = runif(10,0,20) )
This is what I want the result to look like:
set.seed(123)
DT <- data.table(
Temp = runif(10,0,20),
Day_Below_Temp = c("5","1","3","2","1","NA","3","1","1","NA")
)
Recommended answer
Using the newly implemented non-equi joins in the current development version, this can be accomplished in a straightforward manner as follows:
require(data.table) # v1.9.7+
DT[, row := .I] # add row numbers
DT[DT, x.row-i.row, on = .(row > row, Temp < Temp), mult="first"]
# [1] 5 1 3 2 1 NA 3 1 1 NA
The row number is necessary because we only want to match rows that come after the current row (i.e., rows with a larger index), so the row index has to be one of the join conditions. We perform a self-join: for each row of the inner DT (the i argument), we use the conditions provided to the on argument to find the first matching row in the outer DT (x). We then subtract the row indices to get the distance from the current row. x.row refers to the row index of the outer DT and i.row to that of the inner DT. Rows for which no later, lower value exists return NA.
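To check the join's output independently, the same quantity can be computed with a brute-force scan in base R. This is only a minimal sketch: the helper name rows_until_lower is hypothetical (it is not part of data.table), and the input vector is the question's ten Temp values rounded to two decimals rather than regenerated with runif.

```r
# Hypothetical helper, base R only: for each element, scan the later
# elements and return the offset of the first strictly lower value.
rows_until_lower <- function(temp) {
  n <- length(temp)
  vapply(seq_len(n), function(i) {
    if (i == n) return(NA_integer_)           # last row has nothing below it
    later <- which(temp[(i + 1L):n] < temp[i])
    if (length(later) == 0L) NA_integer_ else later[1L]
  }, integer(1))
}

# The question's Temp values, rounded to two decimals (assumption: rounding
# does not change their relative order).
temp <- c(5.75, 15.77, 8.18, 17.66, 18.81, 0.91, 10.56, 17.85, 11.03, 9.13)
result <- rows_until_lower(temp)
print(result)
# [1]  5  1  3  2  1 NA  3  1  1 NA
```

The scan is quadratic in the worst case, so it is meant as a cross-check on small inputs rather than as a replacement for the join.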
To get the devel version, see the data.table installation instructions.
On 1e5 rows:
set.seed(123)
DT <- data.table(Temp = runif(1e5L, 0L, 20L))
DT[, row := .I]
system.time({
ans = DT[DT, x.row-i.row, on = .(row > row, Temp < Temp), mult="first", verbose=TRUE]
})
# Non-equi join operators detected ...
# forder took ... 0.001 secs
# Generating non-equi group ids ... done in 0.452 secs
# Recomputing forder with non-equi ids ... done in 0.001 secs
# Found 623 non-equi group(s) ...
# Starting bmerge ...done in 8.118 secs
# Detected that j uses these columns: x.row,i.row
# user system elapsed
# 8.492 0.038 8.577
head(ans)
# [1] 5 1 3 2 1 12
tail(ans)
# [1] 2 1 1 2 1 NA
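If the join time above is a concern, one alternative is a single pass with a stack of still-unresolved row indices, which does a linear amount of work overall. This is a swapped-in technique, not the non-equi join from the answer, and it has not been benchmarked here. The function name rows_until_lower_stack is hypothetical, and the sample vector is the question's ten Temp values rounded to two decimals.

```r
# Hypothetical helper, base R only: "next smaller element" via a stack.
# The stack holds indices of rows that have not yet seen a lower value.
rows_until_lower_stack <- function(temp) {
  n <- length(temp)
  ans <- rep(NA_integer_, n)
  stack <- integer(n)
  top <- 0L
  for (i in seq_len(n)) {
    # Row i resolves every waiting row whose value is strictly greater.
    while (top > 0L && temp[stack[top]] > temp[i]) {
      ans[stack[top]] <- i - stack[top]
      top <- top - 1L
    }
    top <- top + 1L
    stack[top] <- i
  }
  ans   # rows left on the stack never see a lower value and stay NA
}

temp <- c(5.75, 15.77, 8.18, 17.66, 18.81, 0.91, 10.56, 17.85, 11.03, 9.13)
stack_result <- rows_until_lower_stack(temp)
print(stack_result)
# [1]  5  1  3  2  1 NA  3  1  1 NA
```

Each index is pushed and popped at most once, which is where the linear bound comes from. With a data.table, the result could be attached as a column with DT[, Day_Below_Temp := rows_until_lower_stack(Temp)].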