R data.table count rows until value is reached

Question

I would like to add a new column to a data.table that shows, for each row, how many rows down one has to go until a value lower than the current row's Temp value is reached.

library(data.table)
set.seed(123)
DT <- data.table( Temp = runif(10,0,20) )

This is what I would like it to look like:

set.seed(123)
DT <- data.table(
        Temp = runif(10, 0, 20),
        # rows until the next lower Temp; NA when no lower value follows
        Day_Below_Temp = c(5L, 1L, 3L, 2L, 1L, NA, 3L, 1L, 1L, NA)
)


Answer

Using non-equi joins, newly implemented in the development version at the time of writing (v1.9.7; they were later released on CRAN in v1.9.8), this can be done in a straightforward manner:

require(data.table) # v1.9.7+
DT[, row := .I] # add row numbers
# self non-equi join: for each row, the first later row with a lower Temp
DT[DT, x.row - i.row, on = .(row > row, Temp < Temp), mult = "first"]
# [1]  5  1  3  2  1 NA  3  1  1 NA

The row number is necessary because we need to find rows below the current one (i.e., rows with a higher row index), so the row index has to be one of the join conditions. We perform a self-join: for each row of DT passed as `i` (the inner DT), based on the conditions provided to the `on` argument, we find the first matching row index in the outer DT (`x`). Subtracting the row indices then gives the distance from the current row. `x.row` refers to the row index of the outer DT and `i.row` to that of the inner DT.
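To make the two join conditions concrete, here is a minimal base R sketch (no data.table needed) that applies them row by row. The function name `rows_until_lower_naive` and the hand-typed `temp` vector are hypothetical, for illustration only. This shows what the join computes, not how data.table evaluates it, and it is quadratic in the number of rows.

```r
# Base R illustration of the self-join logic (not data.table's implementation).
# For row i, candidates must satisfy both `on` conditions:
#   row > i            (strictly below the current row)
#   temp < temp[i]     (strictly lower Temp)
# Taking the first candidate corresponds to mult = "first",
# and subtracting i corresponds to x.row - i.row.
rows_until_lower_naive <- function(temp) {
  n <- length(temp)
  row <- seq_len(n)
  vapply(row, function(i) {
    hits <- which(row > i & temp < temp[i])  # both join conditions at once
    if (length(hits) == 0L) NA_integer_ else hits[1L] - i
  }, integer(1))
}

temp <- c(5.75, 15.77, 8.18, 17.66, 18.81, 0.91, 10.56, 17.85, 11.03, 9.13)
rows_until_lower_naive(temp)
# [1]  5  1  3  2  1 NA  3  1  1 NA
```

Rows with no lower value further down (here the global minimum and the last row) get `NA`, matching the join's behaviour when no row in `x` satisfies the conditions.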

To get the devel version, see installation instructions here.

Timing on 1e5 rows:

set.seed(123)
DT <- data.table(Temp = runif(1e5L, 0L, 20L))

DT[, row := .I]
system.time({
    ans = DT[DT, x.row-i.row, on = .(row > row, Temp < Temp), mult="first", verbose=TRUE]
})
# Non-equi join operators detected ... 
#   forder took ... 0.001 secs
#   Generating non-equi group ids ... done in 0.452 secs
#   Recomputing forder with non-equi ids ... done in 0.001 secs
#   Found 623 non-equi group(s) ...
# Starting bmerge ...done in 8.118 secs
# Detected that j uses these columns: x.row,i.row 
#    user  system elapsed 
#   8.492   0.038   8.577 

head(ans)
# [1]  5  1  3  2  1 12
tail(ans)
# [1]  2  1  1  2  1 NA
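For larger inputs, a different technique (not part of the answer above) may be worth considering: the classic "next smaller element" algorithm, which makes one right-to-left pass while keeping a stack of row indices. Each index is pushed and popped at most once, so the work grows linearly with the number of rows. The function name `rows_until_lower_stack` is hypothetical, and this sketch assumes `Temp` contains no `NA` values.

```r
# "Next smaller element" via a monotonic stack: an alternative technique,
# not the non-equi join from the answer. Assumes no NA values in `temp`.
rows_until_lower_stack <- function(temp) {
  n <- length(temp)
  ans <- rep(NA_integer_, n)
  stack <- integer(n)  # preallocated stack of row indices
  top <- 0L            # number of indices currently on the stack
  for (i in rev(seq_len(n))) {
    # Pop rows below i whose Temp is not strictly lower than temp[i];
    # row i is closer and at least as low, so they can never be the
    # answer for any row above i.
    while (top > 0L && temp[stack[top]] >= temp[i]) top <- top - 1L
    # Whatever remains on top is the nearest strictly lower row below i.
    if (top > 0L) ans[i] <- stack[top] - i
    top <- top + 1L
    stack[top] <- i
  }
  ans
}

temp <- c(5.75, 15.77, 8.18, 17.66, 18.81, 0.91, 10.56, 17.85, 11.03, 9.13)
rows_until_lower_stack(temp)
# [1]  5  1  3  2  1 NA  3  1  1 NA
```

Since both approaches look for the nearest strictly lower row further down, this should return the same values as the join. With data.table loaded, it could be attached as a column via `DT[, Day_Below_Temp := rows_until_lower_stack(Temp)]`. Being an interpreted R loop, its constant factor is higher than compiled code, but it avoids comparing each row against every later row.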