R data.table: find lags between the current row and the previous matching row


Problem description

> tempDT <- data.table(colA = c("E","E","A","A","E","A","E")
+                      , lags = c(NA,1,1,2,3,1,2))
> tempDT
   colA lags
1:    E   NA
2:    E    1
3:    A    1
4:    A    2
5:    E    3
6:    A    1
7:    E    2

I have a column colA and need to find, for each row, the lag (number of rows) between the current row and the most recent previous row whose colA == "E". The lags column above shows the expected result.

Note: if we could find the row index of the previous row whose colA == "E", we could calculate the lags by subtraction. However, I don't know how to get that index.

Recommended answer

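1) Define lastEpos which, given i, returns the position of the last E among the first i rows, and apply it to each row number. Shifting the result down by one row gives the last E strictly before the current row. This relies on the first row being an E, as in the example; otherwise lastEpos returns a zero-length vector for the leading rows: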

lastEpos <- function(i) tail(which(tempDT$colA[1:i] == "E"), 1)
tempDT[, lags := .I - shift(sapply(.I, lastEpos))]

Here are some variations:

2) i-1: In this variation lastEpos returns the position of the last E among the first i-1 rows rather than the first i, so no shift is needed. Prepending NA covers rows with no earlier E:

lastEpos <- function(i) tail(c(NA, which(tempDT$colA[seq_len(i-1)] == "E")), 1)
tempDT[, lags := .I - sapply(.I, lastEpos)]

3) Position: Similar to (2) but uses Position, which returns NA when there is no match:

lastEpos <- function(i) Position(c, tempDT$colA[seq_len(i-1)] == "E", right = TRUE)
tempDT[, lags := .I - sapply(.I, lastEpos)]

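4) rollapply: The list w gives, for row i, the offsets -(i-1), ..., -1, i.e. all preceding rows, and Position is applied to each of those windows: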

library(zoo)
w <- lapply(1:nrow(tempDT), function(i) -rev(seq_len(i-1)))
tempDT[, lags := .I - rollapply(colA == "E", w, Position, f = c, right = TRUE)]

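5) sqldf: Join each row to all earlier E rows and take the largest matching rowid, so the result does not depend on which joined row SQLite happens to pick within a group: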

library(sqldf)

sqldf("select a.colA, a.rowid - max(b.rowid) lags
       from tempDT a left join tempDT b
       on b.rowid < a.rowid and b.colA = 'E'
       group by a.rowid")
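
The sapply-based solutions above rescan the column once per row. As a possible alternative that is not part of the original answer, here is a minimal vectorized sketch. It assumes data.table 1.12.4 or later (for fifelse and nafill); demoDT, ePos, and prevE are hypothetical names introduced only for illustration.

```r
library(data.table)

# Same colA values as in the question; demoDT is a hypothetical name
demoDT <- data.table(colA = c("E", "E", "A", "A", "E", "A", "E"))

# Row number where colA is "E", NA elsewhere
demoDT[, ePos := fifelse(colA == "E", .I, NA_integer_)]

# Carry the last E position forward, then shift down one row so that
# the current row itself is never counted as its own "previous E"
demoDT[, prevE := shift(nafill(ePos, type = "locf"))]

# Lag = current row number minus the row number of the previous E
demoDT[, lags := .I - prevE]

# Drop the helper columns
demoDT[, c("ePos", "prevE") := NULL]
```

Rows with no earlier E get NA because nafill leaves leading NAs untouched, so this sketch should not need the first row to be an E.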
