Subset time series so that selected rows differ by a certain minimum time
Question
I'm using a data.table in R to store a time series. I want to return a subset in which each selected row is at least N seconds after the previously selected row. For example, if I have
library(data.table)
x <- data.table(t=c(0,1,3,4,5,6,7,10,16,17,18,20,21), v=1:13)
x
     t  v
 1:  0  1
 2:  1  2
 3:  3  3
 4:  4  4
 5:  5  5
 6:  6  6
 7:  7  7
 8: 10  8
 9: 16  9
10: 17 10
11: 18 11
12: 20 12
13: 21 13
and I want to sample rows that are at least 5 seconds apart, starting from the first row, I should get a data.table with these time/value pairs:
y <- x[...something...]
y
    t  v
1:  0  1
2:  5  5
3: 10  8
4: 16  9
5: 21 13
The time samples don't have to be regularly spaced either, so I can't just take every M rows. Of course I could do this by looping through the data.table rows manually, but I'm wondering if there's a more convenient way to express this using data.table indexing.
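For reference, the manual loop mentioned above might look like the sketch below. It is a greedy forward scan: keep the first row, then keep any row that is at least `gap` seconds after the last kept one. The helper name `select_min_gap` is hypothetical (not part of data.table), and the sketch assumes `t` is non-empty and sorted in ascending order.

```r
library(data.table)

# Hypothetical helper (not part of data.table): greedily keep a row whenever
# it is at least `gap` seconds after the last kept row.
# Assumes `t` is non-empty and sorted in ascending order.
select_min_gap <- function(t, gap) {
  keep <- logical(length(t))   # one FALSE flag per row, preallocated
  keep[1L] <- TRUE             # always start from the first row
  last <- t[1L]
  for (i in seq_along(t)[-1L]) {
    if (t[i] - last >= gap) {
      keep[i] <- TRUE          # far enough from the last kept row
      last <- t[i]
    }
  }
  which(keep)                  # row numbers of the kept rows
}

x <- data.table(t = c(0, 1, 3, 4, 5, 6, 7, 10, 16, 17, 18, 20, 21), v = 1:13)
y <- x[select_min_gap(x$t, 5)]
print(y)
```

On the example data this keeps rows 1, 5, 8, 9 and 13, i.e. the times 0, 5, 10, 16 and 21 shown above.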
Answer
Here are a couple of ways to use rolling joins to find the set of rows, w, in your subset:
t_plus <- 5
# one join per row visited
w <- c()
nxt <- 1L
while(!is.na(nxt)){
w <- c(w, nxt)
nxt <- x[.(t[nxt]+t_plus), on=.(t), roll=-Inf, which=TRUE]
}
# join once on all rows: w0[i] is the first row with t >= t[i] + t_plus (NA if none)
w0 <- x[.(t+t_plus), on=.(t), roll=-Inf, which=TRUE]
w <- c()
nxt <- 1L
while (!is.na(nxt)){
w <- c(w, nxt)
nxt <- w0[nxt]
}
Then you can subset with x[w].
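To see what the single rolling join computes, here is a small sketch on the example data, reusing the answer's t_plus = 5. With roll = -Inf (next observation carried backward), element i of the result is the first row of x whose t is at or after t[i] + t_plus; which = TRUE returns that row number instead of the row itself, and lookups past the last time give NA.

```r
library(data.table)

x <- data.table(t = c(0, 1, 3, 4, 5, 6, 7, 10, 16, 17, 18, 20, 21), v = 1:13)
t_plus <- 5

# For every row, look up t + t_plus in x$t. roll = -Inf rolls the next
# observation backwards onto the lookup value; which = TRUE returns the
# matched row number, or NA when no later row exists.
w0 <- x[.(t + t_plus), on = .(t), roll = -Inf, which = TRUE]
print(w0)
```

Following w0 from row 1 visits rows 1, 5, 8, 9 and 13 before reaching NA, which is the subset from the question.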
Comments
In principle, there could be other subsets that satisfy the OP's condition "at least 5 seconds apart"; this is just the one found by iterating from the first row forward.
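A candidate subset can be checked against the "at least 5 seconds apart" condition with diff(). The sketch below uses a hypothetical helper, is_min_gap, and assumes rows are in ascending time order. It confirms that the forward-iteration subset qualifies and that a different subset (times 0, 6, 16, 21) also does.

```r
library(data.table)

x <- data.table(t = c(0, 1, 3, 4, 5, 6, 7, 10, 16, 17, 18, 20, 21), v = 1:13)

# Hypothetical helper: TRUE if consecutive selected times differ by >= gap.
# Assumes the times are sorted in ascending order.
is_min_gap <- function(times, gap) all(diff(times) >= gap)

greedy <- x[c(1L, 5L, 8L, 9L, 13L)]   # subset found by iterating forward
other  <- x[t %in% c(0, 6, 16, 21)]   # another subset starting at row 1

is_min_gap(greedy$t, 5)
is_min_gap(other$t, 5)
```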
The second way is based on @DavidArenburg's answer to the Q&A Henrik linked above. Although the question seems the same, I couldn't get that approach to work fully here.
Generally, it's a bad idea to grow a vector inside a loop in R (as I'm doing with w here), because each c(w, nxt) allocates a new copy. If you're running into performance problems, that might be a good area to improve in this code.
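One way to avoid growing w is sketched below, under two assumptions: t_plus > 0 (so every hop moves strictly forward and the chain visits each row at most once) and x is non-empty. Since the chain can then never be longer than nrow(x), an integer vector of that length is preallocated and trimmed at the end.

```r
library(data.table)

x <- data.table(t = c(0, 1, 3, 4, 5, 6, 7, 10, 16, 17, 18, 20, 21), v = 1:13)
t_plus <- 5   # assumed > 0, otherwise w0[i] could equal i and loop forever

# Next-row lookup for every row, computed with a single rolling join.
w0 <- x[.(t + t_plus), on = .(t), roll = -Inf, which = TRUE]

# The chain visits each row at most once, so nrow(x) slots are enough.
w <- integer(nrow(x))
n <- 0L
nxt <- 1L
while (!is.na(nxt)) {
  n <- n + 1L
  w[n] <- nxt          # record the current row
  nxt <- w0[nxt]       # hop to the first row at least t_plus later
}
w <- w[seq_len(n)]     # drop the unused slots

y <- x[w]
print(y)
```

The result is the same as with the growing-vector loop; only the allocation pattern changes.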