Subset time series so that selected rows differ by a certain minimum time

Views: 10 | Published: 2022/1/11 9:33:19 | Tags: r, indexing, data.table, time-series

Problem description

I'm using a data.table in R to store a time series. I want to return a subset in which each selected row is at least N seconds after the previously selected row. For example, if I have

library(data.table)
x <- data.table(t=c(0,1,3,4,5,6,7,10,16,17,18,20,21), v=1:13)
x
     t  v
 1:  0  1
 2:  1  2
 3:  3  3
 4:  4  4
 5:  5  5
 6:  6  6
 7:  7  7
 8: 10  8
 9: 16  9
10: 17 10
11: 18 11
12: 20 12
13: 21 13

and I want to sample rows that are at least 5 seconds apart, starting from the first row, then I should get a data.table with these time/value pairs:

y <- x[...something...]
y
     t  v
 1:  0  1
 2:  5  5
 3: 10  8
 4: 16  9
 5: 21 13

The time samples aren't necessarily regularly spaced, so I can't just take every Mth row. Of course, I could do this by looping through the data.table rows manually, but I'm wondering whether there's a more convenient way to express it using data.table's indexing.
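For reference, the manual loop mentioned above might look like the following. This is only a baseline sketch, assuming the rows are already sorted by t; the names min_gap, keep and last_t are made up for illustration and are not from the original post.

```r
library(data.table)

x <- data.table(t = c(0, 1, 3, 4, 5, 6, 7, 10, 16, 17, 18, 20, 21), v = 1:13)
min_gap <- 5  # assumed name for the minimum spacing, in seconds

keep   <- logical(nrow(x))  # TRUE for rows that end up in the subset
last_t <- -Inf              # time of the most recently selected row
for (i in seq_len(nrow(x))) {
  # keep the row only if it is far enough from the last selected row
  if (x$t[i] - last_t >= min_gap) {
    keep[i] <- TRUE
    last_t  <- x$t[i]
  }
}
y <- x[keep]
y
```

Starting last_t at -Inf means the first row is always selected, which matches the "starting from the first row" requirement.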

Recommended answer

Here are a couple of ways to use rolling joins to find the set of rows, w, in your subset:

t_plus = 5

# one join per row visited
w   <- c()
nxt <- 1L
while(!is.na(nxt)){ 
  w   <- c(w, nxt) 
  nxt <- x[.(t[nxt]+t_plus), on=.(t), roll=-Inf, which=TRUE]
}

# join once on all rows: w0[i] is the first row at least t_plus after row i (NA if none)
w0  <- x[.(t+t_plus), on=.(t), roll=-Inf, which=TRUE]

w   <- c()
nxt <- 1L
while (!is.na(nxt)){ 
  w   <- c(w, nxt)
  nxt <- w0[nxt] 
}

Then you can subset with x[w].

In principle, other subsets could also satisfy the OP's condition of being "at least 5 seconds apart"; this is just the one found by iterating forward from the first row.
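To illustrate that the forward-iterated subset is not the only valid one, here is a small check on the example times. It is a sketch using a plain numeric vector rather than the data.table itself; gaps_ok, w_forward and w_other are hypothetical names introduced just for this example.

```r
# the example times from the question, as a plain vector
tt     <- c(0, 1, 3, 4, 5, 6, 7, 10, 16, 17, 18, 20, 21)
t_plus <- 5

# hypothetical helper: TRUE if consecutive selected times are at least `gap` apart
gaps_ok <- function(rows, gap) all(diff(tt[rows]) >= gap)

w_forward <- c(1L, 5L, 8L, 9L, 13L)  # t = 0, 5, 10, 16, 21 (iterating from row 1)
w_other   <- c(2L, 6L, 9L, 13L)      # t = 1, 6, 16, 21 (a different valid choice)

gaps_ok(w_forward, t_plus)
gaps_ok(w_other, t_plus)
```

Both calls return TRUE, so which subset you get depends on where the iteration starts.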

The second way is based on @DavidArenburg's answer to the Q&A that Henrik linked above. Although the question seems the same, I couldn't get that approach to work fully here.

Generally, it's a bad idea to grow objects in a loop in R (as I'm doing with w here), because each c(w, nxt) call builds a new vector. If you're running into performance problems, this is a good place to start improving the code.
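One possible way to avoid growing w is to preallocate it, since at most nrow(x) rows can ever be selected. The following is a sketch of the second approach with that change, not a benchmarked improvement; it assumes t is sorted and has no duplicate values, and the counter n is a name added here.

```r
library(data.table)

x <- data.table(t = c(0, 1, 3, 4, 5, 6, 7, 10, 16, 17, 18, 20, 21), v = 1:13)
t_plus <- 5

# for every row, the index of the first row at least t_plus later (NA if none)
w0 <- x[.(t + t_plus), on = .(t), roll = -Inf, which = TRUE]

w   <- integer(nrow(x))  # preallocated to the maximum possible length
n   <- 0L                # number of rows selected so far
nxt <- 1L
while (!is.na(nxt)) {
  n    <- n + 1L
  w[n] <- nxt            # fill in place instead of c(w, nxt)
  nxt  <- w0[nxt]
}
w <- w[seq_len(n)]       # drop the unused tail
x[w]
```

The loop still follows the chain of indices in w0 one step at a time; only the storage for w changes.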
