Create sequences by repeated joins

Question

Suppose I have the following data:

library(data.table)
set.seed(42)
t <- data.table(time=1001:2000, period=round(runif(1000,1,5)), a=round(rnorm(1000)))
p <- data.table(id=1:10, time=sample(1000:1100,5), a=round(rnorm(10)))


 > t[27:38]
    time period  a
 1: 1027      3 -1
 2: 1028      5 -1
 3: 1029      3  0
 4: 1030      4 -2
 5: 1031      4 -2
 6: 1032      4 -1
 7: 1033      3  0
 8: 1034      4  1
 9: 1035      1  0
10: 1036      4  0
11: 1037      1  0
12: 1038      2 -1

> head(p)
   id time  a
1:  1 1027  1
2:  2 1094  1
3:  3 1044 -1
4:  4 1053  1
5:  5 1015  1
6:  6 1027 -1

This is similar to the data I posted before in "concatenate periods to get time sequences, simultaneously for different starting points", but it now has an additional variable a that is carried over from t.

In contrast to my earlier question, my goal is to create the sequences directly in p by concatenating n of the periods in t. For n = 4, the result would ideally look like this:

> head(p)
   id time  a
1:  1 1027  1
2:  1 1030 -1 
3:  1 1034 -2
4:  1 1038  1
5:  1 1040 -1
6:  2 1094  1

This is because for id 1, starting at 1027, the sequence is 1027, 1027+3=1030, 1030+4=1034, 1034+4=1038 and 1038+2=1040, where the increments are the period values taken from t. In addition, t$a is "taken along" to fill in p$a (for example, the row at 1030 gets a = -1 from the row of t at time 1027).
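As a check on the arithmetic above, here is a minimal base R sketch that walks the chain for id 1 by hand. The data frame tt is a hypothetical name that holds only the twelve rows of t printed earlier (t[27:38]); start, n, times and a_vals are likewise illustrative names, not part of the original post.

```r
# The twelve rows of t shown above (t[27:38]), rebuilt as a plain data frame.
# "tt" is a hypothetical name used only for this illustration.
tt <- data.frame(
  time   = 1027:1038,
  period = c(3, 5, 3, 4, 4, 4, 3, 4, 1, 4, 1, 2),
  a      = c(-1, -1, 0, -2, -2, -1, 0, 1, 0, 0, 0, -1)
)

start <- 1027  # starting time of id 1 in p
n     <- 4     # number of periods to concatenate

times  <- numeric(n + 1)
a_vals <- numeric(n + 1)
times[1]  <- start
a_vals[1] <- 1  # p$a for id 1, as shown in head(p)

for (k in seq_len(n)) {
  row <- match(times[k], tt$time)             # row of t at the current time
  times[k + 1]  <- times[k] + tt$period[row]  # advance by that row's period
  a_vals[k + 1] <- tt$a[row]                  # carry that row's a along
}

print(data.frame(id = 1, time = times, a = a_vals))
# time: 1027 1030 1034 1038 1040
# a:       1   -1   -2    1   -1
```

The printed rows match the five rows for id 1 in the desired output, which suggests that each new row takes its a from the row of t whose period was just added.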

In my earlier question, Jaap gave a fantastic solution that produces a two-dimensional output with one line per id. I wonder whether this can be achieved directly in p. Perhaps it can be done by repeatedly joining t into p, or perhaps there is a more efficient solution (efficiency is key here).

Recommended answer

I'm not 100% sure what you want to do with a when you "take it along", but maybe this recursion does what you want, although I don't know whether it is efficient enough:

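# Recursively append the next step of every sequence; t is taken from the calling environment.
create_sequences <- function(p, n, acc = p) {
  # Base case: all n steps are done, so key (sort) the accumulated rows by id and time.
  if (n == 0L) return(setkey(acc, id, time))

  # Join t onto p by time: advance each time by the matching period and take a from t (x.a).
  next_p <- t[p, .(id, time = time + period, a = x.a), on = "time"]

  # Recurse from the new rows, adding them to the accumulator.
  create_sequences(next_p, n - 1L, rbindlist(list(acc, next_p)))
}

ans <- create_sequences(p, 4L)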
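Since efficiency matters here, one possible variant is to replace the recursion with a loop that stores each step in a list and calls rbindlist only once at the end. This is a sketch under assumptions, not part of the original answer and not benchmarked: the name create_sequences_loop is hypothetical, t is passed as an argument rather than read from the global environment, and the join expression is copied unchanged from the recursive version.

```r
library(data.table)

# Hypothetical loop-based variant of the recursive function above.
# p: starting points (id, time, a); t: lookup table (time, period, a); n: steps.
create_sequences_loop <- function(p, t, n) {
  steps <- vector("list", n + 1L)
  steps[[1L]] <- p
  current <- p
  for (k in seq_len(n)) {
    # Same join as in the recursive version: advance time by the matching
    # period and carry a over from t (x.a refers to t's column a).
    current <- t[current, .(id, time = time + period, a = x.a), on = "time"]
    steps[[k + 1L]] <- current
  }
  # Bind all steps once, then key (sort) by id and time.
  setkey(rbindlist(steps), id, time)[]
}

# Small check using only the rows of t printed in the question (t[27:38]).
tt <- data.table(
  time   = 1027:1038,
  period = c(3, 5, 3, 4, 4, 4, 3, 4, 1, 4, 1, 2),
  a      = c(-1, -1, 0, -2, -2, -1, 0, 1, 0, 0, 0, -1)
)
pp <- data.table(id = 1L, time = 1027L, a = 1)
res <- create_sequences_loop(pp, tt, 4L)
print(res)
```

Like the recursive version, this yields NA times for any starting point that has no match in t. Whether it is actually faster would need to be measured on the real data.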
