Create sequence by repeated joins
Question
Suppose I have the following data:
library(data.table)
set.seed(42)
t <- data.table(time=1001:2000, period=round(runif(1000,1,5)), a=round(rnorm(1000)))
p <- data.table(id=1:10, time=sample(1000:1100,5), a=round(rnorm(10)))
> t[27:38]
time period a
1: 1027 3 -1
2: 1028 5 -1
3: 1029 3 0
4: 1030 4 -2
5: 1031 4 -2
6: 1032 4 -1
7: 1033 3 0
8: 1034 4 1
9: 1035 1 0
10: 1036 4 0
11: 1037 1 0
12: 1038 2 -1
> head(p)
id time a
1: 1 1027 1
2: 2 1094 1
3: 3 1044 -1
4: 4 1053 1
5: 5 1015 1
6: 6 1027 -1
This is similar to the data I posted before in "concatenate periods to get time sequences, simultaneously for different starting points", but it now has the additional variable a, which is carried over from t.
In contrast to my earlier question, my goal is to create the sequences directly in p by concatenating n of the periods in t. For n = 4, the result would ideally look like this:
> head(p)
id time a
1: 1 1027 1
2: 1 1030 -1
3: 1 1034 -2
4: 1 1038 1
5: 1 1040 -1
6: 2 1094 1
For id 1, starting at 1027, the sequence is 1027, 1027 + 3 = 1030, 1030 + 4 = 1034, 1034 + 4 = 1038 and 1038 + 2 = 1040, where the increments are the period values taken from t. In addition, t$a is "taken along" to fill in p$a: each new row gets the a of the t row whose period produced it.
In my earlier question, Jaap gave a fantastic solution that produces a two-dimensional output with one line per id. I wonder whether this can be achieved directly in p. Perhaps it can be done by repeatedly joining t into p, or perhaps there is a more efficient solution (efficiency is key here).
Recommended answer
I'm not 100% sure what you want to do with a when you "take it along", but maybe this recursion does what you want, although I don't know whether it is efficient enough:
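# Extend every starting point in p by n steps; acc collects all rows so far
create_sequences <- function(p, n, acc = p) {
  # No steps left: sort the accumulated rows by id and time and return them
  if (n == 0L) return(setkey(acc, id, time))
  # Join t on the current times, advance time by t's period, take a from t (x.a)
  next_p <- t[p, .(id, time = time + period, a = x.a), on = "time"]
  create_sequences(next_p, n - 1L, rbindlist(list(acc, next_p)))
}
ans <- create_sequences(p, 4L)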
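The same repeated join can also be written as a plain loop, which may be easier to profile than the recursion. The following is only a sketch under stated assumptions: the names create_sequences_loop, lookup, tt and pp are hypothetical, tt is typed in by hand from the t[27:38] excerpt in the question, and pp holds just the starting point for id 1.

```r
library(data.table)

# Lookup table copied by hand from the t[27:38] excerpt (assumption: illustration only)
tt <- data.table(
  time   = 1027:1038,
  period = c(3L, 5L, 3L, 4L, 4L, 4L, 3L, 4L, 1L, 4L, 1L, 2L),
  a      = c(-1L, -1L, 0L, -2L, -2L, -1L, 0L, 1L, 0L, 0L, 0L, -1L)
)

# Single starting point: id 1 from head(p)
pp <- data.table(id = 1L, time = 1027L, a = 1L)

# Hypothetical loop variant of the recursive create_sequences()
create_sequences_loop <- function(p, lookup, n) {
  steps <- vector("list", n + 1L)  # one slot for the start plus one per step
  steps[[1L]] <- p
  current <- p
  for (k in seq_len(n)) {
    # Join the lookup on the current times, advance time by the matched
    # period, and take a from the lookup row (x.a)
    current <- lookup[current, .(id, time = time + period, a = x.a), on = "time"]
    steps[[k + 1L]] <- current
  }
  out <- rbindlist(steps)
  setkey(out, id, time)  # sort by id, then time
  out[]
}

res <- create_sequences_loop(pp, tt, 4L)
print(res)
```

For id 1 this should reproduce the times 1027, 1030, 1034, 1038, 1040 and the a values 1, -1, -2, 1, -1 from the desired output in the question. A start time with no matching row in the lookup (for example 1000, which sample(1000:1100, 5) can draw although t starts at 1001) yields NA in the joined columns, so such rows may need to be filtered separately.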