Concatenate periods to get time sequences, simultaneously for different starting points
Problem description
I have the following sample data:
library(data.table)
set.seed(42)
t <- data.table(time=1:1000, period=round(runif(100,1,5)))
p <- data.table(id=1:10, cut=sample(1:100,5))
> t[62:71]
time period
1: 62 5
2: 63 4
3: 64 3
4: 65 4
5: 66 2
6: 67 2
7: 68 4
8: 69 4
9: 70 2
10: 71 1
> head(p)
id cut
1: 1 63
2: 2 22
3: 3 99
4: 4 38
5: 5 91
6: 6 63
where t gives some vector of periods associated with time points, and p gives for each person a cutoff in time.
For each person in p, I would like to start at the person's cutoff and create a sequence of 4 time points by concatenating the periods. For example, for person 1, starting at time 63, the sequence would be 63, 63+4=67, 67+2=69 and 69+4=73.
Ideally, the output would then be:
> head(res)
id t1 t2 t3 t4
1 63 67 69 73
2 22 24 29 32
3 99 103 105 109
4 38 40 43 44
5 91 95 100 103
6 63 67 69 73
I learned before how to create the sequences using purrr::accumulate (an iterative cumulative sum in which the running sum determines the next position to be added). However, I wonder whether something like this can be done simultaneously for different persons using data.table or other packages, while avoiding for-loops, as the datasets are rather large.
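For reference, the single-person version of this iteration can be sketched in base R with Reduce(..., accumulate = TRUE), used here as a stand-in for purrr::accumulate. The vectors times and periods are hand-copied from the printed rows t[62:71] above, and step_forward is a hypothetical helper name, not something from the original post.

```r
# Periods for times 62..71, copied from the printed rows t[62:71]
times   <- 62:71
periods <- c(5, 4, 3, 4, 2, 2, 4, 4, 2, 1)

# Jump from the current time point to current + (period at the current time).
# The second argument is only a step counter and is ignored.
step_forward <- function(current, unused) {
  current + periods[match(current, times)]
}

# Three jumps starting at person 1's cutoff give four time points
one_person <- Reduce(step_forward, 1:3, accumulate = TRUE, init = 63)
```

This reproduces the worked example for person 1 (63, 67, 69, 73), but it handles only one starting point at a time, which is the limitation the question is about.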
Edit: a version where the time values do not coincide with the row indices
library(data.table)
set.seed(42)
t <- data.table(time=1001:2000, period=round(runif(100,1,5)))
p <- data.table(id=1:10, cut=sample(1:100,5))
This is similar to the above, except that
> t[62:71]
time period
1: 1062 5
2: 1063 4
3: 1064 3
4: 1065 4
5: 1066 2
6: 1067 2
7: 1068 4
8: 1069 4
9: 1070 2
10: 1071 1
where t$time[i] does not equal i, which rules out Jaap's first solution.
Recommended answer
For-loops aren't necessarily bad or inefficient. When used correctly, they can be an efficient solution to your problem.
For your current problem I would use a for-loop with the data.table package, which is efficient because the data.table is updated by reference:
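# start every person at their cutoff
res <- p[, .(id, t1 = cut)]
# column i of res holds the previous time point; use it as a row index into t
for (i in 2:4) {
  res[, paste0("t", i) := t[res[[i]], time + period]]
}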
which gives:
> res
id t1 t2 t3 t4
1: 1 63 67 69 73
2: 2 22 24 29 32
3: 3 99 103 105 109
4: 4 38 40 43 44
5: 5 91 95 100 103
6: 6 63 67 69 73
7: 7 22 24 29 32
8: 8 99 103 105 109
9: 9 38 40 43 44
10: 10 91 95 100 103
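To see why this loop stays cheap, note that it runs once per sequence position (three passes here), not once per person: each pass is a single vectorised row lookup for all persons at once. A minimal sketch of one such pass, using a small made-up table (periods_tbl and current are illustrative names and values, not part of the example data):

```r
library(data.table)

# Toy table in which time equals the row number, as in the original example data
periods_tbl <- data.table(time = 1:10,
                          period = c(2, 3, 1, 2, 4, 1, 2, 1, 3, 2))

# Current time point of three hypothetical persons
current <- c(1, 4, 5)

# One loop iteration: pick the rows numbered by `current`,
# then compute time + period for all of them in one go
next_step <- periods_tbl[current, time + period]
```

Here rows 1, 4 and 5 are looked up together, so next_step holds 1+2, 4+2 and 5+4.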
Alternatively, you can choose to update p as follows:
for (i in 2:4) {
  p[, paste0("t", i) := t[p[[i]], time + period]]
}
setnames(p, "cut", "t1")
which gives the same result.
For the updated example data, where the time values are no longer row numbers, look up the matching row with match() instead:
for (i in 2:4) {
  p[, paste0("t", i) := t[match(p[[i]], t$time), time + period]]
}
setnames(p, "cut", "t1")
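If the sequence length needs to vary, the same idea can be wrapped in a function. The following is only a sketch under assumptions: build_sequences, its arguments and the toy data are hypothetical names and values, not part of the answer; it uses match() so that it works whether or not time equals the row number, and data.table's set() to add each column by reference.

```r
library(data.table)

# Hypothetical helper: `periods` needs columns time/period, `starts` needs id/cut
build_sequences <- function(periods, starts, n = 4) {
  res <- starts[, .(id, t1 = cut)]
  for (i in seq_len(n - 1)) {
    prev <- res[[i + 1]]                 # column t_i (column 1 is id)
    idx  <- match(prev, periods$time)    # row of each person's current time
    set(res, j = paste0("t", i + 1), value = prev + periods$period[idx])
  }
  res
}

# Toy data: the third person starts at a time that is absent from the table
periods <- data.table(time = 1062:1073,
                      period = c(5, 4, 3, 4, 2, 2, 4, 4, 2, 1, 3, 2))
starts  <- data.table(id = 1:3, cut = c(1063, 1062, 5))
res <- build_sequences(periods, starts)
```

Note that match() returns NA for any time point that does not occur in the time column, and that NA then propagates through the rest of that person's sequence, so starting points should be checked against the table first.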