Rolling joins: roll forwards and backwards
Problem description
data.table is awesome, because I can do rolling joins, and even do rolling joins within groups!
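library(data.table)
set.seed(42)
# Three IDs observed over different time ranges, with four random value columns
metrics <- data.frame(
  ID=c(rep(1, 10), rep(2,5), rep(3,5)),
  Time=c(1:10, 4:8, 8:12),
  val1=runif(20),
  val2=runif(20),
  val3=runif(20),
  val4=runif(20)
)
# Keep a random 15 of the 20 rows so that each ID has gaps in Time
metrics <- data.table(metrics[sample(1:nrow(metrics), 15),], key=c('ID', 'Time'))
# Full grid of every ID at every Time from 1 to 12
calendar <- data.table(expand.grid(ID=1:3, Time=1:12), key=c('ID', 'Time'))
# Rolling join: carry the last observation forward within each ID
metrics[calendar,roll=TRUE]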
However, this isn't awesome enough for me. The resulting data.table still has NAs:
> metrics[calendar,roll=TRUE]
ID Time val1 val2 val3 val4
1: 1 1 0.9148060 0.9040314 0.3795592 0.675607275
2: 1 2 0.9370754 0.1387102 0.4357716 0.982817198
3: 1 3 0.9370754 0.1387102 0.4357716 0.982817198
4: 1 4 0.8304476 0.9466682 0.9735399 0.566488424
5: 1 5 0.8304476 0.9466682 0.9735399 0.566488424
6: 1 6 0.5190959 0.5142118 0.9575766 0.189473935
7: 1 7 0.7365883 0.3902035 0.8877549 0.271286615
8: 1 8 0.7365883 0.3902035 0.8877549 0.271286615
9: 1 9 0.6569923 0.4469696 0.9709666 0.693204820
10: 1 10 0.7050648 0.8360043 0.6188382 0.240544740
11: 1 11 0.7050648 0.8360043 0.6188382 0.240544740
12: 1 12 0.7050648 0.8360043 0.6188382 0.240544740
13: 2 1 NA NA NA NA
14: 2 2 NA NA NA NA
15: 2 3 NA NA NA NA
16: 2 4 0.4577418 0.7375956 0.3334272 0.042988796
17: 2 5 0.7191123 0.8110551 0.3467482 0.140479094
18: 2 6 0.9346722 0.3881083 0.3984854 0.216385415
19: 2 7 0.2554288 0.6851697 0.7846928 0.479398564
20: 2 8 0.2554288 0.6851697 0.7846928 0.479398564
21: 2 9 0.2554288 0.6851697 0.7846928 0.479398564
22: 2 10 0.2554288 0.6851697 0.7846928 0.479398564
23: 2 11 0.2554288 0.6851697 0.7846928 0.479398564
24: 2 12 0.2554288 0.6851697 0.7846928 0.479398564
25: 3 1 NA NA NA NA
26: 3 2 NA NA NA NA
27: 3 3 NA NA NA NA
28: 3 4 NA NA NA NA
29: 3 5 NA NA NA NA
30: 3 6 NA NA NA NA
31: 3 7 NA NA NA NA
32: 3 8 0.9400145 0.8329161 0.7487954 0.719355838
33: 3 9 0.9400145 0.8329161 0.7487954 0.719355838
34: 3 10 0.1174874 0.2076590 0.1712643 0.375489965
35: 3 11 0.4749971 0.9066014 0.2610880 0.514407708
36: 3 12 0.5603327 0.6117786 0.5144129 0.001570554
ID Time val1 val2 val3 val4
I could fill these NAs using zoo::na.locf with fromLast=TRUE, but that's not very fun. Can anyone think of an elegant way to roll the NAs backward (after rolling forward) during the data.table join?
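For reference, the two-step workaround described above (rolling join first, then a backward fill of whatever NAs remain) might look roughly like the sketch below. It is only an illustration under assumptions: the toy tables obs and grid are hypothetical, not the question's data, and data.table's own nafill(type = "nocb") is swapped in for zoo::na.locf with fromLast=TRUE so that the example needs a single package. nafill is assumed to be available, which requires a reasonably recent data.table release.

```r
library(data.table)

# Hypothetical toy data: two groups, each missing its earliest time points.
obs <- data.table(ID = c(1L, 1L, 2L), Time = c(2L, 4L, 3L), val = c(5, 6, 7),
                  key = c("ID", "Time"))
grid <- data.table(expand.grid(ID = 1:2, Time = 1:4), key = c("ID", "Time"))

# Step 1: the rolling join carries each value forward within its ID.
# Rows before the first observation of a group are still NA at this point.
res <- obs[grid, roll = TRUE]

# Step 2: fill the remaining leading NAs per group by carrying the next
# observation backward (NOCB). nafill() stands in for zoo::na.locf here.
res[, val := nafill(val, type = "nocb"), by = ID]

print(res)
```

This works, but it needs a second pass over every value column, which is the inconvenience the question is trying to avoid.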
Recommended answer
This is possible as of data.table version 1.8.8, released March 2013:
metrics[calendar, roll=TRUE, rollends=c(TRUE, TRUE)]
In addition to TRUE/FALSE, 'roll' may now be a positive number (roll forwards/LOCF) or negative number (roll backwards/NOCB). A finite number limits the distance a value is rolled (limited staleness). roll=TRUE and roll=+Inf are equivalent.
'rollends' is a new parameter holding two logicals. The first observation is rolled backwards if the first value of rollends is TRUE. The last observation is rolled forwards if the second value of rollends is TRUE. If roll is a finite number, the same limit applies to the ends.
New value roll='nearest' joins to the nearest value (either backwards or forwards) when the value falls in a gap, and to the end value according to 'rollends'.
'rolltolast' has been deprecated. For backwards compatibility it is converted to {roll=TRUE;rollends=c(FALSE,FALSE)}.
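To make these options concrete, here is a small sketch on a hypothetical single-group table (obs and grid are illustrative names, not the question's metrics and calendar). It assumes a data.table version of at least 1.8.8 and uses only the roll and rollends arguments described above. The observation times are chosen so that no query time is equidistant from two observations, which avoids relying on how roll='nearest' breaks ties.

```r
library(data.table)

# Hypothetical toy data: one group observed only at Time 3 and Time 8.
obs  <- data.table(ID = 1L, Time = c(3L, 8L), val = c(10, 20), key = c("ID", "Time"))
grid <- data.table(ID = 1L, Time = 1:10, key = c("ID", "Time"))

# Forward roll (LOCF). Times 1-2 come before the first observation and stay NA.
fwd <- obs[grid, roll = TRUE]

# Forward roll, and additionally roll the first observation backwards
# over the leading gap, so no NA remains.
both <- obs[grid, roll = TRUE, rollends = c(TRUE, TRUE)]

# Backward roll (NOCB). Times 9-10 come after the last observation and stay NA.
bwd <- obs[grid, roll = -Inf]

# Limited staleness: a value is carried forward by at most 1 time unit.
lim <- obs[grid, roll = 1]

# Nearest observation in either direction.
near <- obs[grid, roll = "nearest"]

print(cbind(grid, fwd = fwd$val, both = both$val, bwd = bwd$val,
            lim = lim$val, near = near$val))
```

With grouped data such as the question's, the same arguments apply within each ID, because ID is the leading join column and only the last join column (Time) is rolled.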
As always, to download the most up-to-date version of data.table, see Installation.