More efficient rolling join backwards not forwards?

Problem description

data.table implements asof (also known as rolling or LOCF) joins out of the box. I've found this related question:

Filling in missing (blanks) in a data table, per category - backwards and forwards

but that question has NAs in the data. In my case I'm following the advice there to keep the data irregular and join to it using roll=TRUE. Instead of carrying the last observation forward, I'd like to carry the next observation backward, as efficiently as possible.

This is what I've tried, using time:=-time first to try and trick it. Can I do it better? Can I do it faster?

llorJoin <- function(A, B) {
    # Work on copies so that := does not modify the caller's tables by reference.
    A <- copy(A)
    B <- copy(B)
    keys <- key(A)
    if (is.null(keys) || !identical(keys, key(B))) {
        stop("llorJoin::ERROR; A and B should have the same non-empty keys")
    }

    # Negate the last key column (the time) so that a forward roll runs backwards.
    lastKey <- tail(keys, 1L)
    myStr <- parse(text = paste0(lastKey, ":=-as.numeric(", lastKey, ")"))
    A <- A[, eval(myStr)]; setkeyv(A, keys)
    B <- B[, eval(myStr)]; setkeyv(B, keys)

    origin <- "1970-01-01 00:00.00 UTC"
    A <- B[A, roll = TRUE]
    # Restore the time column to POSIXct and re-sort on the original keys.
    myStr2 <- parse(text = paste0(lastKey, ":=as.POSIXct(-", lastKey, ",origin=origin)"))
    A <- A[, eval(myStr2)]; setkeyv(A, keys)
    return(A)
}
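Why negating the time column works can be seen on a toy example. This is only a minimal sketch: the tables `x` and `i`, their columns `t` and `v`, and the helper names `x_neg` and `i_neg` are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Hypothetical toy tables: x holds observations, i holds the lookup times.
x <- data.table(t = c(1, 3, 6), v = c("p", "q", "r"))
i <- data.table(t = c(2, 4, 7))

# Negate the time column on copies of both tables, then key on it.
x_neg <- copy(x)[, t := -t]
i_neg <- copy(i)[, t := -t]
setkey(x_neg, t)
setkey(i_neg, t)

# A forward roll (LOCF) on the negated axis is a backward roll on the
# original axis: each lookup time picks up the NEXT observation in x.
res <- x_neg[i_neg, roll = TRUE][, t := -t]
setkey(res, t)
print(res)
```

In this sketch the lookup at t = 2 receives the value observed at t = 3, the lookup at t = 4 receives the value observed at t = 6, and the lookup at t = 7 has no later observation and stays NA.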

library(data.table)
A <- data.table(time=as.POSIXct(c("10:01:01","10:01:02","10:01:04","10:01:05","10:01:02","10:01:01","10:01:01"),format="%H:%M:%S"),
                b=c("a","a","a","a","b","c","c"),
                d=c(1,1.9,2,1.8,5,4.1,4.2));
B <- data.table(time=as.POSIXct(c("10:01:01","10:01:03","10:01:00","10:01:01"),format="%H:%M:%S"),b=c("a","a","c","d"), e=c(1L,2L,3L,4L));
setkey(A,b,time)
setkey(B,b,time)

library(rbenchmark)
benchmark(llorJoin(A,B),B[A,roll=T],replications=10)
            test replications elapsed relative user.self sys.self user.child sys.child
1 llorJoin(A, B)           10   0.045        5     0.048        0          0         0
2 B[A, roll = T]           10   0.009        1     0.008        0          0         0

   b                time  e   d
1: a 2013-01-12 09:01:01  1 1.0
2: a 2013-01-12 09:01:02  2 1.9
3: a 2013-01-12 09:01:04 NA 2.0
4: a 2013-01-12 09:01:05 NA 1.8
5: b 2013-01-12 09:01:02 NA 5.0
6: c 2013-01-12 09:01:01 NA 4.1
7: c 2013-01-12 09:01:01 NA 4.2

So, as a comparison, the asof join on the initial data is 5X faster.
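With only seven rows, a timing like this is dominated by fixed overhead, so a comparison on larger input may be more informative. The following is a rough sketch under assumed conditions: the synthetic tables `X` and `I`, the size `n`, and the helper `neg()` are all hypothetical, and the timings will vary by machine and data.table version.

```r
library(data.table)

set.seed(1)
n <- 1e5
# Hypothetical synthetic data: a group column g, a numeric time t, a value v.
X <- data.table(g = sample(letters, n, TRUE), t = runif(n, 0, 1e4), v = rnorm(n))
I <- data.table(g = sample(letters, n, TRUE), t = runif(n, 0, 1e4))
setkey(X, g, t)
setkey(I, g, t)

# Plain forward rolling join on the keyed tables.
t_fwd <- system.time(fwd <- X[I, roll = TRUE])["elapsed"]

# Negation workaround: copy, negate the time, re-key, join, negate back.
neg <- function(d) {
    d <- copy(d)[, t := -t]
    setkey(d, g, t)
    d
}
t_bwd <- system.time(bwd <- neg(X)[neg(I), roll = TRUE][, t := -t])["elapsed"]

print(c(forward = unname(t_fwd), negated = unname(t_bwd)))
```

Any extra time in the second measurement comes from the copies and the re-keying rather than from the join itself.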

Recommended answer

The roll argument has been able to perform NOCB (next observation carried backward) for a long time. Updating this answer so that #615 can be closed.

You don't need to set keys anymore either. Instead, you can specify the columns to join on with the on= argument (implemented in v1.9.6). With these two features, the task can be accomplished as follows:

require(data.table) # v1.9.6+
B[A, on=c("b", "time"), roll=-Inf]
#                   time b  e   d
# 1: 2015-10-11 10:01:01 a  1 1.0
# 2: 2015-10-11 10:01:02 a  2 1.9
# 3: 2015-10-11 10:01:04 a NA 2.0
# 4: 2015-10-11 10:01:05 a NA 1.8
# 5: 2015-10-11 10:01:02 b NA 5.0
# 6: 2015-10-11 10:01:01 c NA 4.1
# 7: 2015-10-11 10:01:01 c NA 4.2

That's it.
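To make the direction of the roll concrete, here is a small sketch that compares the settings side by side. The tables `x` and `i` and their columns are hypothetical toy data, and the finite window of 1.5 time units is an arbitrary choice for illustration.

```r
library(data.table)  # on= requires v1.9.6 or later

# Hypothetical toy tables: x holds observations, i holds the lookup times.
x <- data.table(t = c(1, 3, 6), v = c("p", "q", "r"))
i <- data.table(t = c(2, 4, 7))

# roll = TRUE: last observation carried forward (LOCF).
locf <- x[i, on = "t", roll = TRUE]

# roll = -Inf: next observation carried backward (NOCB).
nocb <- x[i, on = "t", roll = -Inf]

# A finite negative roll limits how far a value may be carried backward,
# here at most 1.5 time units.
near <- x[i, on = "t", roll = -1.5]

print(locf)
print(nocb)
print(near)
```

Here `locf` fills every lookup time from the preceding observation, `nocb` fills t = 2 and t = 4 from the following observation and leaves t = 7 as NA, and `near` additionally leaves t = 4 as NA because the next observation is 2 units away.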

You're pretty close to the fastest way without a change to data.table. The following feature request was filed some time ago:

FR#2300 Add backwards and firstback to roll=TRUE

I've added a link there back to this question. You can search the feature request list on R-Forge. In this case words like "roll", "forwards" and "backwards" all find it. You might need 4 or 5 search attempts to confirm that the bug or feature request hasn't already been filed.

It's probably quicker for me to implement that feature request (only a few lines are needed internally) than to try to provide you with the quickest possible workaround.
