How to perform join over date ranges using data.table?

Problem description

How can I do the following (straightforward with sqldf) using data.table and get exactly the same result?

library(data.table)

whatWasMeasured <- data.table(start=as.POSIXct(seq(1, 1000, 100),
    origin="1970-01-01 00:00:00"),
    end=as.POSIXct(seq(10, 1000, 100), origin="1970-01-01 00:00:00"),
    x=1:10,
    y=letters[1:10])

measurments <- data.table(time=as.POSIXct(seq(1, 2000, 1),
    origin="1970-01-01 00:00:00"),
    temp=runif(2000, 10, 100))

## Alternative short names for data.tables
dt1 <- whatWasMeasured
dt2 <- measurments

## Straightforward with sqldf    
library(sqldf)

sqldf("select * from measurments m, whatWasMeasured wwm
where m.time between wwm.start and wwm.end")

Solution

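You can use the foverlaps() function, which implements joins over intervals efficiently. foverlaps() expects an interval (a start and an end column) on both sides, so in your case we just need a dummy column for measurments that turns each time point into the zero-length interval [time, time].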

Note 1: You should install the development version of data.table (v1.9.5), as a bug in foverlaps() has been fixed there. You can find the installation instructions here.

Note 2: I'll call whatWasMeasured = dt1 and measurments = dt2 here for convenience.

require(data.table) ## 1.9.5+
dt2[, dummy := time]

setkey(dt1, start, end)
ans = foverlaps(dt2, dt1, by.x=c("time", "dummy"), nomatch=0L)[, dummy := NULL]
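
To make the dummy-column trick easier to check by eye, here is a minimal sketch on toy data. The tables intervals and points and their columns are hypothetical stand-ins for dt1 and dt2, and integer times are used instead of POSIXct so the expected matches are obvious. With the default type = "any", a point lying exactly on an interval boundary still counts as a match, which lines up with the inclusive BETWEEN in the sqldf query.

```r
library(data.table)

# Hypothetical toy data (not from the question): two intervals, five points
intervals <- data.table(start = c(1L, 11L), end = c(3L, 12L), label = c("a", "b"))
points    <- data.table(time = c(2L, 3L, 5L, 11L, 20L), value = c(10, 20, 30, 40, 50))

# Represent each point as the zero-length interval [time, time]
points[, dummy := time]

# foverlaps() needs the second table keyed, with the interval columns last in the key
setkey(intervals, start, end)

# Keep only points that fall inside some interval, then drop the helper column
matched <- foverlaps(points, intervals, by.x = c("time", "dummy"), nomatch = 0L)
matched[, dummy := NULL]

print(matched)
```

Only the points at times 2, 3 and 11 should survive; 5 and 20 fall outside both intervals and are dropped by nomatch = 0L.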

See ?foverlaps for more info and this post for a performance comparison.
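
As a possible alternative that is not part of the answer above: data.table releases from 1.9.8 onward support non-equi joins in the on= argument, which can express the same range condition without a dummy column. The sketch below reuses hypothetical toy tables (intervals, points) rather than the question's data, and assumes a data.table version with non-equi join support.

```r
library(data.table)  # assumes data.table >= 1.9.8 (non-equi joins)

# Hypothetical toy data (not from the question)
intervals <- data.table(start = c(1L, 11L), end = c(3L, 12L), label = c("a", "b"))
points    <- data.table(time = c(2L, 3L, 5L, 11L, 20L), value = c(10, 20, 30, 40, 50))

# For each interval row, find the points with start <= time <= end.
# In on=, the left-hand name refers to points (x), the right-hand name to intervals (i).
# x.time / i.start / i.end pull the original values from each side explicitly.
res <- points[intervals,
              on = .(time >= start, time <= end),
              nomatch = 0L,
              .(time = x.time, value, start = i.start, end = i.end, label)]

print(res)
```

Selecting columns explicitly in j avoids the default behaviour where the join columns in the result carry the interval bounds rather than the measurement time.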
