How to perform join over date ranges using data.table?
Problem description
How can I do the following (which is straightforward with sqldf) using data.table, and get exactly the same result?
library(data.table)
whatWasMeasured <- data.table(start=as.POSIXct(seq(1, 1000, 100),
                                               origin="1970-01-01 00:00:00"),
                              end=as.POSIXct(seq(10, 1000, 100), origin="1970-01-01 00:00:00"),
                              x=1:10,
                              y=letters[1:10])
measurments <- data.table(time=as.POSIXct(seq(1, 2000, 1),
                                          origin="1970-01-01 00:00:00"),
                          temp=runif(2000, 10, 100))
## Alternative short names for data.tables
dt1 <- whatWasMeasured
dt2 <- measurments
## Straightforward with sqldf
library(sqldf)
sqldf("select * from measurments m, whatWasMeasured wwm
       where m.time between wwm.start and wwm.end")
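You can use the foverlaps() function, which implements joins over intervals efficiently. foverlaps() expects an interval (a start column and an end column) in both tables, so in your case we just need a dummy column for measurments: duplicating time turns each measurement into a zero-length interval.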
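The dummy-column idea can be sketched on a small toy example. The table and column names below (intervals, points, t, t2, label) are hypothetical and chosen only for illustration; they are not part of the original question.

```r
library(data.table)

# Hypothetical toy data: two intervals and three point-in-time values
intervals <- data.table(start = c(1L, 11L), end = c(5L, 15L), label = c("a", "b"))
points    <- data.table(t = c(3L, 7L, 12L))

# Duplicate the point column so each point becomes the zero-length interval [t, t]
points[, t2 := t]

# foverlaps() requires the second table to be keyed on its interval columns
setkey(intervals, start, end)

# Keep only points that fall inside an interval, then drop the dummy column
res <- foverlaps(points, intervals, by.x = c("t", "t2"), nomatch = 0L)[, t2 := NULL]
print(res)
```

The point t = 7 lies in neither interval, so nomatch = 0L drops it from the result.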
Note 1: You should install the development version of data.table (v1.9.5), as a bug with foverlaps() has been fixed there. You can find the installation instructions here.
Note 2: I'll call whatWasMeasured = dt1 and measurments = dt2 here for convenience.
require(data.table) ## 1.9.5+
dt2[, dummy := time]
setkey(dt1, start, end)
ans = foverlaps(dt2, dt1, by.x=c("time", "dummy"), nomatch=0L)[, dummy := NULL]
See ?foverlaps for more info, and this post for a performance comparison.
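As an alternative that is not part of the answer above: data.table releases from v1.9.8 onwards support non-equi joins in the on argument, which can express the same range condition without a dummy column. The following is a minimal sketch under that version assumption. It rebuilds data of the same shape as in the question under hypothetical names (intervals, readings, id, label) so that it runs on its own.

```r
library(data.table)  # assumes v1.9.8 or later for non-equi joins

# Hypothetical stand-ins for whatWasMeasured and measurments
intervals <- data.table(
  start = as.POSIXct(seq(1, 1000, 100), origin = "1970-01-01", tz = "UTC"),
  end   = as.POSIXct(seq(10, 1000, 100), origin = "1970-01-01", tz = "UTC"),
  id    = 1:10,
  label = letters[1:10]
)
readings <- data.table(
  time = as.POSIXct(seq(1, 2000, 1), origin = "1970-01-01", tz = "UTC"),
  temp = runif(2000, 10, 100)
)

# Join each interval to the readings whose time lies within [start, end].
# Columns are selected explicitly: x.time is the reading's own time and
# i.start / i.end are the interval bounds, which avoids ambiguity about
# which table the join columns in the result come from.
joined <- readings[intervals,
                   on = .(time >= start, time <= end),
                   nomatch = 0L,
                   .(time = x.time, temp, start = i.start, end = i.end, id, label)]
print(head(joined))
```

Each of the 10 intervals spans 10 one-second readings, so the result should have the same number of rows as the sqldf query in the question.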