data.table滚动连接范围内 [英] data.table rolling join within range

查看:90
本文介绍了data.table滚动连接范围内的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用R 3.2.2,data.table 1.9.5

Using R 3.2.2, data.table 1.9.5

价格数据每隔一段时间有效,如下所示。请注意,范围可能不连续,如记录3和4所示。

Price data is available in intervals as shown below. Please note that the ranges may not be contiguous as shown in records 3 and 4.

   id price    startdt      enddt
1: aa   259 2005-01-01 2005-06-30
2: aa   259 2005-07-01 2005-12-31
3: aa   249 2006-01-01 2006-06-30
4: aa   239 2007-01-01 2007-06-30

销售数据

   id         dt sls
1: aa 2005-03-01 250
2: aa 2005-08-15 240
3: aa 2005-12-31 300
4: aa 2006-08-01 100
5: aa 2007-04-01 400

我想创建下表。

   id         dt sls price
1: aa 2005-03-01 250   259
2: aa 2005-08-15 240   259
3: aa 2005-12-31 300   259
4: aa 2006-08-01 100    NA   <== since there isn't a price in the price table for this date
5: aa 2007-04-01 400   239

有人可以告诉你怎么做?我假设我需要熔化价格表,使日期在1列,然后与销售数据滚动连接。

Can somebody please show how to do this? I am assuming that I will need to melt the price table so that dates are in 1 column and then do a rolling join with sales data. The full table has 25million+ records for ~700k skus.

price.dt <- data.table(id=rep("aa",4),price=c(259,259,249,239),
    startdt=as.Date(c("2005-1-1","2005-7-1","2006-1-1","2007-1-1"),"%Y-%m-%d"),
    enddt=as.Date(c("2005-6-30","2005-12-31","2006-6-30","2007-6-30"),"%Y-%m-%d"))

sales.dt <- data.table(id=rep("aa",5),
    dt=as.Date(c("2005-3-1","2005-8-15","2005-12-31","2006-8-1","2007-4-1"),"%Y-%m-%d"),
    sls=c(250,240,300,100,400))

final.dt <- data.table(id=rep("aa",5),
    dt=as.Date(c("2005-3-1","2005-8-15","2005-12-31","2006-8-1","2007-4-1"),"%Y-%m-%d"),
    sls=c(250,240,300,100,400), price=c(259,259,259,NA,239))


推荐答案

price.dt[sales.dt, on = c(id = 'id', startdt = 'dt'), roll = T
       ][enddt < startdt, price := NA][]
#   id price    startdt      enddt sls
#1: aa   259 2005-03-01 2005-06-30 250
#2: aa   259 2005-08-15 2005-12-31 240
#3: aa   259 2005-12-31 2005-12-31 300
#4: aa    NA 2006-08-01 2006-06-30 100
#5: aa   239 2007-04-01 2007-06-30 400

然后删除额外的列,重命名 startdt 等。另一个选项是使用 foverlaps

You can then remove extra columns, rename startdt and so on. Another option is using foverlaps.

这篇关于data.table滚动连接范围内的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆