如何进行数据表滚动联接? [英] How to do a data.table rolling join?

查看:104
本文介绍了如何进行数据表滚动联接?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个要合并的数据表。一个是关于时间的公司市场价值数据,另一个是关于时间的公司股息历史记录。我试图找出每个公司每个季度支付了多少钱,并将该价值随时间推移移到了市场价值数据旁边。

I have two data tables that I'm trying to merge. One is data on company market values through time and the other is company dividend history through time. I'm trying to find out how much each company has paid each quarter and put that value next to the market value data through time.

library(magrittr)
library(data.table)
library(zoo)
library(lubridate)

set.seed(1337)
# data table of company  market values
companies <- 
    data.table(companyID = 1:10,
               Sedol = rep(c("91772E", "7A662B"), each = 5),
               Date = (as.Date("2005-04-01") + months(seq(0, 12, 3))) - days(1),
               MktCap = c(100 + cumsum(rnorm(5,5)),
                          50 + cumsum(rnorm(5,1,5)))) %>%
    setkey(Sedol, Date)

# data table of dividends
dividends <- 
    data.table(DivID = 1:7,
               Sedol = c(rep('91772E', each = 4), rep('7A662B', each = 3)),
               Date = as.Date(c('2004-11-19', '2005-01-13', '2005-01-29',
                                '2005-10-01', '2005-06-29', '2005-06-30',
                                '2006-04-17')),
               DivAmnt = rnorm(7, .8, .3)) %>%
    setkey(Sedol, Date)

我相信这是您可以使用数据的情况。加入,类似:

I believe this is a situation where you could use a data.table rolling join, something like:

dividends[companies, roll = "nearest"]

尝试获取看起来像

       DivID  Sedol       Date   DivAmnt companyID    MktCap
    1:    NA 7A662B       <NA>        NA         6  61.21061
    2:     5 7A662B 2005-06-29 0.7772631         7  66.92951
    3:     6 7A662B 2005-06-30 1.1815343         7  66.92951
    4:    NA 7A662B       <NA>        NA         8  78.33914
    5:    NA 7A662B       <NA>        NA         9  88.92473
    6:    NA 7A662B       <NA>        NA        10  87.85067
    7:     2 91772E 2005-01-13 0.2964291         1 105.19249
    8:     3 91772E 2005-01-29 0.8472649         1 105.19249
    9:    NA 91772E       <NA>        NA         2 108.74579
   10:     4 91772E 2005-10-01 1.2467408         3 113.42261
   11:    NA 91772E       <NA>        NA         4 120.04491
   12:    NA 91772E       <NA>        NA         5 124.35588

(请注意,我将股息与公司市场价格的确切季度进行了匹配)

(note that I've matched the dividends to the company market values by the exact quarter)

但是我不确定如何执行它。如果 roll 是一个值,则CRAN pdf相当不清楚数字是多少或应该是什么(您可以传递日期吗?数字是否可以量化未来的携带天数?的变化?)并更改工资总额似乎并不能满足我的需求。

But I'm not exactly sure how to execute it. The CRAN pdf is rather vague about what the number is or should be if roll is a value (Can you pass dates? Does a number quantify the days forward to carry? the number of obersvations?) and changing rollends around doesn't seem to get me what I want.

最后,我最终将股息日期映射到其季度末,然后加入。一个很好的解决方案,但是如果我最终需要了解如何执行滚动联接,就没有用。在您的答案中,您能否描述滚动连接是唯一解决方案的情况,并帮助我理解如何执行它们?

In the end, I ended up mapping the dividend dates to their quarter end and then joining on that. A good solution, but not useful if I end up needing to know how to perform rolling joins. In your answer, could you describe a situation where rolling joins are the only solution as well as help me understand how to perform them?

推荐答案

您可能希望将重叠联接与 foverlaps 函数一起使用 post-tag title =显示标记了'data.table'的问题 rel = tag> data.table

Instead of a rolling join, you may want to use an overlap join with the foverlaps function of data.table:

# create an interval in the 'companies' datatable
companies[, `:=` (start = compDate - days(90), end = compDate + days(15))]
# create a second date in the 'dividends' datatable
dividends[, Date2 := divDate]

# set the keys for the two datatable
setkey(companies, Sedol, start, end)
setkey(dividends, Sedol, divDate, Date2)

# create a vector of columnnames which can be removed afterwards
deletecols <- c("Date2","start","end")

# perform the overlap join and remove the helper columns
res <- foverlaps(companies, dividends)[, (deletecols) := NULL]

结果:


> res
     Sedol DivID    divDate   DivAmnt companyID   compDate    MktCap
 1: 7A662B    NA       <NA>        NA         6 2005-03-31  61.21061
 2: 7A662B     5 2005-06-29 0.7772631         7 2005-06-30  66.92951
 3: 7A662B     6 2005-06-30 1.1815343         7 2005-06-30  66.92951
 4: 7A662B    NA       <NA>        NA         8 2005-09-30  78.33914
 5: 7A662B    NA       <NA>        NA         9 2005-12-31  88.92473
 6: 7A662B    NA       <NA>        NA        10 2006-03-31  87.85067
 7: 91772E     2 2005-01-13 0.2964291         1 2005-03-31 105.19249
 8: 91772E     3 2005-01-29 0.8472649         1 2005-03-31 105.19249
 9: 91772E    NA       <NA>        NA         2 2005-06-30 108.74579
10: 91772E     4 2005-10-01 1.2467408         3 2005-09-30 113.42261
11: 91772E    NA       <NA>        NA         4 2005-12-31 120.04491
12: 91772E    NA       <NA>        NA         5 2006-03-31 124.35588







同时,数据.table 作者介绍了非等价联接( v1.9.8 )。您也可以使用它来解决此问题。使用非等额联接,您只需要:


In the meantime the data.table authors have introduced non-equi joins (v1.9.8). You can also use that to solve this problem. Using a non-equi join you just need:

companies[, `:=` (start = compDate - days(90), end = compDate + days(15))]
dividends[companies, on = .(Sedol, divDate >= start, divDate <= end)]

以获得预期的结果。

已使用数据(与问题中相同,但未创建密钥):

Used data (the same as in the question, but without the creation of the keys):

set.seed(1337)
companies <- data.table(companyID = 1:10, Sedol = rep(c("91772E", "7A662B"), each = 5),
                        compDate = (as.Date("2005-04-01") + months(seq(0, 12, 3))) - days(1),
                        MktCap = c(100 + cumsum(rnorm(5,5)), 50 + cumsum(rnorm(5,1,5))))
dividends <- data.table(DivID = 1:7, Sedol = c(rep('91772E', each = 4), rep('7A662B', each = 3)),
                        divDate = as.Date(c('2004-11-19','2005-01-13','2005-01-29','2005-10-01','2005-06-29','2005-06-30','2006-04-17')),
                        DivAmnt = rnorm(7, .8, .3))

这篇关于如何进行数据表滚动联接?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆