滚动连接数据表 [英] rolling joins data.table in R

查看:123
本文介绍了滚动连接数据表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想了解更多关于滚动连接工作的方式,我有一些混乱,我希望有人能为我澄清这一点。具体例子如下:

I am trying to understand a little more about the way rolling joins work and am having some confusion, I was hoping somebody could clarify this for me. To take a concrete example:

dt1 <- data.table(id=rep(1:5, 10), t=1:50, val1=1:50, key="id,t")
dt2 <- data.table(id=rep(1:5, 2), t=1:10, val2=1:10, key="id,t")



我希望这会产生一个长的 data.table 其中滚动 dt2 中的值:

dt1[dt2,roll=TRUE]

这样做似乎是:

dt2[dt1,roll=TRUE]

有人可以向我解释一下如何加入 data.table 不能正确理解。我认为 dt1 [dt2,roll = TRUE] 对应于 select * from dt1 right join dt2 on(dt1.id = dt2

Could someone explain to me more about how joining in data.table works as I am clearly not understanding it correctly. I thought that dt1[dt2,roll=TRUE] corresponded to the sql equivalent of select * from dt1 right join dt2 on (dt1.id = dt2.id and dt1.t = dt2.t), except with the added functionality locf.

此外,文档说:

X[Y] is a join, looking up X's rows using Y (or Y's key if it has one) 
as an index.

这使得看起来只有X中的东西应该返回一个正在进行的连接是一个内连接,而不是外部。在 roll = T 但特殊 id 不存在于 dt1

This makes it seem that only things in X should be returned an the join being done is an inner join, not outer. What about in the case when roll=T but that particular id does not exist in dt1? Playing around a bit more I can't understand what value is being placed into the column.

推荐答案

从文档中引用的引用来自FAQ 1.12 X [Y]和合并(X,Y)之间的区别是什么。您在?data.table 中是否找到以下内容?

That quote from the documentation appears to be from FAQ 1.12 What is the difference between X[Y] and merge(X,Y). Did you find the following in ?data.table and does it help?


roll 适用于最后一个连接列,通常是一个日期,但可以是任何
有序变量,不规则和包括空格。如果roll = TRUE,并且i的
行匹配除了最后一个x连接列之外的所有行,并且它在
中的值最后一个连接列落在一个间隙中(包括在最后一个
后的观察值x对于该组),则x中的当前值是
。这个操作使用修改的
二进制搜索特别快。该操作也称为最后观察携带
forward(LOCF)。通常,在x的键中不应有重复,
最后一个键列是一个日期(或时间或日期时间),并且x的键的所有列
被连接到。一个常见的习惯是在一组标识符
(ids)中选择
同时的常规时间序列(dts):DT [CJ(ids,dts),roll = TRUE]列键(id,日期)
和CJ代表交叉连接。

roll Applies to the last join column, generally a date but can be any ordered variable, irregular and including gaps. If roll=TRUE and i's row matches to all but the last x join column, and its value in the last i join column falls in a gap (including after the last observation in x for that group), then the prevailing value in x is rolled forward. This operation is particularly fast using a modified binary search. The operation is also known as last observation carried forward (LOCF). Usually, there should be no duplicates in x's key, the last key column is a date (or time, or datetime) and all the columns of x's key are joined to. A common idiom is to select a contemporaneous regular time series (dts) across a set of identifiers (ids): DT[CJ(ids,dts),roll=TRUE] where DT has a 2-column key (id,date) and CJ stands for cross join.


rolltolast 像卷,但是数据不会前滚过去的最后
观察在由连接列。值
必须落在x中的缺口中,但不在数据结尾之后,对于
,除了最后一个连接列之外的所有组定义的组。 roll和
rolltolast不能都为TRUE。

rolltolast Like roll but the data is not rolled forward past the last observation within each group defined by the join columns. The value of i must fall in a gap in x but not after the end of the data, for that group defined by all but the last join column. roll and rolltolast may not both be TRUE.

在左/右类比SQL连接方面,我更喜欢在FAQ 2.14的上下文中考虑一下。你能进一步解释为什么data.table是由A [B]语法
在基本中的灵感。这是一个很长的答案,所以我不会在这里粘贴。

In terms of left/right analogies to SQL joins, I prefer to think about that in the context of FAQ 2.14 Can you explain further why data.table is inspired by A[B] syntax in base. That's quite a long answer so I won't paste it here.

这篇关于滚动连接数据表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆