Translating SQL joins on foreign keys to R data.table syntax
Problem description
The `data.table` package provides many of the same table handling methods as SQL. If a table has a key, that key consists of one or more columns. But a table can't have more than one key, because it can't be sorted in two different ways at the same time.

In this example, `X` and `Y` are `data.table`s with a single key column "id"; `Y` also has a non-key column "x_id".

```r
library(data.table)
X <- data.table(id = 1:5, a = 4:8, key = "id")
Y <- data.table(id = c(1, 1, 3, 5, 7), x_id = c(1, 4:1), key = "id")
```
The following syntax would join the tables on their keys:

```r
X[Y]
```
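A minimal sketch of what that key join actually matches, assuming a recent data.table release (recent versions print rows as `1:` rather than the `[1,]` style shown elsewhere on this page). The names `key_join` and `key_inner` are illustrative only.

```r
library(data.table)

# The two tables from the question, both keyed on "id".
X <- data.table(id = 1:5, a = 4:8, key = "id")
Y <- data.table(id = c(1, 1, 3, 5, 7), x_id = c(1, 4:1), key = "id")

# X[Y] matches Y's key column (id), NOT Y$x_id, against X's key (id).
# Y$id == 7 has no partner in X, so with the default nomatch = NA
# that row is kept with a = NA.
key_join <- X[Y]
print(key_join)

# nomatch = 0 drops the unmatched row (inner-join semantics),
# but the join is still on Y$id rather than on Y$x_id.
key_inner <- X[Y, nomatch = 0]
print(key_inner)
```

This is why `X[Y]` alone cannot express `X.id = Y.x_id`: without extra arguments, the join column is decided by the keys.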
How can I translate the following SQL syntax to data.table code?

```sql
select * from X join Y on X.id = Y.x_id;
```
The closest that I have gotten is:

```r
Y[X, list(id, x_id), by = x_id, nomatch = 0]
```

However, this does not perform the same inner join as the SQL statement.
Here is a clearer example in which the foreign key is `y_id`, and we want the join to look up values of `Y2` where `X2$y_id = Y2$id`:

```r
X2 <- data.table(id = 1:5, y_id = c(1, 1, 2, 2, 2), key = "id")
Y2 <- data.table(id = 1:5, b = letters[1:5], key = "id")
```
I would like to produce the table:

```
 id y_id   b
  1    1 "a"
  2    1 "a"
  3    2 "b"
  4    2 "b"
  5    2 "b"
```

similar to what is done by the following kludge:

```
> merge(data.frame(X2), data.frame(Y2), by.x = "y_id", by.y = "id")
  y_id id b
1    1  1 a
2    1  2 a
3    2  3 b
4    2  4 b
5    2  5 b
```
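As far as I know, current data.table versions make the `data.frame()` conversion unnecessary, because `merge()` dispatches to `merge.data.table`, which accepts `by.x` and `by.y` itself. A minimal sketch under that assumption (the name `merged` is illustrative):

```r
library(data.table)

X2 <- data.table(id = 1:5, y_id = c(1, 1, 2, 2, 2), key = "id")
Y2 <- data.table(id = 1:5, b = letters[1:5], key = "id")

# merge() on two data.tables calls merge.data.table; by.x / by.y name the
# foreign-key column in X2 and the matching column in Y2.
merged <- merge(X2, Y2, by.x = "y_id", by.y = "id")
print(merged)
```

With the default `sort = TRUE` the rows come back ordered by `y_id`, matching the `data.frame` output above.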
However, when I do this:

```r
X2[Y2, 1:2, by = y_id]
```

I do not get the desired result:

```
     y_id V1
[1,]    1  1
[2,]    1  2
[3,]    2  1
[4,]    2  2
```
Solution

Good question. Note the following (admittedly buried) in `?data.table`:

> When `i` is a `data.table`, `x` must have a key. `i` is joined to `x` using the key and the rows in `x` that match are returned. An equi-join is performed between each column in `i` to each column in `x`'s key. The match is a binary search in compiled C in O(log n) time. If `i` has less columns than `x`'s key then many rows of `x` may match to each row of `i`. If `i` has more columns than `x`'s key, the columns of `i` not involved in the join are included in the result. If `i` also has a key, it is `i`'s key columns that are used to match to `x`'s key columns and a binary merge of the two tables is carried out.

So, the key here is that `i` doesn't have to be keyed. Only `x` must be keyed.

```
X2 <- data.table(id = 11:15, y_id = c(14, 14, 11, 12, 12), key = "id")
X2
     id y_id
[1,] 11   14
[2,] 12   14
[3,] 13   11
[4,] 14   12
[5,] 15   12
Y2 <- data.table(id = 11:15, b = letters[1:5], key = "id")
Y2
     id b
[1,] 11 a
[2,] 12 b
[3,] 13 c
[4,] 14 d
[5,] 15 e
Y2[J(X2$y_id)]  # binary search for each item of (unsorted and unkeyed) i
     id b
[1,] 14 d
[2,] 14 d
[3,] 11 a
[4,] 12 b
[5,] 12 b
```
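The quoted rule that "the columns of `i` not involved in the join are included in the result" also gives a way to keep `X2`'s own `id`, which the lookup above drops. A minimal sketch using the `1:5` tables from the question and assuming a recent data.table; `i_tab`, its column `x2_id`, `lookup` and `desired` are hypothetical names (`x2_id` is chosen to avoid a clash with `Y2$id`):

```r
library(data.table)

X2 <- data.table(id = 1:5, y_id = c(1, 1, 2, 2, 2), key = "id")
Y2 <- data.table(id = 1:5, b = letters[1:5], key = "id")

# Unkeyed i: its FIRST column (y_id) is matched against Y2's key (id);
# the second column takes no part in the join and is carried through.
i_tab <- data.table(y_id = X2$y_id, x2_id = X2$id)

# nomatch = 0 keeps only matched rows (inner join).
lookup <- Y2[i_tab, nomatch = 0]
print(lookup)

# Rename/reorder into the layout asked for in the question: id, y_id, b.
desired <- lookup[, list(id = x2_id, y_id = id, b)]
print(desired)
```

In `lookup`, the join column keeps `x`'s name (`id`) but holds the looked-up `y_id` values, which is why it is renamed in the last step.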
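or,

```
Y2[SJ(X2$y_id)]  # binary merge of keyed i, see ?SJ
     id b
[1,] 11 a
[2,] 12 b
[3,] 12 b
[4,] 14 d
[5,] 14 d
```

Note that `J()` matters. Without it, a numeric vector in `i` selects rows by position instead of joining, and positions 11, 12 and 14 lie beyond the five rows of `Y2`, so the results differ:

```
identical(Y2[J(X2$y_id)], Y2[X2$y_id])
[1] FALSE
```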
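Later data.table releases also provide an `on=` argument for ad hoc joins, which maps the SQL `ON` clause directly instead of relying on keys. A minimal sketch, assuming a version that accepts `on=` as a named character vector of the form `x_column = "i_column"`; `sql_like` and `fk_lookup` are illustrative names:

```r
library(data.table)

X <- data.table(id = 1:5, a = 4:8, key = "id")
Y <- data.table(id = c(1, 1, 3, 5, 7), x_id = c(1, 4:1), key = "id")

# SQL: select * from X join Y on X.id = Y.x_id;
# on = c(id = "x_id") reads "X's id equals Y's x_id";
# nomatch = 0 turns the default outer-style lookup into an inner join.
sql_like <- X[Y, on = c(id = "x_id"), nomatch = 0]
print(sql_like)

X2 <- data.table(id = 1:5, y_id = c(1, 1, 2, 2, 2), key = "id")
Y2 <- data.table(id = 1:5, b = letters[1:5], key = "id")

# Foreign-key lookup: match X2$y_id against Y2$id, keep matches only.
fk_lookup <- X2[Y2, on = c(y_id = "id"), nomatch = 0]
print(fk_lookup)
```

Unlike `select *`, the matched join column appears only once in each result. `fk_lookup` has the columns `id`, `y_id`, `b` in the order requested in the question.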