Translating SQL joins on foreign keys to R data.table syntax

Views: 90 Posted: 2017/3/12 10:14:55 Tags: sql, r, data.table


Problem description

The data.table package provides many of the same table handling methods as SQL. If a table has a key, that key consists of one or more columns. But a table can't have more than one key, because it can't be sorted in two different ways at the same time.
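To make the one-key rule concrete, here is a minimal sketch. The table `DT` and its columns are hypothetical and not part of the question; the point is only that setting a new key replaces the old one and re-sorts the table.

```r
library(data.table)

# Hypothetical table, used only to illustrate keys.
DT <- data.table(id = c(3L, 1L, 2L), grp = c("b", "a", "a"))

setkey(DT, id)        # sorts DT by id and records "id" as its key
print(key(DT))
print(DT)

setkey(DT, grp, id)   # replaces the previous key; DT is re-sorted by grp, then id
print(key(DT))
print(DT)
```

A key may span several columns, but the table still has a single key at any moment, because it is stored in one physical order.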


In this example, X and Y are data.tables with a single key column "id"; Y also has a non-key column "x_id".
   X <- data.table(id = 1:5, a=4:8,key="id")
   Y <- data.table(id = c(1,1, 3,5,7), x_id=c(1,4:1), key="id")
The following syntax would join the tables on their keys:
  X[Y]
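For reference, a small sketch of what the keyed join returns. Two things here are assumptions rather than part of the question: the columns are written with integer literals so both `id` columns share a type, and `nomatch = NULL` assumes a reasonably recent data.table (older releases spell it `nomatch = 0`).

```r
library(data.table)

X <- data.table(id = 1:5, a = 4:8, key = "id")
Y <- data.table(id = c(1L, 1L, 3L, 5L, 7L), x_id = c(1L, 4:1), key = "id")

# X[Y] looks up every row of Y in X by key. Rows of Y without a match
# in X (here id = 7) are kept, with NA in X's columns.
outer_res <- X[Y]
print(outer_res)

# Dropping unmatched rows gives the behaviour of an SQL inner join.
inner_res <- X[Y, nomatch = NULL]
print(inner_res)
```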
How can I translate the following SQL syntax to data.table code?
  select * from X join Y on X.id = Y.x_id; 
The closest that I have gotten is:
    Y[X, list(id, x_id), by = x_id, nomatch = 0]
However, this does not do the same inner join as the SQL statement.
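One possible translation, assuming a data.table release that supports the `on=` argument (it was added after this question was written), is sketched here. The integer literals are my assumption to keep the join columns the same type, and `nomatch = NULL` again assumes a recent version.

```r
library(data.table)

X <- data.table(id = 1:5, a = 4:8, key = "id")
Y <- data.table(id = c(1L, 1L, 3L, 5L, 7L), x_id = c(1L, 4:1), key = "id")

# SQL: select * from X join Y on X.id = Y.x_id
# on = .(id = x_id) pairs X's column id with Y's column x_id,
# regardless of either table's key.
res <- X[Y, on = .(id = x_id), nomatch = NULL]
print(res)
```

The result has one row per row of Y whose `x_id` exists in `X$id`, in the row order of Y. Unlike SQL, the join column appears once, and Y's own `id` column is renamed to avoid a clash.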



Here is a clearer example in which the foreign key is y_id, and we want the join to look up values of Y2 where X2$y_id = Y2$id.
    X2 <- data.table(id = 1:5, y_id = c(1,1,2,2,2), key="id")
    Y2 <- data.table(id = 1:5, b = letters[1:5], key="id")
I would like to produce the table:
   id  y_id  b
    1     1 "a"
    2     1 "a"
    3     2 "b"
    4     2 "b"
    5     2 "b"
similar to what is done by the following kludge:
> merge(data.frame(X2), data.frame(Y2), by.x = "y_id", by.y = "id")
  y_id id b
1    1  1 a
2    1  2 a
3    2  3 b
4    2  4 b
5    2  5 b
However, when I do this:
    X2[Y2, 1:2,by = y_id]
I do not get the desired result:
    y_id V1
[1,]    1  1
[2,]    1  2
[3,]    2  1
[4,]    2  2
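For comparison, two ways the desired table might be produced with a more recent data.table than the one used in the question. This is a sketch under assumptions: `y_id` is written as integers so it matches the type of `Y2$id`, and the names `merged` and `looked_up` are mine.

```r
library(data.table)

X2 <- data.table(id = 1:5, y_id = c(1L, 1L, 2L, 2L, 2L), key = "id")
Y2 <- data.table(id = 1:5, b = letters[1:5], key = "id")

# Option 1: merge() has a data.table method, so the conversion to
# data.frame is not needed. The default is an inner join.
merged <- merge(X2, Y2, by.x = "y_id", by.y = "id")
print(merged)

# Option 2: an update join. For every row of X2, look up b in Y2 where
# X2$y_id equals Y2$id and add it as a new column by reference.
# copy() keeps the original X2 untouched.
looked_up <- copy(X2)
looked_up[Y2, b := i.b, on = .(y_id = id)]
print(looked_up)
```

Option 2 keeps the column order `id, y_id, b` and the row order of X2. It behaves like a left join: a `y_id` with no match in Y2 would get `NA` in `b` rather than being dropped.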

Solution
Good question. Note the following (admittedly buried) in ?data.table:

  When i is a data.table, x must have a key. i is joined to x using the key and the rows in x that match are returned. An equi-join is performed between each column in i to each column in x's key. The match is a binary search in compiled C in O(log n) time. If i has less columns than x's key then many rows of x may match to each row of i. If i has more columns than x's key, the columns of i not involved in the join are included in the result. If i also has a key, it is i's key columns that are used to match to x's key columns and a binary merge of the two tables is carried out.
So, the key here is that i doesn't have to be keyed. Only x must be keyed.
X2 <- data.table(id = 11:15, y_id = c(14,14,11,12,12), key="id")
     id y_id
[1,] 11   14
[2,] 12   14
[3,] 13   11
[4,] 14   12
[5,] 15   12
Y2 <- data.table(id = 11:15, b = letters[1:5], key="id")
     id b
[1,] 11 a
[2,] 12 b
[3,] 13 c
[4,] 14 d
[5,] 15 e
Y2[J(X2$y_id)]  # binary search for each item of (unsorted and unkeyed) i
     id b
[1,] 14 d
[2,] 14 d
[3,] 11 a
[4,] 12 b
[5,] 12 b
or,
Y2[SJ(X2$y_id)]  # binary merge of keyed i, see ?SJ
     id b
[1,] 11 a
[2,] 12 b
[3,] 12 b
[4,] 14 d
[5,] 14 d
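The two lookups above can be reproduced in a self-contained form. In this sketch the vector `v` holds the same values as `X2$y_id`, stored as integers (my assumption, to match the type of `Y2$id`), and `i_dt` is a hypothetical unkeyed table added to show that `i` needs no key.

```r
library(data.table)

Y2 <- data.table(id = 11:15, b = letters[1:5], key = "id")
v  <- c(14L, 14L, 11L, 12L, 12L)

i_dt <- data.table(y_id = v)   # plain data.table, no key set
print(is.null(key(i_dt)))

by_dt <- Y2[i_dt]    # i's first column is matched to Y2's key; rows follow v
by_J  <- Y2[J(v)]    # J() builds the same kind of one-column i inline
by_SJ <- Y2[SJ(v)]   # SJ() sorts and keys i first, so rows come back in key order

print(by_dt)
print(by_SJ)
```

Use the `J()` form when the result must line up row by row with `X2`, since `SJ()` reorders the lookups.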

identical(Y2[J(X2$y_id)], Y2[X2$y_id])
[1] FALSE
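The comparison returns FALSE because a bare numeric vector in `i` is read as row numbers, not as key values, so it has to be wrapped in `J()` to trigger a join. A minimal sketch, with `v` standing in for `X2$y_id` (stored as integers, which is my assumption):

```r
library(data.table)

Y2 <- data.table(id = 11:15, b = letters[1:5], key = "id")
v  <- c(14L, 14L, 11L, 12L, 12L)

as_rows <- Y2[v]      # asks for rows number 14, 14, 11, 12, 12 of a 5-row table
as_join <- Y2[J(v)]   # looks the values up in the key column id

print(as_rows)
print(as_join)
print(identical(as_rows, as_join))
```

Since Y2 has only five rows, the row-number form finds nothing and returns rows of `NA`.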


                        