Hive 的隐式连接总是内部连接吗? [英] Are Hive's implicit joins always inner joins?
问题描述
Hive 的 join 文档鼓励使用隐式连接,即
The join documentation for Hive encourages the use of implicit joins, i.e.
SELECT *
FROM table1 t1, table2 t2, table3 t3
WHERE t1.id = t2.id AND t2.id = t3.id AND t1.zipcode = '02535';
这相当于
SELECT t1.*, t2.*, t3.*
FROM table1 t1
INNER JOIN table2 t2 ON
t1.id = t2.id
INNER JOIN table3 t3 ON
t2.id = t3.id
WHERE t1.zipcode = '02535'
,或者上面会返回额外的记录?
, or will the above return additional records?
推荐答案
并非总是如此.您的查询是等效的.但是如果没有 WHERE t1.id = t2.id AND t2.id = t3.id
,它将是 CROSS JOIN
.
Not always. Your queries are equivalent. But without WHERE t1.id = t2.id AND t2.id = t3.id
it will be CROSS JOIN
.
更新:
这是一个有趣的问题,我决定添加一些演示.让我们创建两个表:
This is interesting question and I decided to add some demo. Let's create two tables:
A(c1 int, c2 string)
和 B(c1 int, c2 string)
.
加载数据:
insert into table A
select 1, 'row one' union all
select 2, 'row two';
insert into table B
select 1, 'row one' union all
select 3, 'row three';
检查数据:
hive> select * from A;
OK
1 row one
2 row two
Time taken: 1.29 seconds, Fetched: 2 row(s)
hive> select * from B;
OK
1 row one
3 row three
Time taken: 0.091 seconds, Fetched: 2 row(s)
检查交叉连接(没有where
转换为交叉的隐式连接):
Check cross join (implicit join without where
transformed to cross):
hive> select a.c1, a.c2, b.c1, b.c2 from a,b;
Warning: Map Join MAPJOIN[14][bigTable=a] in task 'Stage-3:MAPRED' is a cross product
Warning: Map Join MAPJOIN[22][bigTable=b] in task 'Stage-4:MAPRED' is a cross product
Warning: Shuffle Join JOIN[4][tables = [a, b]] in Stage 'Stage-1:MAPRED' is a cross product
OK
1 row one 1 row one
2 row two 1 row one
1 row one 3 row three
2 row two 3 row three
Time taken: 54.804 seconds, Fetched: 4 row(s)
检查内连接(使用 where
作为 INNER 的隐式连接):
Check inner join (implicit join with where
works as INNER):
hive> select a.c1, a.c2, b.c1, b.c2 from a,b where a.c1=b.c1;
OK
1 row one 1 row one
Time taken: 38.413 seconds, Fetched: 1 row(s)
尝试通过将 OR b.c1 is null
添加到 where 来执行左连接:
Try to perform left join by adding OR b.c1 is null
to the where:
hive> select a.c1, a.c2, b.c1, b.c2 from a,b where (a.c1=b.c1) OR (b.c1 is null);
OK
1 row one 1 row one
Time taken: 57.317 seconds, Fetched: 1 row(s)
如您所见,我们再次获得了内连接.或b.c1为空
被忽略
As you can see we got inner join again. or b.c1 is null
is ignored
现在 left join
没有 where
和 ON
子句(转换为 CROSS):
Now left join
without where
and ON
clause (transformed to CROSS):
select a.c1, a.c2, b.c1, b.c2 from a left join b;
OK
1 row one 1 row one
1 row one 3 row three
2 row two 1 row one
2 row two 3 row three
Time taken: 37.104 seconds, Fetched: 4 row(s)
正如你所看到的,我们再次交锋.
尝试使用 where
子句和不使用 ON
的左连接(用作 INNER):
Try left join with where
clause and without ON
(works as INNER):
select a.c1, a.c2, b.c1, b.c2 from a left join b where a.c1=b.c1;
OK
1 row one 1 row one
Time taken: 40.617 seconds, Fetched: 1 row(s)
我们得到了 INNER 连接
尝试使用 where
子句进行左连接,而不使用 ON
+ 尝试允许空值:
Try left join with where
clause and without ON
+ try to allow nulls:
select a.c1, a.c2, b.c1, b.c2 from a left join b where a.c1=b.c1 or b.c1 is null;
OK
1 row one 1 row one
Time taken: 53.873 seconds, Fetched: 1 row(s)
再次获得内在.或 b.c1 is null
被忽略.
Again got INNER. or b.c1 is null
is ignored.
左连接与 on
子句:
Left join with on
clause:
hive> select a.c1, a.c2, b.c1, b.c2 from a left join b on a.c1=b.c1;
OK
1 row one 1 row one
2 row two NULL NULL
Time taken: 48.626 seconds, Fetched: 2 row(s)
是的,这是真正的左连接.
Left join with on
+ where
(得到 INNER):
Left join with on
+ where
(got INNER):
hive> select a.c1, a.c2, b.c1, b.c2 from a left join b on a.c1=b.c1 where a.c1=b.c1;
OK
1 row one 1 row one
Time taken: 49.54 seconds, Fetched: 1 row(s)
我们得到 INNER 是因为 WHERE 不允许 NULL.
左连接 where + 允许空值:
hive> select a.c1, a.c2, b.c1, b.c2 from a left join b on a.c1=b.c1 where a.c1=b.c1 or b.c1 is null;
OK
1 row one 1 row one
2 row two NULL NULL
Time taken: 55.951 seconds, Fetched: 2 row(s)
是的,是左连接.
结论:
- 隐式连接作为 INNNER (with where) 或 CROSS 如果没有 WHERE条款.
- 如果没有 ON 和 WHERE,左连接可以作为 CROSS 工作,如果 WHERE 子句不允许空值,也可以作为 INNER 工作为了权利表.
- 最好使用 ANSI 语法,因为它是不言自明的,并且很容易理解您期望它的工作方式.作为 INNER 或 CROSS 工作的隐式连接或左连接很难理解并且很容易出错.
这篇关于Hive 的隐式连接总是内部连接吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!