Hive的隐式联接是否总是内部联接? [英] Are Hive's implicit joins always inner joins?

查看:419
本文介绍了Hive的隐式联接是否总是内部联接?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

针对Hive的联接文档 ,即

SELECT * 
FROM table1 t1, table2 t2, table3 t3 
WHERE t1.id = t2.id AND t2.id = t3.id AND t1.zipcode = '02535';

这等同于

SELECT t1.*, t2.*, t3.*
FROM table1 t1
INNER JOIN table2 t2 ON
  t1.id = t2.id 
INNER JOIN table3 t3 ON
  t2.id = t3.id
WHERE t1.zipcode = '02535'

,还是以上返回其他记录?

, or will the above return additional records?

推荐答案

并非总是如此.您的查询是等效的.但是如果没有WHERE t1.id = t2.id AND t2.id = t3.id,它将是CROSS JOIN.

Not always. Your queries are equivalent. But without WHERE t1.id = t2.id AND t2.id = t3.id it will be CROSS JOIN.

更新:

这是一个有趣的问题,我决定添加一些演示.让我们创建两个表:

This is interesting question and I decided to add some demo. Let's create two tables:

A(c1 int, c2 string)B(c1 int, c2 string).

加载数据:

insert into table A
 select 1, 'row one' union all
 select 2, 'row two';

insert into table B
 select 1, 'row one' union all
 select 3, 'row three';

检查数据:

hive> select * from A;
OK
1       row one
2       row two
Time taken: 1.29 seconds, Fetched: 2 row(s)
hive> select * from B;
OK
1       row one
3       row three
Time taken: 0.091 seconds, Fetched: 2 row(s)

检查交叉连接(不将where转换为交叉的隐式连接):

Check cross join (implicit join without where transformed to cross):

hive> select a.c1, a.c2, b.c1, b.c2 from a,b;
Warning: Map Join MAPJOIN[14][bigTable=a] in task 'Stage-3:MAPRED' is a cross product
Warning: Map Join MAPJOIN[22][bigTable=b] in task 'Stage-4:MAPRED' is a cross product
Warning: Shuffle Join JOIN[4][tables = [a, b]] in Stage 'Stage-1:MAPRED' is a cross product

OK
1       row one 1       row one
2       row two 1       row one
1       row one 3       row three
2       row two 3       row three
Time taken: 54.804 seconds, Fetched: 4 row(s)

检查内部联接(与where的隐式联接作为INNER一起工作):

Check inner join (implicit join with where works as INNER):

hive> select a.c1, a.c2, b.c1, b.c2 from a,b where a.c1=b.c1;
OK
1       row one 1       row one
Time taken: 38.413 seconds, Fetched: 1 row(s)

尝试通过向以下位置添加OR b.c1 is null来执行左联接:

Try to perform left join by adding OR b.c1 is null to the where:

hive> select a.c1, a.c2, b.c1, b.c2 from a,b where (a.c1=b.c1) OR (b.c1 is null);
OK
1       row one 1       row one
Time taken: 57.317 seconds, Fetched: 1 row(s)

如您所见,我们再次获得了内部联接. or b.c1 is null被忽略

As you can see we got inner join again. or b.c1 is null is ignored

现在的left join没有whereON子句(已转换为CROSS):

Now left join without where and ON clause (transformed to CROSS):

 select a.c1, a.c2, b.c1, b.c2 from a left join b;
OK
1       row one 1       row one
1       row one 3       row three
2       row two 1       row one
2       row two 3       row three
Time taken: 37.104 seconds, Fetched: 4 row(s)

如您所见,我们又越过了.

尝试使用where子句但不使用ON(作为INNER)进行左联接:

Try left join with where clause and without ON (works as INNER):

select a.c1, a.c2, b.c1, b.c2 from a left join b where a.c1=b.c1;
OK
1       row one 1       row one
Time taken: 40.617 seconds, Fetched: 1 row(s)

我们获得了内部加入

尝试使用where子句但不使用ON +进行左连接,尝试允许空值:

Try left join with where clause and without ON+ try to allow nulls:

 select a.c1, a.c2, b.c1, b.c2 from a left join b where a.c1=b.c1 or b.c1 is null;
OK
1       row one 1       row one
Time taken: 53.873 seconds, Fetched: 1 row(s)

再次获得INNER.或b.c1 is null被忽略.

Again got INNER. or b.c1 is null is ignored.

带有on子句的左联接:

Left join with on clause:

hive> select a.c1, a.c2, b.c1, b.c2 from a left join b on a.c1=b.c1;
OK
1       row one 1       row one
2       row two NULL    NULL
Time taken: 48.626 seconds, Fetched: 2 row(s)

是的,这确实是左联接.

通过on + where左联接(获得INNER):

Left join with on + where (got INNER):

hive> select a.c1, a.c2, b.c1, b.c2 from a left join b on a.c1=b.c1 where a.c1=b.c1;
OK
1       row one 1       row one
Time taken: 49.54 seconds, Fetched: 1 row(s)

我们获得INNER,因为WHERE不允许使用NULL.

左连接,其中+允许为空:

hive> select a.c1, a.c2, b.c1, b.c2 from a left join b on a.c1=b.c1 where a.c1=b.c1 or b.c1 is null;
OK
1       row one 1       row one
2       row two NULL    NULL
Time taken: 55.951 seconds, Fetched: 2 row(s)

是的,它是左联接.

结论:

  1. 隐式连接作为INNNER(带有where)或CROSS(如果不带WHERE的话) 条款.
  2. 如果没有ON和WHERE,则左联接可以用作CROSS;如果WHERE子句不允许为null,则左联接也可以用作INNER 为了正确 表.
  3. 更好地使用ANSI语法,因为它是不言自明的,并且很容易理解您希望它的工作方式.隐式连接或左连接作为INNER或CROSS很难理解,并且很容易出错.
  1. Implicit join works as INNNER (with where) or CROSS if without WHERE clause.
  2. Left join can work as CROSS if without ON and without WHERE, can also work as INNER if WHERE clause does not allows nulls for right table.
  3. Better use ANSI syntax because it is self-explaining and it is easy to understand what do you expect it to work like. Implicit joins or left joins working as INNER or CROSS are difficult to understand and very prone to error.

这篇关于Hive的隐式联接是否总是内部联接?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆