Hive完全外部联接为同一联接键返回多行 [英] Hive Full Outer Join Returning multiple rows for same Join Key

查看:73
本文介绍了Hive完全外部联接为同一联接键返回多行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在对同一列的4个表进行完全外部联接. 我只想为连接"列中的每个不同值生成一行.

I am doing full outer join on 4 tables on the same column. I want to generate only 1 row for each different value in the Join column.

输入是:

employee1
+---------------------+-----------------+--+
| employee1.personid  | employee1.name  |
+---------------------+-----------------+--+
| 111                 | aaa             |
| 222                 | bbb             |   
| 333                 | ccc             | 
+---------------------+-----------------+--+
employee2
+---------------------+----------------+--+
| employee2.personid  | employee2.sal  |
+---------------------+----------------+--+
| 111                 | 2              |
| 200                 | 3              |
+---------------------+----------------+--+
employee3
+---------------------+------------------+--+
| employee3.personid  | employee3.place  |
+---------------------+------------------+--+
| 111                 | bbsr             |
| 300                 | atl              |
| 200                 | ny               |
+---------------------+------------------+--+
employee4
+---------------------+---------------+--+
| employee4.personid  | employee4.dt  |
+---------------------+---------------+--+
| 111                 | 2019-02-21    |
| 300                 | 2019-03-18    |
| 400                 | 2019-03-18    |
+---------------------+---------------+--+

预期结果 每个人名一条记录,因此总共应该有6条记录(111,222,333,200,300,400) 喜欢:

Expected Result one record for each personid, so total there should be 6 records(111,222,333,200,300,400) Like:

+-----------+---------+--------+----------+-------------+--+
| personid  | f.name  | u.sal  | v.place  |   v_in.dt   |
+-----------+---------+--------+----------+-------------+--+
| 111       | aaa     | 2      | bbsr     | 2019-02-21  |
| 200       | NULL    | 3      | ny       | NULL        |
| 222       | bbb     | NULL   | NULL     | NULL        |
| 300       | NULL    | NULL   | atl      | 2019-03-18  |
| 333       | ccc     | NULL   | NULL     | NULL        |
| 400       | NULL    | NULL   | NULL     | 2019-03-18  |
+-----------+---------+--------+----------+-------------+--+

我得到的结果是:

+-----------+---------+--------+----------+-------------+--+
| personid  | f.name  | u.sal  | v.place  |   v_in.dt   |
+-----------+---------+--------+----------+-------------+--+
| 111       | aaa     | 2      | bbsr     | 2019-02-21  |
| 200       | NULL    | 3      | NULL     | NULL        |
| 200       | NULL    | NULL   | ny       | NULL        |
| 222       | bbb     | NULL   | NULL     | NULL        |
| 300       | NULL    | NULL   | atl      | NULL        |
| 300       | NULL    | NULL   | NULL     | 2019-03-18  |
| 333       | ccc     | NULL   | NULL     | NULL        |
| 400       | NULL    | NULL   | NULL     | 2019-03-18  |
+-----------+---------+--------+----------+-------------+--+

使用的查询:

select coalesce(f.personid, u.personid, v.personid, v_in.personid) as personid,f.name,u.sal,v.place,v_in.dt
from employee1 f FULL OUTER JOIN employee2 u on f.personid=u.personid
FULL OUTER JOIN employee3 v on f.personid=v.personid
FULL OUTER JOIN employee4 v_in on f.personid=v_in.personid;

请建议如何产生预期结果.

Please suggest how to generate the expected result.

推荐答案

完全联接返回所有联接的行+所有未从左表联接的行+所有未从右表联接的行.并且由于将employee2employee3employee4连接到不包含personid=200的同一employee1表,因此所有四个表都返回了所有未连接的行.

FULL JOIN returns all rows joined + all not joined from the left table + all not joined from the right table. And since you are joining employee2, employee3, employee4 to the same employee1 table which does not contain personid=200, all not joined rows returned from all four tables.

我建议对所有四个表都进行UNION,为缺少的字段提供NULL +通过personid聚合组:

I'd suggest to UNION ALL all four tables providing NULLs for missing fields + aggregate group by personid:

select personid, max(name) name, max(sal) sal, max(place) place, max(dt) dt 
from 
(
select  personid, name, NULL sal, NULL place, NULL dt from employee1  e1
UNION ALL
select  personid, NULL name, sal, NULL place, NULL dt from employee2  e2
UNION ALL
select  personid, NULL name, NULL sal, place, NULL dt from employee3  e3
UNION ALL
select  personid, NULL name, NULL sal, NULL place, dt from employee4  e4
)s
group by personid;

这将比联接更好.

这篇关于Hive完全外部联接为同一联接键返回多行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆