Hive完全外部联接为同一联接键返回多行 [英] Hive Full Outer Join Returning multiple rows for same Join Key
问题描述
我正在对同一列的4个表进行完全外部联接. 我只想为连接"列中的每个不同值生成一行.
I am doing full outer join on 4 tables on the same column. I want to generate only 1 row for each different value in the Join column.
输入是:
employee1
+---------------------+-----------------+--+
| employee1.personid | employee1.name |
+---------------------+-----------------+--+
| 111 | aaa |
| 222 | bbb |
| 333 | ccc |
+---------------------+-----------------+--+
employee2
+---------------------+----------------+--+
| employee2.personid | employee2.sal |
+---------------------+----------------+--+
| 111 | 2 |
| 200 | 3 |
+---------------------+----------------+--+
employee3
+---------------------+------------------+--+
| employee3.personid | employee3.place |
+---------------------+------------------+--+
| 111 | bbsr |
| 300 | atl |
| 200 | ny |
+---------------------+------------------+--+
employee4
+---------------------+---------------+--+
| employee4.personid | employee4.dt |
+---------------------+---------------+--+
| 111 | 2019-02-21 |
| 300 | 2019-03-18 |
| 400 | 2019-03-18 |
+---------------------+---------------+--+
预期结果 每个人名一条记录,因此总共应该有6条记录(111,222,333,200,300,400) 喜欢:
Expected Result one record for each personid, so total there should be 6 records(111,222,333,200,300,400) Like:
+-----------+---------+--------+----------+-------------+--+
| personid | f.name | u.sal | v.place | v_in.dt |
+-----------+---------+--------+----------+-------------+--+
| 111 | aaa | 2 | bbsr | 2019-02-21 |
| 200 | NULL | 3 | ny | NULL |
| 222 | bbb | NULL | NULL | NULL |
| 300 | NULL | NULL | atl | 2019-03-18 |
| 333 | ccc | NULL | NULL | NULL |
| 400 | NULL | NULL | NULL | 2019-03-18 |
+-----------+---------+--------+----------+-------------+--+
我得到的结果是:
+-----------+---------+--------+----------+-------------+--+
| personid | f.name | u.sal | v.place | v_in.dt |
+-----------+---------+--------+----------+-------------+--+
| 111 | aaa | 2 | bbsr | 2019-02-21 |
| 200 | NULL | 3 | NULL | NULL |
| 200 | NULL | NULL | ny | NULL |
| 222 | bbb | NULL | NULL | NULL |
| 300 | NULL | NULL | atl | NULL |
| 300 | NULL | NULL | NULL | 2019-03-18 |
| 333 | ccc | NULL | NULL | NULL |
| 400 | NULL | NULL | NULL | 2019-03-18 |
+-----------+---------+--------+----------+-------------+--+
使用的查询:
select coalesce(f.personid, u.personid, v.personid, v_in.personid) as personid,f.name,u.sal,v.place,v_in.dt
from employee1 f FULL OUTER JOIN employee2 u on f.personid=u.personid
FULL OUTER JOIN employee3 v on f.personid=v.personid
FULL OUTER JOIN employee4 v_in on f.personid=v_in.personid;
请建议如何产生预期结果.
Please suggest how to generate the expected result.
推荐答案
完全联接返回所有联接的行+所有未从左表联接的行+所有未从右表联接的行.并且由于将employee2
,employee3
,employee4
连接到不包含personid=200
的同一employee1
表,因此所有四个表都返回了所有未连接的行.
FULL JOIN returns all rows joined + all not joined from the left table + all not joined from the right table. And since you are joining employee2
, employee3
, employee4
to the same employee1
table which does not contain personid=200
, all not joined rows returned from all four tables.
我建议对所有四个表都进行UNION,为缺少的字段提供NULL +通过personid
聚合组:
I'd suggest to UNION ALL all four tables providing NULLs for missing fields + aggregate group by personid
:
select personid, max(name) name, max(sal) sal, max(place) place, max(dt) dt
from
(
select personid, name, NULL sal, NULL place, NULL dt from employee1 e1
UNION ALL
select personid, NULL name, sal, NULL place, NULL dt from employee2 e2
UNION ALL
select personid, NULL name, NULL sal, place, NULL dt from employee3 e3
UNION ALL
select personid, NULL name, NULL sal, NULL place, dt from employee4 e4
)s
group by personid;
这将比联接更好.
这篇关于Hive完全外部联接为同一联接键返回多行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!