左外部连接结果在蜂巢上变得更大 [英] left outer join results become bigger on hive
问题描述
全部,我想从此查询中获得可靠的结果:
All, I want to get reliable results from this query:
SELECT ..
FROM
(
SELECT
CO_CODE,
REP.cua cua,
PRD.PRODUCT_DESC,
REGEXP_EXTRACT(B.rfbbn,'^(?:[^*]*\\*){2}([^*]*)',1) cllt,
NVL(CCY_bbce,0) bbce,
B.TYPE,
A.conn_keyy
FROM
(
SELECT conn_keyy , ext_date FROM
(tablee.aa) A
)aaxyz
WHERE flag = 'Y'
)A
LEFT OUTER JOIN
tablee.B
ON A.conn_keyy = B.conn_keyy
LEFT OUTER JOIN (SELECT DISTINCT * FROM tablee.cc) CPLCUR
ON CPLCUR.conn_keyy = A.conn_keyy
AND CPLCUR.cllt = REGEXP_EXTRACT(B.rfbbn,'^(?:[^*]*\\*){2}([^*]*)',1)
AND CPLCUR.dtdt = '1999'
LEFT OUTER JOIN (SELECT DISTINCT * FROM tablee.dd) CPLBAL
ON CPLBAL.conn_keyy = A.conn_keyy
AND CPLBAL.SEQUENCE = CPLCUR.SEQUENCE
AND CPLBAL.dtdt = '1999'
LEFT OUTER JOIN (SELECT DISTINCT * FROM tablee.ee) CPLCCY
ON CPLCCY.conn_keyy = A.conn_keyy
AND CPLCCY.SEQUENCE = CPLCUR.SEQUENCE
AND CPLCCY.dtdt = '1999'
LEFT OUTER JOIN (SELECT DISTINCT * FROM tablee.ff) CPLMOV
ON CPLMOV.conn_keyy = A.conn_keyy
AND CPLMOV.SEQUENCE = CPLCUR.SEQUENCE
AND CPLMOV.dtdt = '1999'
LEFT OUTER JOIN
(tablee.REP)REP
ON REP.relino = B.lnido
LEFT OUTER JOIN tablee.P PRD
ON PRD.PRODUCT_CODE = REGEXP_EXTRACT(A.conn_keyy,'[.]([^.]+)',1)
AND PRD.dtdt = '1999'
WHERE B.lnido LIKE 'PLCONS1%'
) rrvv;
fyi,A的选择计数(*)约为60,000
fyi, the select count (*) of A is around 60,000
我只是想知道为什么我的查询结果变成15亿..我错过了什么?我操作左外联接时出了什么问题?
I just wondering why my query results turns 1.5 billions.. What did I miss? What went wrong when i operate the left-outer-join?
推荐答案
如果第二个表中的联接键不是唯一的,并且两个表中的联接键都不是唯一的,则联接可以重复行将产生更多的重复项.
Join can duplicate rows if the join key is not unique in second table and if join key is not unique in both tables, it will produce much more duplicates.
例如:
with
A as (
select 1 key, 'one' name
union all
select 1 key, 'two' name
),
B as (
select 1 key, 'one' name
union all
select 1 key, 'two' name
)
select *
from A left join B on A.key=B.key
结果为四行,每个表仅包含两行:
Result in four rows and each table contain only two:
a.key a.name b.key b.name
1 one 1 one
1 one 1 two
1 two 1 one
1 two 1 two
如何查找重复的密钥:
select B.conn_keyy, count(*) cnt
from tablee.B
group by B.conn_keyy
having count(*)>1
order by cnt desc limit 100;
检查要与之连接的每个表并确定可以执行的操作:应用过滤,不同或添加更多的联接键以进行联接(一对一或零)或(许多一或零)
Check every table you are joining with and decide what you can do: apply filtering, distinct or add more join keys to make join (one to one or zero) or (many to one or zero)
这篇关于左外部连接结果在蜂巢上变得更大的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!