如何使用 HIVE 比较两个表并返回不同的行 [英] How to compare two tables and return rows with difference with HIVE
问题描述
假设我有一个包含大约 180 列和 100 条记录的表.该表被备份到临时表中,原始表被删除.在生成相同表的管道上运行此迁移(更改)之后.我想将备份的表与新的行(记录)进行比较,并将任何差异移至第三个表(_result 表),所以我这样做:
So lets say I have a table with about 180 columns and 100 records. This table is backed up into temporary table and original one is removed. After this migration (change) is run on a pipeline which produces the same table. I want to compare the backed up table to the new one adn rows (records) with any difference to be moved to 3rd table (_result table) so I do:
INSERT OVERWRITE TABLE
zakj_customers.customers_detail_result
SELECT
acct_id, IF (a.title != b.title, 1, 0) title, IF (a.fname != b.fname, 1, 0) fname, IF (a.dob != b.dob, 1, 0) dob, IF (a.cr_date != b.cr_date, 1, 0) cr_date
FROM
zakj_customers.customers_detail a
LEFT OUTER JOIN
zakj_customers.customers_detail_backup b
ON
(a.acct_id = b.acct_id)
ORDER BY
title DESC,fname DESC,dob DESC,cr_date DESC
HAVING
title > 0 AND fname > 0 AND dob > 0 AND cr_date > 0
;
所以忘记了这个查询是错误的,我对 SQL 不太了解,而且我遇到了语法错误,所以我不能把它正确地放在一起,并且在一张票上它是以这种格式提供的,这显然是错误的.
So oblivious this query is wrong, I'm not much into SQL, and I'm getting syntax errors, so I can't put it together right and on a ticket it was supplied in this format which is obviously wrong.
任何人都可以看到这样做的方式吗?
Can anyone see the way this could be done?
干杯
推荐答案
必须使用case when"而不是 if:
Must use "case when" instead of if:
Case When a.title <> b.title then 1 Else 0 End title
我不会在 where 条件中写有但表达式:
I wouldn't write having but the expression into the where condition:
INSERT Into
zakj_customers.customers_detail_result
SELECT
acct_id, a.title, a.fname, dob, a.cr_date
FROM
zakj_customers.customers_detail a
LEFT OUTER JOIN
zakj_customers.customers_detail_backup b
ON
(a.acct_id = b.acct_id)
Where b.acct_id is null or a.title <> b.title or a.fname <> b.fname or a.cr_date <> b.cr_date;
"b.acct_id is null" 是获取新记录所必需的,因为 <> 会过滤掉它们.
"b.acct_id is null" is required to get the new records beacuse <> would filter out them.
(插入记录时完全不需要排序.)
(Ordering is totally unnecessary when inserting records.)
这篇关于如何使用 HIVE 比较两个表并返回不同的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!