如何通过比较两个字段并考虑性能来联接表 [英] how to join tables by comparing two fields, and also considering performance
问题描述
这应该很简单,但我不明白.我需要进行选择以获取某些帐户的较新日期值.
This should be simple but I can't get it. I need to do a select to get newer date values for some accounts.
我从这里开始,T1:
+----------+---------+
| date | account |
+----------+---------+
| 4/1/2018 | 1 |
| 4/1/2018 | 2 |
| 4/1/2018 | 3 |
| 4/1/2018 | 4 |
| 4/1/2018 | 5 |
+----------+---------+
然后在T2中更新一些日期:
Then some dates are updated in T2:
+----------+---------+
| date | account |
+----------+---------+
| 7/1/2018 | 1 |
| 7/1/2018 | 2 |
+----------+---------+
我如何将这些输出输入到T3中,仅更新那些帐户?
How can I get this output into T3, updating just those accounts?
+----------+---------+
| date | account |
+----------+---------+
| 7/1/2018 | 1 |
| 7/1/2018 | 2 |
| 4/1/2018 | 3 |
| 4/1/2018 | 4 |
| 4/1/2018 | 5 |
+----------+---------+
我可以加入帐号,但是没有变化的帐号怎么办?如何捕获那些?
I can do a join on the account number, but what about the ones that didn't change? How to capture those?
此外,T1大约有800万条记录,因此性能将是一个因素.从Teradata中提取,并加载到Hive中.
Also, T1 has about 8 million records, so performance would be a factor. Extracting from Teradata, loading into Hive.
谢谢!
推荐答案
我认为您想要:
select t2.*
from t2
union all
select t1.*
from t1
where not exists (select 1 from t2 where t2.account = t1.account);
这首先从t2
中选择.然后,它从t1
获取剩余的帐户.
This selects first from t2
. Then it takes remaining accounts from t1
.
这篇关于如何通过比较两个字段并考虑性能来联接表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!