如何通过比较两个字段并考虑性能来联接表 [英] how to join tables by comparing two fields, and also considering performance

查看:124
本文介绍了如何通过比较两个字段并考虑性能来联接表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这应该很简单,但我不明白.我需要进行选择以获取某些帐户的较新日期值.

This should be simple but I can't get it. I need to do a select to get newer date values for some accounts.

我从这里开始,T1:

+----------+---------+
|  date   | account |
+----------+---------+
| 4/1/2018 |       1 |
| 4/1/2018 |       2 |
| 4/1/2018 |       3 |
| 4/1/2018 |       4 |
| 4/1/2018 |       5 |
+----------+---------+

然后在T2中更新一些日期:

Then some dates are updated in T2:

+----------+---------+
|   date   | account |
+----------+---------+
| 7/1/2018 |       1 |
| 7/1/2018 |       2 |
+----------+---------+

我如何将这些输出输入到T3中,仅更新那些帐户?

How can I get this output into T3, updating just those accounts?

+----------+---------+
|   date   | account |
+----------+---------+
| 7/1/2018 |       1 |
| 7/1/2018 |       2 |
| 4/1/2018 |       3 |
| 4/1/2018 |       4 |
| 4/1/2018 |       5 |
+----------+---------+

我可以加入帐号,但是没有变化的帐号怎么办?如何捕获那些?

I can do a join on the account number, but what about the ones that didn't change? How to capture those?

此外,T1大约有800万条记录,因此性能将是一个因素.从Teradata中提取,并加载到Hive中.

Also, T1 has about 8 million records, so performance would be a factor. Extracting from Teradata, loading into Hive.

谢谢!

推荐答案

我认为您想要:

select t2.*
from t2
union all
select t1.*
from t1
where not exists (select 1 from t2 where t2.account = t1.account);

这首先从t2中选择.然后,它从t1获取剩余的帐户.

This selects first from t2. Then it takes remaining accounts from t1.

这篇关于如何通过比较两个字段并考虑性能来联接表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆