Hive LEFT SEMI JOIN 表示“不存在" [英] Hive LEFT SEMI JOIN for 'NOT EXISTS'

查看:79
本文介绍了Hive LEFT SEMI JOIN 表示“不存在"的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个表,只有一个键列.表 a 中的键是表 b 中所有键的子集.我需要从表 b 中选择不在表 a 中的键.

I have two tables with a single key column. Keys in table a are subset of all keys in table b. I need to select keys from table b that are NOT in table a.

这是 Hive 手册中的引文:LEFT SEMI JOIN 以有效的方式实现了不相关的 IN/EXISTS 子查询语义.从 Hive 0.13 开始,使用子查询支持 IN/NOT IN/EXISTS/NOT EXISTS 运算符,因此大多数这些 JOIN 不再需要手动执行. 使用LEFT SEMI JOIN 的限制是右侧表只能在连接条件(ON-子句)中引用,而不能在WHERE-或SELECT-子句等中引用."

Here is a citation from Hive manual: "LEFT SEMI JOIN implements the uncorrelated IN/EXISTS subquery semantics in an efficient way. As of Hive 0.13 the IN/NOT IN/EXISTS/NOT EXISTS operators are supported using subqueries so most of these JOINs don't have to be performed manually anymore. The restrictions of using LEFT SEMI JOIN is that the right-hand-side table should only be referenced in the join condition (ON-clause), but not in WHERE- or SELECT-clauses etc."

他们使用这个例子来说明:

They use this example for illustration:

    SELECT a.key, a.value FROM a WHERE a.key IN (SELECT b.key FROM B);

相当于

    SELECT a.key, a.val FROM a LEFT SEMI JOIN b ON (a.key = b.key);

但是,我需要做的第一个示例是NOT IN;".不幸的是,Hive 0.13 不支持此语法.仅供参考:

However, what I need to do is first example with 'NOT IN;. Unfortunately this syntax is not supported in Hive 0.13. It's for illustration only:

    SELECT a.key, a.value FROM a WHERE a.key NOT IN (SELECT b.key FROM B);

我在这个网站上搜索了推荐,看到了这个例子:

I searched this site for recommendations, and saw this example:

    SELECT a.key FROM a LEFT OUTER JOIN b ON a.key = b.key WHERE b.key IS NULL;

它没有按预期工作.当我加入 a.key NOT in b 和 a.key IN b 时,我不会以这种方式获得原始 a.也许这是因为这个查询不能解决问题,注意粗体文本 - b.key 不应出现在 WHERE 中.

It does not work as expected. When I join a.key NOT in b and a.key IN b, I don't get the original a this way. Maybe that is because this query cannot do the trick, note bold text - b.key should not appear in WHERE.

那我该怎么办?还有什么技巧吗?谢谢!

What should I do then? Any other trick? Thanks!

附言我不能分享任何真实数据;这是一个非常简单的例子,其中 a 中的键都包含在 b 中,而 a 是 b 的子集.

P.S. I cannot share any real data; it's a pretty simple example, where keys in a are all included in b and a is a subset of b.

推荐答案

如果你想要表 b 的结果,也许你可以做以下的事情?

If you want results from table b, perhaps you can do the following instead?

  SELECT b.key FROM b LEFT OUTER JOIN a ON b.key = a.key WHERE a.key IS NULL;

这篇关于Hive LEFT SEMI JOIN 表示“不存在"的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆