Hive LEFT SEMI JOIN for 'NOT EXISTS'
Problem description
I have two tables with a single key column. The keys in table a are a subset of the keys in table b. I need to select the keys from table b that are NOT in table a.
Here is a quote from the Hive manual: "LEFT SEMI JOIN implements the uncorrelated IN/EXISTS subquery semantics in an efficient way. As of Hive 0.13 the IN/NOT IN/EXISTS/NOT EXISTS operators are supported using subqueries so most of these JOINs don't have to be performed manually anymore. The restrictions of using LEFT SEMI JOIN is that the right-hand-side table should only be referenced in the join condition (ON-clause), but not in WHERE- or SELECT-clauses etc."
They use this example for illustration:
SELECT a.key, a.value FROM a WHERE a.key IN (SELECT b.key FROM B);
Is equivalent to
SELECT a.key, a.value FROM a LEFT SEMI JOIN b ON (a.key = b.key);
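To make "semi join" semantics concrete, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for Hive. SQLite has no LEFT SEMI JOIN keyword, so the sketch runs the IN form and an equivalent EXISTS form instead; the tables and rows are hypothetical. The duplicate key in b shows why a semi join differs from an inner join: each row of a is returned at most once.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption: SQLite has no
# LEFT SEMI JOIN, so IN and EXISTS are used to show the same semantics).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (key INTEGER, value TEXT);
    CREATE TABLE b (key INTEGER);
    -- Hypothetical data; key 1 appears twice in b on purpose.
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1), (1), (2), (4);
""")

# Uncorrelated IN subquery: each row of a is kept at most once.
in_rows = conn.execute(
    "SELECT a.key, a.value FROM a "
    "WHERE a.key IN (SELECT b.key FROM b) ORDER BY a.key"
).fetchall()

# EXISTS form of the same semi join.
exists_rows = conn.execute(
    "SELECT a.key, a.value FROM a "
    "WHERE EXISTS (SELECT 1 FROM b WHERE b.key = a.key) ORDER BY a.key"
).fetchall()

# A plain INNER JOIN is not equivalent: duplicate keys in b duplicate rows of a.
inner_rows = conn.execute(
    "SELECT a.key, a.value FROM a JOIN b ON a.key = b.key ORDER BY a.key"
).fetchall()

print(in_rows)      # [(1, 'one'), (2, 'two')]
print(exists_rows)  # [(1, 'one'), (2, 'two')]
print(inner_rows)   # [(1, 'one'), (1, 'one'), (2, 'two')]
```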
However, what I need is the first example with NOT IN. Unfortunately, this syntax is not supported in Hive 0.13, so the following is for illustration only:
SELECT a.key, a.value FROM a WHERE a.key NOT IN (SELECT b.key FROM B);
I searched this site for recommendations, and saw this example:
SELECT a.key FROM a LEFT OUTER JOIN b ON a.key = b.key WHERE b.key IS NULL;
It does not work as expected: when I combine the keys of a that are NOT in b with the keys of a that are IN b, I don't get back the original a. Maybe this query cannot do the trick because of the last sentence of the quote above: b.key should not appear in the WHERE clause.
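The suspicion about the WHERE clause can be checked directly. The restriction in the quote applies to LEFT SEMI JOIN; in a LEFT OUTER JOIN, filtering on b.key IS NULL is the usual anti-join pattern. The catch is the direction: this query returns the keys of a that are missing from b, and when a is a subset of b that set is empty. Below is a minimal sketch using Python's standard-library sqlite3 as a stand-in for Hive, with hypothetical data where every key of a is also in b (Hive itself is not exercised).

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (key INTEGER);
    CREATE TABLE b (key INTEGER);
    -- Hypothetical sample data: every key of a also appears in b.
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1), (2), (3), (4);
""")

# The query from the question: keys of a that have NO match in b.
a_not_in_b = conn.execute(
    "SELECT a.key FROM a LEFT OUTER JOIN b ON a.key = b.key "
    "WHERE b.key IS NULL ORDER BY a.key"
).fetchall()

# Keys of a that DO have a match in b.
a_in_b = conn.execute(
    "SELECT a.key FROM a WHERE a.key IN (SELECT b.key FROM b) ORDER BY a.key"
).fetchall()

print(a_not_in_b)                   # []
print(a_in_b)                       # [(1,), (2,)]
# Together, the two results give back every key of a.
print(sorted(a_not_in_b + a_in_b))  # [(1,), (2,)]
```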
What should I do then? Any other trick? Thanks!
P.S. I cannot share any real data; it's a pretty simple example, where keys in a are all included in b and a is a subset of b.
Recommended answer
If you want results from table b, perhaps you can do the following instead?
SELECT b.key FROM b LEFT OUTER JOIN a ON b.key = a.key WHERE a.key IS NULL;
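This works because b is now the preserved (left) side of the outer join: a row of b with no partner in a comes back with a.key set to NULL, and the WHERE clause keeps exactly those rows. The sketch below checks this with Python's standard-library sqlite3 module standing in for Hive (hypothetical data; Hive itself is not exercised). It also compares NOT EXISTS and NOT IN and shows a common caveat: in SQLite, as in standard SQL, NOT IN returns no rows once the subquery yields a NULL, while the outer-join form is unaffected. Whether a particular Hive version treats NULLs in NOT IN the same way is not verified here.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption: the semantics
# of these queries match; Hive-specific restrictions are not modeled).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (key INTEGER);
    CREATE TABLE b (key INTEGER);
    -- Hypothetical sample data: a is a subset of b.
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1), (2), (3), (4);
""")


def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# The answer's anti join: keep rows of b that found no partner in a.
ANTI_JOIN = (
    "SELECT b.key FROM b LEFT OUTER JOIN a ON b.key = a.key "
    "WHERE a.key IS NULL ORDER BY b.key"
)
anti_join = run(ANTI_JOIN)

# The same question asked with NOT EXISTS and with NOT IN.
not_exists = run(
    "SELECT b.key FROM b WHERE NOT EXISTS "
    "(SELECT 1 FROM a WHERE a.key = b.key) ORDER BY b.key"
)
not_in = run(
    "SELECT b.key FROM b WHERE b.key NOT IN (SELECT a.key FROM a) ORDER BY b.key"
)
print(anti_join)   # [(3,), (4,)]
print(not_exists)  # [(3,), (4,)]
print(not_in)      # [(3,), (4,)]

# Caveat: add a NULL key to a. NOT IN now filters out every row of b,
# while the anti join still returns the keys of b that are missing from a.
conn.execute("INSERT INTO a VALUES (NULL)")
not_in_with_null = run(
    "SELECT b.key FROM b WHERE b.key NOT IN (SELECT a.key FROM a) ORDER BY b.key"
)
anti_join_with_null = run(ANTI_JOIN)
print(not_in_with_null)     # []
print(anti_join_with_null)  # [(3,), (4,)]
```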