PostgreSQL为什么不能做这种简单的FULL JOIN? [英] Why can't PostgreSQL do this simple FULL JOIN?

查看:771
本文介绍了PostgreSQL为什么不能做这种简单的FULL JOIN?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是一个最小设置,其中包含2个表ab,每个表具有3行:

CREATE TABLE a (
    id SERIAL PRIMARY KEY,
    value TEXT
);
CREATE INDEX ON a (value);

CREATE TABLE b (
    id SERIAL PRIMARY KEY,
    value TEXT
);
CREATE INDEX ON b (value);

INSERT INTO a (value) VALUES ('x'), ('y'),        (NULL);
INSERT INTO b (value) VALUES        ('y'), ('z'), (NULL);

这里是一个LEFT JOIN,它可以按预期工作:

SELECT * FROM a
LEFT JOIN b ON a.value IS NOT DISTINCT FROM b.value;

输出:

 id | value | id | value 
----+-------+----+-------
  1 | x     |    | 
  2 | y     |  1 | y
  3 |       |  3 | 
(3 rows)

将"LEFT JOIN"更改为"FULL JOIN"会出现错误:

SELECT * FROM a
FULL JOIN b ON a.value IS NOT DISTINCT FROM b.value;

错误:仅在可合并合并或可哈希合并合并条件下支持FULL JOIN

有人可以回答吗?

什么是可合并合并或散列可合并的连接条件",为什么在a.value IS NOT DISTINCT FROM b.value上的连接不满足此条件,但是a.value = b.value完全可以?

似乎唯一的区别是NULL值的处理方式.由于在两个表中都为value列建立了索引,因此在NULL查找上运行EXPLAIN与查找非NULL的值一样有效:

EXPLAIN SELECT * FROM a WHERE value = 'x';
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=4.20..13.67 rows=6 width=36)
   Recheck Cond: (value = 'x'::text)
   ->  Bitmap Index Scan on a_value_idx  (cost=0.00..4.20 rows=6 width=0)
         Index Cond: (value = 'x'::text)


EXPLAIN SELECT * FROM a WHERE value ISNULL;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=4.20..13.65 rows=6 width=36)
   Recheck Cond: (value IS NULL)
   ->  Bitmap Index Scan on a_value_idx  (cost=0.00..4.20 rows=6 width=0)
         Index Cond: (value IS NULL)

这已经在PostgreSQL 9.6.3和10beta1上进行了测试.

已经有关于此问题的讨论,但是它不能直接回答上述问题.

解决方案

PostgreSQL通过散列或合并联接实现FULL OUTER JOIN.

要有资格进行这样的联接,联接条件必须具有以下格式:

<expression using only left table> <operator> <expression using only right table>

现在您的加入条件 看起来像这样,但是PostgreSQL没有特殊的IS NOT DISTINCT FROM运算符,因此它将您的条件解析为:

(NOT ($1 IS DISTINCT FROM $2))

并且这样的表达式不能用于哈希或合并联接,因此会出现错误消息.

我可以考虑一种解决方法:

SELECT a_id, NULLIF(a_value, '<null>'),
       b_id, NULLIF(b_value, '<null>')
FROM (SELECT id AS a_id,
             COALESCE(value, '<null>') AS a_value
      FROM a
     ) x
   FULL JOIN
     (SELECT id AS b_id,
             COALESCE(value, '<null>') AS b_value
      FROM b
     ) y
      ON x.a_value = y.b_value;

如果<null>没有出现在value列中的任何位置,则可以使用.

Here's a minimal setup with 2 tables a and b each with 3 rows:

CREATE TABLE a (
    id SERIAL PRIMARY KEY,
    value TEXT
);
CREATE INDEX ON a (value);

CREATE TABLE b (
    id SERIAL PRIMARY KEY,
    value TEXT
);
CREATE INDEX ON b (value);

INSERT INTO a (value) VALUES ('x'), ('y'),        (NULL);
INSERT INTO b (value) VALUES        ('y'), ('z'), (NULL);

Here is a LEFT JOIN that works fine as expected:

SELECT * FROM a
LEFT JOIN b ON a.value IS NOT DISTINCT FROM b.value;

with output:

 id | value | id | value 
----+-------+----+-------
  1 | x     |    | 
  2 | y     |  1 | y
  3 |       |  3 | 
(3 rows)

Changing "LEFT JOIN" to "FULL JOIN" gives an error:

SELECT * FROM a
FULL JOIN b ON a.value IS NOT DISTINCT FROM b.value;

ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions

Can someone please answer:

What is a "merge-joinable or hash-joinable join condition" and why joining on a.value IS NOT DISTINCT FROM b.value doesn't fulfill this condition, but a.value = b.value is perfectly fine?

It seems that the only difference is how NULL values are handled. Since the value column is indexed in both tables, running an EXPLAIN on a NULL lookup is just as efficient as looking up values that are non-NULL:

EXPLAIN SELECT * FROM a WHERE value = 'x';
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=4.20..13.67 rows=6 width=36)
   Recheck Cond: (value = 'x'::text)
   ->  Bitmap Index Scan on a_value_idx  (cost=0.00..4.20 rows=6 width=0)
         Index Cond: (value = 'x'::text)


EXPLAIN SELECT * FROM a WHERE value ISNULL;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=4.20..13.65 rows=6 width=36)
   Recheck Cond: (value IS NULL)
   ->  Bitmap Index Scan on a_value_idx  (cost=0.00..4.20 rows=6 width=0)
         Index Cond: (value IS NULL)

This has been tested with PostgreSQL 9.6.3 and 10beta1.

There has been discussion about this issue, but it doesn't directly answer the above question.

解决方案

PostgreSQL implements FULL OUTER JOIN with either a hash or a merge join.

To be eligible for such a join, the join condition has to have the form

<expression using only left table> <operator> <expression using only right table>

Now your join condition does look like this, but PostgreSQL does not have a special IS NOT DISTINCT FROM operator, so it parses your condition into:

(NOT ($1 IS DISTINCT FROM $2))

And such an expression cannot be used for hash or merge joins, hence the error message.

I can think of a way to work around it:

SELECT a_id, NULLIF(a_value, '<null>'),
       b_id, NULLIF(b_value, '<null>')
FROM (SELECT id AS a_id,
             COALESCE(value, '<null>') AS a_value
      FROM a
     ) x
   FULL JOIN
     (SELECT id AS b_id,
             COALESCE(value, '<null>') AS b_value
      FROM b
     ) y
      ON x.a_value = y.b_value;

That works if <null> does not appear anywhere in the value columns.

这篇关于PostgreSQL为什么不能做这种简单的FULL JOIN?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆