LEFT OUTER JOIN on array column with multiple values

Question

I cannot seem to find the trick to join two tables through an array column when one table holds a plain (non-array) value and the other table's array can contain multiple values. It does work when the array holds a single value.

Here's a simple minimal example of what I'm talking about. The real tables have GIN indexes on the array columns FWIW. These do not, but the query behaves the same.

DROP TABLE IF EXISTS eg_person;
CREATE TABLE eg_person (id INT PRIMARY KEY, name TEXT);
INSERT INTO eg_person (id, name) VALUES
  (1, 'alice')
, (2, 'bob')
, (3, 'charlie');

DROP TABLE IF EXISTS eg_assoc;
CREATE TABLE eg_assoc (aid INT PRIMARY KEY, actors INT[], benefactors INT[]);
INSERT INTO eg_assoc (aid, actors, benefactors) VALUES
  (1, '{1}'  , '{2}')
, (2, '{1,2}', '{3}')
, (3, '{1}'  , '{2,3}')
, (4, '{4}'  , '{1}');

SELECT aid, actors, a_person.name, benefactors, b_person.name 
FROM   eg_assoc
LEFT   JOIN eg_person a_person on array[a_person.id] @> eg_assoc.actors
LEFT   JOIN eg_person b_person on array[b_person.id] @> eg_assoc.benefactors;

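The actual result looks like this. The problem here is that the name column comes up NULL whenever actors or benefactors contains more than one value.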

 aid | actors | name  | benefactors |  name   
-----+--------+-------+-------------+---------
   1 | {1}    | alice | {2}         | bob
   2 | {1,2}  |       | {3}         | charlie
   3 | {1}    | alice | {2,3}       | 
   4 | {4}    |       | {1}         | alice
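The NULLs come from the direction of the containment operator. `a @> b` is true only when every element of `b` also appears in `a`, so `array[a_person.id] @> eg_assoc.actors` can only hold when the stored array contains nothing but that one id. A quick check with literal arrays (values chosen to mirror the row with `aid = 2`):

```sql
-- Containment is directional: the left side must include every element of the right side.
SELECT ARRAY[1] @> ARRAY[1,2]  AS person_contains_actors  -- false: {1} lacks 2
     , ARRAY[1,2] @> ARRAY[1]  AS actors_contain_person   -- true
     , 1 = ANY (ARRAY[1,2])    AS person_in_actors;       -- true
```

Reversing the operands (`eg_assoc.actors @> array[a_person.id]`) or testing `a_person.id = ANY (eg_assoc.actors)` matches each listed person individually.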

I was expecting this:

 aid | actors | name  | benefactors |  name   
-----+--------+-------+-------------+---------
   1 | {1}    | alice | {2}         | bob
   2 | {1,2}  | alice | {3}         | charlie
   2 | {1,2}  | bob   | {3}         | charlie
   3 | {1}    | alice | {2,3}       | bob
   3 | {1}    | alice | {2,3}       | charlie
   4 | {4}    |       | {1}         | alice
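A minimal sketch of a query that produces exactly this row-per-match shape for the sample data, assuming the `eg_person` and `eg_assoc` tables from the setup above. The only change is reversing the containment so the stored array must contain the person's id. Note that the two joins are independent of each other, so a row with several actors *and* several benefactors would be multiplied (the "proxy cross join" discussed in the answer); the sample data simply has no such row.

```sql
-- Reverse the containment: the stored array must contain the person's id.
SELECT e.aid, e.actors, a_person.name AS actor_name
     , e.benefactors, b_person.name AS benefactor_name
FROM   eg_assoc e
LEFT   JOIN eg_person a_person ON e.actors      @> ARRAY[a_person.id]
LEFT   JOIN eg_person b_person ON e.benefactors @> ARRAY[b_person.id]
ORDER  BY e.aid, a_person.id, b_person.id;
```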

It would be really nice if I could get it to look like this, though:

 aid | actors | name        | benefactors |  name   
-----+--------+-------------+-------------+---------
   1 | {1}    | {alice}     | {2}         | {bob}
   2 | {1,2}  | {alice,bob} | {3}         | {charlie}
   3 | {1}    | {alice}     | {2,3}       | {bob,charlie}
   4 | {4}    |             | {1}         | {alice}

I'm aware that this schema is denormalized, and I'm willing to move to a normalized representation if need be. However, this is for a summary query and it already involves a lot more joins than I'd like.

Recommended answer

&& operator and index?

You are right, the overlap operator && could use a GIN index on arrays. Very useful for queries like the following (to find rows with person 1 among the actors):

SELECT * FROM eg_assoc WHERE actors && '{1}'::int[]
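A sketch of the setup this refers to, assuming the sample `eg_assoc` table; the index name `eg_assoc_actors_gin` is made up for illustration. With only four rows the planner will most likely still pick a sequential scan, so the index only shows up in `EXPLAIN` output on realistically sized tables.

```sql
-- GIN index on the int[] column; the default array opclass supports &&, @>, <@ and =.
CREATE INDEX IF NOT EXISTS eg_assoc_actors_gin ON eg_assoc USING gin (actors);

-- Rows that list person 1 among the actors; this predicate is eligible for the GIN index.
EXPLAIN
SELECT * FROM eg_assoc WHERE actors && '{1}'::int[];
```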

However, the logic of your query is the other way round, looking for all persons listed in the arrays in eg_assoc. A GIN index is no help here. We just need the btree index of the PK eg_person.id.

This is not a trivial problem. Start by reading this:

  • PostgreSQL unnest() with element number
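The building block behind the queries below is pairing every array element with its position. Shown on its own with `generate_subscripts()` and a literal array chosen for illustration, it returns the rows (1, 4), (2, 1), (3, 4): ordering by the subscript keeps the original element order, and the repeated 4 produces two rows.

```sql
-- One row per array position; i is the subscript, elem the element at that position.
SELECT i, ('{4,1,4}'::int[])[i] AS elem
FROM   generate_subscripts('{4,1,4}'::int[], 1) AS i
ORDER  BY i;
```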

The following queries preserve original arrays exactly as given, including possible duplicate elements and original order of elements. Works for 1-dimensional arrays. Additional dimensions are folded into a single dimension. It's more complex to preserve multiple dimensions (but totally possible):

For Postgres 8.4+ (where generate_subscripts() was introduced):

SELECT aid, actors
     , ARRAY( SELECT name
              FROM   generate_subscripts(e.actors, 1) i
              JOIN   eg_person p ON p.id = e.actors[i]
              ORDER  BY i) AS act_names
     , benefactors
     , ARRAY( SELECT name
              FROM   generate_subscripts(e.benefactors, 1) i
              JOIN   eg_person p ON p.id = e.benefactors[i]
              ORDER  BY i) AS ben_names
FROM   eg_assoc e;

Probably still performs best, even in Postgres 9.3.
The ARRAY constructor is faster than array_agg().

For PostgreSQL 9.3+.

SELECT e.aid, e.actors, a.act_names, e.benefactors, b.ben_names
FROM   eg_assoc e
, LATERAL (
   SELECT ARRAY( SELECT name
                 FROM   generate_subscripts(e.actors, 1) i
                 JOIN   eg_person p ON p.id = e.actors[i]
                 ORDER  BY i)
   ) a(act_names)
, LATERAL (
   SELECT ARRAY( SELECT name
                 FROM   generate_subscripts(e.benefactors, 1) i
                 JOIN   eg_person p ON p.id = e.benefactors[i]
                 ORDER  BY i)
   ) b(ben_names);
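For comparison, a sketch of one of the "other query styles": aggregating with `array_agg()` inside the `LATERAL` subquery instead of using the ARRAY constructor. An aggregate without `GROUP BY` always returns exactly one row, so rows of `eg_assoc` are still neither multiplied nor lost, but for `aid = 4` (no matching actor) this variant yields NULL instead of an empty array. It assumes the sample tables from the question.

```sql
SELECT e.aid, e.actors, a.act_names
FROM   eg_assoc e
, LATERAL (
   -- array_agg() over zero input rows returns NULL, not '{}'
   SELECT array_agg(p.name ORDER BY i) AS act_names
   FROM   generate_subscripts(e.actors, 1) i
   JOIN   eg_person p ON p.id = e.actors[i]
   ) a;
```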

SQL Fiddle with a couple of variants (except the version for pg 9.4).

Subtle detail: If a person is not found, it's just left out. Both of these queries generate an empty array ('{}') if no person is found for the whole array. Other query styles would return NULL. I added variants to the fiddle.

For Postgres 9.4+ (using WITH ORDINALITY):

SELECT aid, actors
     , ARRAY(SELECT name
             FROM   unnest(e.actors) WITH ORDINALITY a(id, i)
             JOIN   eg_person p USING (id)
             ORDER  BY a.i) AS act_names
     , benefactors
     , ARRAY(SELECT name
             FROM   unnest(e.benefactors) WITH ORDINALITY b(id, i)
             JOIN   eg_person USING (id)
             ORDER  BY b.i) AS ben_names
FROM   eg_assoc e;
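`WITH ORDINALITY` (available since Postgres 9.4) appends a bigint column that numbers the rows returned by the set-returning function, which is what lets the query above order by `a.i` and `b.i`. Stand-alone, with a literal array chosen for illustration, it returns the rows (4, 1), (1, 2), (4, 3):

```sql
-- id = element value, i = 1-based position added by WITH ORDINALITY
SELECT a.id, a.i
FROM   unnest('{4,1,4}'::int[]) WITH ORDINALITY a(id, i)
ORDER  BY a.i;
```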

Failed query

The query provided by @a_horse seems to do the job, but it is unreliable, misleading, potentially incorrect and needlessly expensive.

  1. Proxy cross join because of two unrelated joins. A sneaky anti-pattern. Details:

  • Two SQL LEFT JOINS produce incorrect result

Fixed superficially with DISTINCT in array_agg() to eliminate the generated duplicates, but that's really putting lipstick on a pig. It also eliminates duplicates in the original, because it's impossible to tell the difference at this point, which is potentially incorrect.

  2. The expression a_person.id = any(eg_assoc.actors) works, but eliminates duplicates from the result (happens two times in this query), which is wrong unless specified.

  3. Original order of array elements is not preserved. This is tricky in general. But it's aggravated in this query, because actors and benefactors are multiplied and made distinct again, which guarantees arbitrary order.

  4. No column aliases in the outer SELECT result in duplicate column names, which makes some clients fail (it did not work in the fiddle without aliases).

  5. min(actors) and min(benefactors) are useless. Normally one would just add the columns to GROUP BY instead of fake-aggregating them. But eg_assoc.aid is the PK column anyway (covering the whole table in GROUP BY), so that's not even necessary. Just actors, benefactors.

  6. Aggregating the whole result is wasted time and effort to begin with. Use a smarter query that doesn't multiply the base rows, then you don't have to aggregate them back.
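The point about `= any(...)` collapsing duplicates is easy to reproduce in isolation. A sketch using throwaway CTEs (`person` and `assoc` are hypothetical stand-ins for the question's tables, and the duplicated actor is invented for the demo): matching on `p.id = ANY (a.actors)` returns each person row at most once, giving `{alice,bob}`, while walking the subscripts keeps the duplicate and the order, giving `{alice,alice,bob}`.

```sql
WITH person(id, name) AS (VALUES (1, 'alice'), (2, 'bob'))
   , assoc(actors)    AS (VALUES ('{1,1,2}'::int[]))   -- actor 1 listed twice
SELECT ARRAY(SELECT p.name
             FROM   person p
             WHERE  p.id = ANY (a.actors)
             ORDER  BY p.id) AS any_style        -- each person matched once: duplicate lost
     , ARRAY(SELECT p.name
             FROM   generate_subscripts(a.actors, 1) i
             JOIN   person p ON p.id = a.actors[i]
             ORDER  BY i) AS subscript_style     -- one row per position: duplicate and order kept
FROM   assoc a;
```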
