Why do these join differently based on size?


Question

In PostgreSQL, if you unnest two arrays of the same size, each value from one array is lined up with the corresponding value from the other. If the two arrays are not the same size, each value from one is joined with every value from the other.

select unnest(ARRAY[1, 2, 3, 4, 5]::bigint[]) as id,
unnest(ARRAY['a', 'b', 'c', 'd', 'e']) as value

returns:

1 | "a"
2 | "b"
3 | "c"
4 | "d"
5 | "e"

select unnest(ARRAY[1, 2, 3, 4, 5]::bigint[]) as id, -- 5 elements
unnest(ARRAY['a', 'b', 'c', 'd']) as value -- 4 elements
order by id

returns:

1 | "a"
1 | "b"
1 | "c"
1 | "d"
2 | "b"
2 | "a"
2 | "c"
2 | "d"
3 | "b"
3 | "d"
3 | "a"
3 | "c"
4 | "d"
4 | "a"
4 | "c"
4 | "b"
5 | "d"
5 | "c"
5 | "b"
5 | "a"

Why is this? I assume some sort of implicit rule is being used, and I'd like to know whether I can do it explicitly (e.g. if I want the second style when I have matching array sizes, or if I want missing values in one array to be treated as NULL).

Recommended answer

Support for set-returning functions (SRFs) in SELECT is a PostgreSQL extension, and IMO a very weird one. It's broadly considered deprecated and best avoided where possible.

Now that LATERAL is supported in 9.3, one of the two main uses is gone. It used to be necessary to put a set-returning function in SELECT if you wanted to use the output of one SRF as the input to another; with LATERAL that is no longer needed.
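As a minimal sketch of what that looks like (the aliases n and m are just illustrative names, and this assumes PostgreSQL 9.3 or later), the second generate_series call below takes its upper bound from the first one:

```sql
-- For each n in 1..3, generate the series 1..n.
-- The second function call reads n from the first one via LATERAL.
SELECT n, m
FROM generate_series(1, 3) AS n
CROSS JOIN LATERAL generate_series(1, n) AS m
ORDER BY n, m;
```

This should give six rows: one for n = 1, two for n = 2 and three for n = 3.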

The other use will be replaced in 9.4, when WITH ORDINALITY is added, allowing you to preserve the output ordering of a set-returning function. That's currently the main remaining use: things like zipping the output of two SRFs into a rowset of matched value pairs. WITH ORDINALITY is most anticipated for unnest, but it works with any other SRF.

The logic that PostgreSQL is using here (for whatever IMO insane reason it was originally introduced in ancient history) is: whenever either function produces output, emit a row. If only one function has produced output, scan the other one's output again to get the rows required. If neither produces output, stop emitting rows. In other words, rows keep coming until both functions run out on the same row.

It's easier to see with generate_series:

regress=> SELECT generate_series(1,2), generate_series(1,2);
 generate_series | generate_series 
-----------------+-----------------
               1 |               1
               2 |               2
(2 rows)

regress=> SELECT generate_series(1,2), generate_series(1,3);
 generate_series | generate_series 
-----------------+-----------------
               1 |               1
               2 |               2
               1 |               3
               2 |               1
               1 |               2
               2 |               3
(6 rows)

regress=> SELECT generate_series(1,2), generate_series(1,4);
 generate_series | generate_series 
-----------------+-----------------
               1 |               1
               2 |               2
               1 |               3
               2 |               4
(4 rows)
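One way to read these outputs: both functions run out on the same row only after lcm(m, n) rows, which is 2, 6 and 4 here, and 20 for the 5- and 4-element arrays in the question. Because 5 and 4 share no common factor, those 20 rows contain every possible pair, which is why it looks like a cross join. The sketch below emulates that rule with a plain join and modular indexing, so it does not depend on how a given server treats SRFs in SELECT (as far as I know, PostgreSQL 10 and later pad the shorter result with NULLs instead). The names arrs and i are just illustrative, and the lengths 5 and 4 and the bound 19 are hard-coded for this example:

```sql
-- Emulates the legacy SELECT-list rule for arrays of 5 and 4 elements.
-- lcm(5, 4) = 20, so row numbers 0..19 are generated; each array is
-- indexed modulo its own length, i.e. it restarts when it runs out.
SELECT arrs.a[(i % 5) + 1] AS id,
       arrs.b[(i % 4) + 1] AS value
FROM (SELECT ARRAY[1, 2, 3, 4, 5]::bigint[] AS a,
             ARRAY['a', 'b', 'c', 'd']      AS b) AS arrs
CROSS JOIN generate_series(0, 19) AS i
ORDER BY i;
```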

In the majority of cases what you really want is a simple cross join of the two, which is a lot saner.

regress=> SELECT a, b FROM generate_series(1,2) a, generate_series(1,2) b;
 a | b 
---+---
 1 | 1
 1 | 2
 2 | 1
 2 | 2
(4 rows)

regress=> SELECT a, b FROM generate_series(1,2) a, generate_series(1,3) b;
 a | b 
---+---
 1 | 1
 1 | 2
 1 | 3
 2 | 1
 2 | 2
 2 | 3
(6 rows)

regress=> SELECT a, b FROM generate_series(1,2) a, generate_series(1,4) b;
 a | b 
---+---
 1 | 1
 1 | 2
 1 | 3
 1 | 4
 2 | 1
 2 | 2
 2 | 3
 2 | 4
(8 rows)
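Applied to the question, this is one way to get the "every value with every value" style explicitly, even when the arrays have matching sizes. It is only a sketch: the aliases a(id) and b(value) are made up for the example.

```sql
-- Explicit cross join of two unnested arrays of the same size:
-- every id is paired with every value, 3 x 3 = 9 rows.
SELECT a.id, b.value
FROM unnest(ARRAY[1, 2, 3]::bigint[]) AS a(id)
CROSS JOIN unnest(ARRAY['a', 'b', 'c']) AS b(value)
ORDER BY a.id, b.value;
```

Since the SRFs are in FROM, the row count is always the product of the two array lengths, regardless of whether they match.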

The main exception is currently when you want to run multiple functions in lock-step, pairwise (like a zip), which you cannot currently do with joins.

This will be improved in 9.4 with WITH ORDINALITY, and while it'll be a bit less efficient than a multiple-SRF scan in SELECT (unless optimizer improvements are added), it'll be a lot saner.

Say you wanted to pair up 1..3 and 1..4 with nulls for excess elements. Using WITH ORDINALITY that'd be (PostgreSQL 9.4 or later):

regress=# SELECT aval, bval 
           FROM generate_series(1,3) WITH ORDINALITY a(aval,apos) 
           RIGHT OUTER JOIN generate_series(1,4) WITH ORDINALITY b(bval, bpos) 
           ON (apos=bpos);

 aval | bval 
------+------
    1 |    1
    2 |    2
    3 |    3
      |    4
(4 rows)

whereas the SRF-in-SELECT form would instead return:

regress=# SELECT generate_series(1,3) aval, generate_series(1,4) bval;
 aval | bval 
------+------
    1 |    1
    2 |    2
    3 |    3
    1 |    4
    2 |    1
    3 |    2
    1 |    3
    2 |    4
    3 |    1
    1 |    2
    2 |    3
    3 |    4
(12 rows)
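For the arrays in the question, the "treat missing values as NULL" variant could be sketched as below, assuming PostgreSQL 9.4 or later. The aliases a, b, t and pos are illustrative. The first query uses a FULL OUTER JOIN so that it pads whichever side is shorter; the second relies on the multi-argument form of unnest in FROM, which to my knowledge does the same zip-with-NULLs in one call.

```sql
-- Zip two arrays of different length by position; the shorter side
-- is padded with NULL. USING (pos) merges the two ordinality columns.
SELECT a.id, b.value
FROM unnest(ARRAY[1, 2, 3, 4, 5]::bigint[]) WITH ORDINALITY AS a(id, pos)
FULL OUTER JOIN unnest(ARRAY['a', 'b', 'c', 'd']) WITH ORDINALITY AS b(value, pos)
  USING (pos)
ORDER BY pos;

-- Shorter form: unnest with several arrays in FROM expands them
-- side by side and pads the shorter ones with NULL.
SELECT t.id, t.value
FROM unnest(ARRAY[1, 2, 3, 4, 5]::bigint[],
            ARRAY['a', 'b', 'c', 'd']) AS t(id, value);
```

Both should return five rows, with the last one pairing id 5 with a NULL value.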
