PostgreSQL:分组然后过滤表,条件为不存在 [英] PostgreSQL: Grouping then filtering table, with condition for nonexistence
问题描述
在 PostgreSQL 中,我有一个抽象的表,如下所示:
╔═══╦═══╦═══╦═══╗║ A ║ B ║ C ║ D ║╠═══╬═══╬═══╬═══╣║ x ║ 0 ║ y ║ 0 ║║ x ║ 0 ║ x ║ 1 ║║ x ║ 1 ║ y ║ 0 ║║ x ║ 1 ║ z ║ 1 ║║ y ║ 0 ║ z ║ 0 ║║ 是 ║ 0 ║ x ║ 0 ║║ 是 ║ 1 ║ 是 ║ 0 ║╚═══╩═══╩═══╩═══╝
我想在查询中将其转换为:
╔═══╦═══╦══════╗║ A ║ B ║ D ║╠═══╬═══╬══════╣║ x ║ 0 ║ 1 ║║ x ║ 1 ║ 空 ║║ 是 ║ 0 ║ 空 ║║ 是 ║ 1 ║ 0 ║╚═══╩═══╩══════╝
……这样:
- 输入表的行按 A 和 B 分组,并且
对于每个 A 和 B 对:
如果输入表有任何行使得 A = C,则输出表有一行 (A, B, D),其中 D 来自 A = C 的同一行.
例如,输入表有一行 (x, 0, x, 1),其中 A 和 C 都是 x.这意味着输出表有一行 (x, 0, 1),因为 D 为 1. (x, 0, y, 0) 行(因为它也有 A = x 和 B = 0)被丢弃.
否则,如果不存在这样的行,则输出表有一行(A,B,null).
例如,输入表有两行,其中 A = y 和 B = 0 — 它们是 (y, 0, z, 0) 和 (y, 0, x, 0).在这两行中都没有 A = C.这意味着输出表有一个 (y, 0, null) 行.
我想不出任何方法来使用 聚合函数、窗口函数,或 子查询来执行这个转换.
要从每个组中获取具有相同 (A, B)
的单行,有一个简单、简短且快速的方法:DISTINCT ON
- 完全不涉及聚合函数、窗口函数或子查询:
SELECT DISTINCT ON (A, B)A,B,当 A = C 然后 D 以 D 结束时的情况来自 tbl按 A、B、(A = C) DESC 排序;
准确地产生您想要的结果.
db<>小提琴这里p>
假设所有涉及的列都定义了NOT NULL
,否则你需要做更多.
最后的 ORDER BY
项 (A = C) DESC
对行进行排序,其中 A = C
在每组的第一个.这是一个boolean
表达式,FALSE
排在TRUE
之前.如果可以有多行,请添加更多 ORDER BY
项以打破平局.
CASE WHEN A = C THEN D END
实现了您的要求,即 D
仅在给定条件下输出.否则我们得到 NULL
(CASE
),根据需要.
详细解释:
大表可能会有更多的性能优化:
In PostgreSQL, I have a table that, abstractly, looks like this:
╔═══╦═══╦═══╦═══╗
║ A ║ B ║ C ║ D ║
╠═══╬═══╬═══╬═══╣
║ x ║ 0 ║ y ║ 0 ║
║ x ║ 0 ║ x ║ 1 ║
║ x ║ 1 ║ y ║ 0 ║
║ x ║ 1 ║ z ║ 1 ║
║ y ║ 0 ║ z ║ 0 ║
║ y ║ 0 ║ x ║ 0 ║
║ y ║ 1 ║ y ║ 0 ║
╚═══╩═══╩═══╩═══╝
I want to transform it in a query into this:
╔═══╦═══╦══════╗
║ A ║ B ║ D ║
╠═══╬═══╬══════╣
║ x ║ 0 ║ 1 ║
║ x ║ 1 ║ null ║
║ y ║ 0 ║ null ║
║ y ║ 1 ║ 0 ║
╚═══╩═══╩══════╝
…such that:
- The input table’s rows are grouped by A and B, and
For each A and B pair:
If the input table has any row such that A = C, then the output table has a row (A, B, D), where D is from the same row in which A = C.
For instance, the input table has a row (x, 0, x, 1), in which both A and C are x. That means the output table has a row (x, 0, 1), because D is 1. The (x, 0, y, 0) row (because it also has A = x and B = 0) is discarded.
Otherwise, if no such row exists, then the output table has a row (A, B, null).
For instance, the input table has two rows in which A = y and B = 0—they are (y, 0, z, 0) and (y, 0, x, 0). In neither of these rows does A = C. That means that the output table has a (y, 0, null) row.
I can’t figure out any way to use aggregate functions, window functions, or subqueries to perform this transformation.
To get a single row from each group with the same (A, B)
there is a simple, short and fast way: DISTINCT ON
- not involving aggregate functions, window functions, or subqueries at all:
SELECT DISTINCT ON (A, B)
A, B, CASE WHEN A = C THEN D END AS D
FROM tbl
ORDER BY A, B, (A = C) DESC;
Produces your desired result exactly.
db<>fiddle here
Assuming all involved columns are defined NOT NULL
, or you need to do more.
The final ORDER BY
item (A = C) DESC
sorts the row with A = C
first per group. It's a boolean
expression, and FALSE
sorts before TRUE
. If there can be multiple rows, add more ORDER BY
items to break ties.
CASE WHEN A = C THEN D END
implements your requirement that D
is only output for the given condition. Else we get NULL
(the default for CASE
), as desired.
Detailed explanation:
More performance optimization may be possible for big tables:
这篇关于PostgreSQL:分组然后过滤表,条件为不存在的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!