PostgreSQL:分组然后过滤表,条件不存在 [英] PostgreSQL: Grouping then filtering table, with condition for nonexistence
问题描述
在PostgreSQL中,我有一个抽象的表,如下所示:
In PostgreSQL, I have a table that, abstractly, looks like this:
╔═══╦═══╦═══╦═══╗
║ A ║ B ║ C ║ D ║
╠═══╬═══╬═══╬═══╣
║ x ║ 0 ║ y ║ 0 ║
║ x ║ 0 ║ x ║ 1 ║
║ x ║ 1 ║ y ║ 0 ║
║ x ║ 1 ║ z ║ 1 ║
║ y ║ 0 ║ z ║ 0 ║
║ y ║ 0 ║ x ║ 0 ║
║ y ║ 1 ║ y ║ 0 ║
╚═══╩═══╩═══╩═══╝
我想在查询中将其转换为:
I want to transform it in a query into this:
╔═══╦═══╦══════╗
║ A ║ B ║ D ║
╠═══╬═══╬══════╣
║ x ║ 0 ║ 1 ║
║ x ║ 1 ║ null ║
║ y ║ 0 ║ null ║
║ y ║ 1 ║ 0 ║
╚═══╩═══╩══════╝
...这样:
- 输入表的行按A和B分组,并且
-
对于每个A和B对:
- The input table’s rows are grouped by A and B, and
For each A and B pair:
-
如果输入表中有任何行这样,A = C,则输出表具有一行(A,B,D),其中D来自A = C的同一行。
If the input table has any row such that A = C, then the output table has a row (A, B, D), where D is from the same row in which A = C.
例如,输入表e具有一行(x,0,x,1),其中A和C均为x。这意味着输出表具有一行(x,0,1),因为D为1。(x,0,y,0)行(因为它也具有A = x和B = 0)被丢弃了。
For instance, the input table has a row (x, 0, x, 1), in which both A and C are x. That means the output table has a row (x, 0, 1), because D is 1. The (x, 0, y, 0) row (because it also has A = x and B = 0) is discarded.
否则,如果不存在这样的行,则输出表中将包含一行(A,B,null)。
Otherwise, if no such row exists, then the output table has a row (A, B, null).
例如,输入表有两行,其中A = y和B = 0-它们分别是(y,0,z,0)和(y,0,x,0)。在这两个行中都没有A =C。这意味着输出表中有一个(y,0,null)行。
For instance, the input table has two rows in which A = y and B = 0—they are (y, 0, z, 0) and (y, 0, x, 0). In neither of these rows does A = C. That means that the output table has a (y, 0, null) row.
我不知道有什么方法可以使用集合函数,窗口函数或子查询来执行此转换。
I can’t figure out any way to use aggregate functions, window functions, or subqueries to perform this transformation.
推荐答案
从每个组中获得具有相同(A,B)
有一种简单,快捷的方法: DISTINCT ON
-完全不涉及聚合函数,窗口函数或子查询:
To get a single row from each group with the same (A, B)
there is a simple, short and fast way: DISTINCT ON
- not involving aggregate functions, window functions, or subqueries at all:
SELECT DISTINCT ON (A, B)
A, B, CASE WHEN A = C THEN D END AS D
FROM tbl
ORDER BY A, B, (A = C) DESC;
准确地产生您想要的结果。
Produces your desired result exactly.
db<>小提琴此处
假定所有涉及的列都已定义为 NOT NULL
,或者您需要做更多的事情。
Assuming all involved columns are defined NOT NULL
, or you need to do more.
最后一个 ORDER BY
项(A = C)DESC
用 A = C
。它是一个布尔值
表达式,并且 FALSE
在 TRUE
之前排序。如果可以有多行,则添加更多 ORDER BY
项目以打破平局。
The final ORDER BY
item (A = C) DESC
sorts the row with A = C
first per group. It's a boolean
expression, and FALSE
sorts before TRUE
. If there can be multiple rows, add more ORDER BY
items to break ties.
在A = C THEN D END
的情况下实现您的要求,即仅在给定条件下输出 D
。否则,我们会得到 NULL
( Case
),根据需要。
CASE WHEN A = C THEN D END
implements your requirement that D
is only output for the given condition. Else we get NULL
(the default for CASE
), as desired.
详细说明:
- Select first row in each GROUP BY group?
- Sorting null values after all others, except special
大表可能会进行更多性能优化:
More performance optimization may be possible for big tables:
- Optimize GROUP BY query to retrieve latest row per user
这篇关于PostgreSQL:分组然后过滤表,条件不存在的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!