PostgreSQL:分组然后过滤表，条件为不存在 [英] PostgreSQL: Grouping then filtering table, with condition for nonexistence

查看：35 发布时间：2022/1/23 16:32:33 postgresql group-by subquery window-functions distinct-on

本文介绍了PostgreSQL:分组然后过滤表，条件为不存在的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

在 PostgreSQL 中，我有一个抽象的表，如下所示:

╔═══╦═══╦═══╦═══╗║ A ║ B ║ C ║ D ║╠═══╬═══╬═══╬═══╣║ x ║ 0 ║ y ║ 0 ║║ x ║ 0 ║ x ║ 1 ║║ x ║ 1 ║ y ║ 0 ║║ x ║ 1 ║ z ║ 1 ║║ y ║ 0 ║ z ║ 0 ║║ 是 ║ 0 ║ x ║ 0 ║║ 是 ║ 1 ║ 是 ║ 0 ║╚═══╩═══╩═══╩═══╝

我想在查询中将其转换为:

╔═══╦═══╦══════╗║ A ║ B ║ D ║╠═══╬═══╬══════╣║ x ║ 0 ║ 1 ║║ x ║ 1 ║ 空 ║║ 是 ║ 0 ║ 空 ║║ 是 ║ 1 ║ 0 ║╚═══╩═══╩══════╝

……这样:

输入表的行按 A 和 B 分组，并且
对于每个 A 和 B 对:
- 如果输入表有任何行使得 A = C，则输出表有一行 (A, B, D)，其中 D 来自 A = C 的同一行.
  例如，输入表有一行 (x, 0, x, 1)，其中 A 和 C 都是 x.这意味着输出表有一行 (x, 0, 1)，因为 D 为 1. (x, 0, y, 0) 行(因为它也有 A = x 和 B = 0)被丢弃.
- 否则，如果不存在这样的行，则输出表有一行(A，B，null).
  例如，输入表有两行，其中 A = y 和 B = 0 — 它们是 (y, 0, z, 0) 和 (y, 0, x, 0).在这两行中都没有 A = C.这意味着输出表有一个 (y, 0, null) 行.

我想不出任何方法来使用聚合函数、窗口函数，或子查询来执行这个转换.

解决方案

要从每个组中获取具有相同 (A, B) 的单行，有一个简单、简短且快速的方法:DISTINCT ON - 完全不涉及聚合函数、窗口函数或子查询:

SELECT DISTINCT ON (A, B)A，B，当 A = C 然后 D 以 D 结束时的情况来自 tbl按 A、B、(A = C) DESC 排序；

准确地产生您想要的结果.

db<>小提琴这里p>

假设所有涉及的列都定义了NOT NULL，否则你需要做更多.

最后的 ORDER BY 项 (A = C) DESC 对行进行排序，其中 A = C 在每组的第一个.这是一个boolean 表达式，FALSE 排在TRUE 之前.如果可以有多行，请添加更多 ORDER BY 项以打破平局.

CASE WHEN A = C THEN D END 实现了您的要求，即 D 仅在给定条件下输出.否则我们得到 NULL (CASE)，根据需要.

详细解释:

大表可能会有更多的性能优化:

优化 GROUP BY 查询到检索每个用户的最新行

In PostgreSQL, I have a table that, abstractly, looks like this:

╔═══╦═══╦═══╦═══╗
║ A ║ B ║ C ║ D ║
╠═══╬═══╬═══╬═══╣
║ x ║ 0 ║ y ║ 0 ║
║ x ║ 0 ║ x ║ 1 ║
║ x ║ 1 ║ y ║ 0 ║
║ x ║ 1 ║ z ║ 1 ║
║ y ║ 0 ║ z ║ 0 ║
║ y ║ 0 ║ x ║ 0 ║
║ y ║ 1 ║ y ║ 0 ║
╚═══╩═══╩═══╩═══╝

I want to transform it in a query into this:

╔═══╦═══╦══════╗
║ A ║ B ║  D   ║
╠═══╬═══╬══════╣
║ x ║ 0 ║ 1    ║
║ x ║ 1 ║ null ║
║ y ║ 0 ║ null ║
║ y ║ 1 ║ 0    ║
╚═══╩═══╩══════╝

…such that:

The input table’s rows are grouped by A and B, and
For each A and B pair:
- If the input table has any row such that A = C, then the output table has a row (A, B, D), where D is from the same row in which A = C.
  
  For instance, the input table has a row (x, 0, x, 1), in which both A and C are x. That means the output table has a row (x, 0, 1), because D is 1. The (x, 0, y, 0) row (because it also has A = x and B = 0) is discarded.
- Otherwise, if no such row exists, then the output table has a row (A, B, null).
  
  For instance, the input table has two rows in which A = y and B = 0—they are (y, 0, z, 0) and (y, 0, x, 0). In neither of these rows does A = C. That means that the output table has a (y, 0, null) row.

I can’t figure out any way to use aggregate functions, window functions, or subqueries to perform this transformation.

解决方案

To get a single row from each group with the same (A, B) there is a simple, short and fast way: DISTINCT ON - not involving aggregate functions, window functions, or subqueries at all:

SELECT DISTINCT ON (A, B)
       A, B, CASE WHEN A = C THEN D END AS D
FROM   tbl
ORDER  BY A, B, (A = C) DESC;

Produces your desired result exactly.

db<>fiddle here

Assuming all involved columns are defined NOT NULL, or you need to do more.

The final ORDER BY item (A = C) DESC sorts the row with A = C first per group. It's a boolean expression, and FALSE sorts before TRUE. If there can be multiple rows, add more ORDER BY items to break ties.

CASE WHEN A = C THEN D END implements your requirement that D is only output for the given condition. Else we get NULL (the default for CASE), as desired.

Detailed explanation:

More performance optimization may be possible for big tables:

Optimize GROUP BY query to retrieve latest row per user

这篇关于PostgreSQL:分组然后过滤表，条件为不存在的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

PostgreSQL:分组然后过滤表，条件为不存在 [英] PostgreSQL: Grouping then filtering table, with condition for nonexistence

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

PostgreSQL:分组然后过滤表，条件为不存在 [英] PostgreSQL: Grouping then filtering table, with condition for nonexistence

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭