PostgreSQL:分组然后过滤表,条件为不存在 [英] PostgreSQL: Grouping then filtering table, with condition for nonexistence

查看:35
本文介绍了PostgreSQL:分组然后过滤表,条件为不存在的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 PostgreSQL 中,我有一个抽象的表,如下所示:

╔═══╦═══╦═══╦═══╗║ A ║ B ║ C ║ D ║╠═══╬═══╬═══╬═══╣║ x ║ 0 ║ y ║ 0 ║║ x ║ 0 ║ x ║ 1 ║║ x ║ 1 ║ y ║ 0 ║║ x ║ 1 ║ z ║ 1 ║║ y ║ 0 ║ z ║ 0 ║║ 是 ║ 0 ║ x ║ 0 ║║ 是 ║ 1 ║ 是 ║ 0 ║╚═══╩═══╩═══╩═══╝

我想在查询中将其转换为:

╔═══╦═══╦══════╗║ A ║ B ║ D ║╠═══╬═══╬══════╣║ x ║ 0 ║ 1 ║║ x ║ 1 ║ 空 ║║ 是 ║ 0 ║ 空 ║║ 是 ║ 1 ║ 0 ║╚═══╩═══╩══════╝

……这样:

  1. 输入表的行按 A 和 B 分组,并且
  2. 对于每个 A 和 B 对:

    • 如果输入表有任何行使得 A = C,则输出表有一行 (A, B, D),其中 D 来自 A = C 的同一行.

      例如,输入表有一行 (x, 0, x, 1),其中 A 和 C 都是 x.这意味着输出表有一行 (x, 0, 1),因为 D 为 1. (x, 0, y, 0) 行(因为它也有 A = x 和 B = 0)被丢弃.

    • 否则,如果不存在这样的行,则输出表有一行(A,B,null).

      例如,输入表有两行,其中 A = y 和 B = 0 — 它们是 (y, 0, z, 0) 和 (y, 0, x, 0).在这两行中都没有 A = C.这意味着输出表有一个 (y, 0, null) 行.

我想不出任何方法来使用 聚合函数窗口函数,或 子查询来执行这个转换.

解决方案

要从每个组中获取具有相同 (A, B) 的单行,有一个简单、简短且快速的方法:DISTINCT ON - 完全不涉及聚合函数、窗口函数或子查询:

SELECT DISTINCT ON (A, B)A,B,当 A = C 然后 D 以 D 结束时的情况来自 tbl按 A、B、(A = C) DESC 排序;

准确地产生您想要的结果.

db<>小提琴这里p>

假设所有涉及的列都定义了NOT NULL,否则你需要做更多.

最后的 ORDER BY(A = C) DESC 对行进行排序,其中 A = C 在每组的第一个.这是一个boolean 表达式,FALSE 排在TRUE 之前.如果可以有多行,请添加更多 ORDER BY 项以打破平局.

CASE WHEN A = C THEN D END 实现了您的要求,即 D 仅在给定条件下输出.否则我们得到 NULL (CASE),根据需要.

详细解释:

大表可能会有更多的性能优化:

In PostgreSQL, I have a table that, abstractly, looks like this:

╔═══╦═══╦═══╦═══╗
║ A ║ B ║ C ║ D ║
╠═══╬═══╬═══╬═══╣
║ x ║ 0 ║ y ║ 0 ║
║ x ║ 0 ║ x ║ 1 ║
║ x ║ 1 ║ y ║ 0 ║
║ x ║ 1 ║ z ║ 1 ║
║ y ║ 0 ║ z ║ 0 ║
║ y ║ 0 ║ x ║ 0 ║
║ y ║ 1 ║ y ║ 0 ║
╚═══╩═══╩═══╩═══╝

I want to transform it in a query into this:

╔═══╦═══╦══════╗
║ A ║ B ║  D   ║
╠═══╬═══╬══════╣
║ x ║ 0 ║ 1    ║
║ x ║ 1 ║ null ║
║ y ║ 0 ║ null ║
║ y ║ 1 ║ 0    ║
╚═══╩═══╩══════╝

…such that:

  1. The input table’s rows are grouped by A and B, and
  2. For each A and B pair:

    • If the input table has any row such that A = C, then the output table has a row (A, B, D), where D is from the same row in which A = C.

      For instance, the input table has a row (x, 0, x, 1), in which both A and C are x. That means the output table has a row (x, 0, 1), because D is 1. The (x, 0, y, 0) row (because it also has A = x and B = 0) is discarded.

    • Otherwise, if no such row exists, then the output table has a row (A, B, null).

      For instance, the input table has two rows in which A = y and B = 0—they are (y, 0, z, 0) and (y, 0, x, 0). In neither of these rows does A = C. That means that the output table has a (y, 0, null) row.

I can’t figure out any way to use aggregate functions, window functions, or subqueries to perform this transformation.

解决方案

To get a single row from each group with the same (A, B) there is a simple, short and fast way: DISTINCT ON - not involving aggregate functions, window functions, or subqueries at all:

SELECT DISTINCT ON (A, B)
       A, B, CASE WHEN A = C THEN D END AS D
FROM   tbl
ORDER  BY A, B, (A = C) DESC;

Produces your desired result exactly.

db<>fiddle here

Assuming all involved columns are defined NOT NULL, or you need to do more.

The final ORDER BY item (A = C) DESC sorts the row with A = C first per group. It's a boolean expression, and FALSE sorts before TRUE. If there can be multiple rows, add more ORDER BY items to break ties.

CASE WHEN A = C THEN D END implements your requirement that D is only output for the given condition. Else we get NULL (the default for CASE), as desired.

Detailed explanation:

More performance optimization may be possible for big tables:

这篇关于PostgreSQL:分组然后过滤表,条件为不存在的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆