PostgreSQL:分组然后过滤表,条件不存在 [英] PostgreSQL: Grouping then filtering table, with condition for nonexistence

查看:260
本文介绍了PostgreSQL:分组然后过滤表,条件不存在的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在PostgreSQL中,我有一个抽象的表,如下所示:

In PostgreSQL, I have a table that, abstractly, looks like this:

╔═══╦═══╦═══╦═══╗
║ A ║ B ║ C ║ D ║
╠═══╬═══╬═══╬═══╣
║ x ║ 0 ║ y ║ 0 ║
║ x ║ 0 ║ x ║ 1 ║
║ x ║ 1 ║ y ║ 0 ║
║ x ║ 1 ║ z ║ 1 ║
║ y ║ 0 ║ z ║ 0 ║
║ y ║ 0 ║ x ║ 0 ║
║ y ║ 1 ║ y ║ 0 ║
╚═══╩═══╩═══╩═══╝

我想在查询中将其转换为:

I want to transform it in a query into this:

╔═══╦═══╦══════╗
║ A ║ B ║  D   ║
╠═══╬═══╬══════╣
║ x ║ 0 ║ 1    ║
║ x ║ 1 ║ null ║
║ y ║ 0 ║ null ║
║ y ║ 1 ║ 0    ║
╚═══╩═══╩══════╝

...这样:


  1. 输入表的行按A和B分组,并且

  2. 对于每个A和B对:

  1. The input table’s rows are grouped by A and B, and
  2. For each A and B pair:


  • 如果输入表中有任何行这样,A = C,则输出表具有一行(A,B,D),其中D来自A = C的同一行。

  • If the input table has any row such that A = C, then the output table has a row (A, B, D), where D is from the same row in which A = C.

例如,输入表e具有一行(x,0,x,1),其中A和C均为x。这意味着输出表具有一行(x,0,1),因为D为1。(x,0,y,0)行(因为它也具有A = x和B = 0)被丢弃了。

For instance, the input table has a row (x, 0, x, 1), in which both A and C are x. That means the output table has a row (x, 0, 1), because D is 1. The (x, 0, y, 0) row (because it also has A = x and B = 0) is discarded.

否则,如果不存在这样的行,则输出表中将包含一行(A,B,null)。

Otherwise, if no such row exists, then the output table has a row (A, B, null).

例如,输入表有两行,其中A = y和B = 0-它们分别是(y,0,z,0)和(y,0,x,0)。在这两个行中都没有A =C。这意味着输出表中有一个(y,0,null)行。

For instance, the input table has two rows in which A = y and B = 0—they are (y, 0, z, 0) and (y, 0, x, 0). In neither of these rows does A = C. That means that the output table has a (y, 0, null) row.

我不知道有什么方法可以使用集合函数窗口函数子查询来执行此转换。

I can’t figure out any way to use aggregate functions, window functions, or subqueries to perform this transformation.

推荐答案

从每个组中获得具有相同(A,B)有一种简单,快捷的方法: DISTINCT ON -完全不涉及聚合函数,窗口函数或子查询:

To get a single row from each group with the same (A, B) there is a simple, short and fast way: DISTINCT ON - not involving aggregate functions, window functions, or subqueries at all:

SELECT DISTINCT ON (A, B)
       A, B, CASE WHEN A = C THEN D END AS D
FROM   tbl
ORDER  BY A, B, (A = C) DESC;

准确地产生您想要的结果。

Produces your desired result exactly.

db<>小提琴此处

假定所有涉及的列都已定义为 NOT NULL ,或者您需要做更多的事情。

Assuming all involved columns are defined NOT NULL, or you need to do more.

最后一个 ORDER BY (A = C)DESC A = C 。它是一个布尔值表达式,并且 FALSE TRUE 之前排序。如果可以有多行,则添加更多 ORDER BY 项目以打破平局。

The final ORDER BY item (A = C) DESC sorts the row with A = C first per group. It's a boolean expression, and FALSE sorts before TRUE. If there can be multiple rows, add more ORDER BY items to break ties.

在A = C THEN D END 的情况下实现您的要求,即仅在给定条件下输出 D 。否则,我们会得到 NULL Case ),根据需要。

CASE WHEN A = C THEN D END implements your requirement that D is only output for the given condition. Else we get NULL (the default for CASE), as desired.

详细说明:

  • Select first row in each GROUP BY group?
  • Sorting null values after all others, except special

大表可能会进行更多性能优化:

More performance optimization may be possible for big tables:

  • Optimize GROUP BY query to retrieve latest row per user

这篇关于PostgreSQL:分组然后过滤表,条件不存在的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆