Eliminate duplicate rows in a PostgreSQL SELECT statement

Question

This is my query:

SELECT autor.entwickler,anwendung.name
  FROM autor 
  left join anwendung
    on anwendung.name = autor.anwendung;

 entwickler |    name     
------------+-------------
 Benutzer 1 | Anwendung 1
 Benutzer 2 | Anwendung 1
 Benutzer 2 | Anwendung 2
 Benutzer 1 | Anwendung 3
 Benutzer 1 | Anwendung 4
 Benutzer 2 | Anwendung 4
(6 rows)

I want to keep one row for each distinct value in the field name and discard the others, like this:

 entwickler |    name     
------------+-------------
 Benutzer 1 | Anwendung 1
 Benutzer 2 | Anwendung 2
 Benutzer 1 | Anwendung 3
 Benutzer 1 | Anwendung 4

In MySQL I would just do:

SELECT autor.entwickler,anwendung.name
  FROM autor
  left join anwendung
    on anwendung.name = autor.anwendung
 GROUP BY anwendung.name;

But PostgreSQL gives me this error:


ERROR: column "autor.entwickler" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT autor.entwickler FROM autor left join anwendung on an ...

I totally understand the error and assume that the MySQL implementation is less SQL-conformant than the PostgreSQL implementation. But how can I get the desired result?

Recommended answer

PostgreSQL doesn't currently allow ambiguous GROUP BY statements where the results depend on the order in which the table is scanned, the plan used, etc. That's how the standard says it should work, AFAIK, but some databases (like MySQL versions prior to 5.7) permit looser queries that just pick the first value encountered for elements appearing in the SELECT list but not in GROUP BY.

In PostgreSQL, you should use DISTINCT ON for this kind of query.

You want to write something like:

SELECT DISTINCT ON (anwendung.name) anwendung.name, autor.entwickler
FROM autor
left join anwendung on anwendung.name = autor.anwendung;

(Syntax corrected based on follow-up comment)

This is a bit like MySQL 5.7's ANY_VALUE(...) pseudo-function for GROUP BY, but in reverse: it says that the values in the DISTINCT ON clause must be unique, and any value is acceptable for the columns not specified.

Unless there's an ORDER BY, there is no guarantee as to which values are selected. You should usually have an ORDER BY for predictability.
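A minimal sketch of the same query with an explicit ORDER BY, using the tables from the question. PostgreSQL requires the leading ORDER BY expressions to match the DISTINCT ON expressions; the remaining sort keys then decide which row is kept in each group. Sorting by autor.entwickler is only an assumed tie-break, since the question does not say which developer should win.

```sql
-- One row per application name. Within each name, keep the row whose
-- entwickler sorts first (assumed rule; substitute whatever ordering you need).
SELECT DISTINCT ON (anwendung.name)
       anwendung.name,
       autor.entwickler
  FROM autor
  LEFT JOIN anwendung
    ON anwendung.name = autor.anwendung
 ORDER BY anwendung.name, autor.entwickler;
```

With the sample rows shown in the question, this should keep Benutzer 1 for Anwendung 1, 3 and 4, and Benutzer 2 for Anwendung 2.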

It's also been noted that using an aggregate like min() or max() would work. That is true, and it leads to reliable and predictable results, unlike DISTINCT ON without an ORDER BY or an ambiguous GROUP BY. However, it has a performance cost due to the need for extra sorting or aggregation, and it only works for ordinal (orderable) data types.
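For comparison, a sketch of the aggregate variant against the same tables. The choice of min() here is an assumption for illustration; max() would work the same way and pick the other end of the sort order.

```sql
-- One row per application name, taking the smallest entwickler value
-- in each group. Valid standard SQL: every selected column is either
-- in GROUP BY or wrapped in an aggregate.
SELECT anwendung.name,
       min(autor.entwickler) AS entwickler
  FROM autor
  LEFT JOIN anwendung
    ON anwendung.name = autor.anwendung
 GROUP BY anwendung.name;
```

Note that with several aggregated columns, each min() or max() is computed independently, so the values in one output row may come from different source rows; DISTINCT ON always returns a single actual row per group.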
