Eliminate duplicate rows in a PostgreSQL SELECT statement
Question
This is my query:
SELECT autor.entwickler,anwendung.name
FROM autor
left join anwendung
on anwendung.name = autor.anwendung;
entwickler | name
------------+-------------
Benutzer 1 | Anwendung 1
Benutzer 2 | Anwendung 1
Benutzer 2 | Anwendung 2
Benutzer 1 | Anwendung 3
Benutzer 1 | Anwendung 4
Benutzer 2 | Anwendung 4
(6 rows)
I want to keep one row for each distinct value in the field name, and discard the others, like this:
entwickler | name
------------+-------------
Benutzer 1 | Anwendung 1
Benutzer 2 | Anwendung 2
Benutzer 1 | Anwendung 3
Benutzer 1 | Anwendung 4
In MySQL I would just do:
SELECT autor.entwickler,anwendung.name
FROM autor
left join anwendung
on anwendung.name = autor.anwendung
GROUP BY anwendung.name;
But PostgreSQL gives me this error:
ERROR: column "autor.entwickler" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT autor.entwickler FROM autor left join anwendung on an ...
I totally understand the error and assume that the MySQL implementation conforms less closely to the SQL standard than the PostgreSQL implementation. But how can I get the desired result?
Recommended answer
PostgreSQL doesn't currently allow ambiguous GROUP BY statements where the results depend on the order in which the table is scanned, the plan used, etc. That's how the standard says it should work AFAIK, but some databases (like MySQL versions prior to 5.7) permit looser queries that just pick the first value encountered for elements appearing in the SELECT list but not in GROUP BY.
In PostgreSQL, you should use DISTINCT ON for this kind of query.
You want to write something like:
SELECT DISTINCT ON (anwendung.name) anwendung.name, autor.entwickler
FROM autor
left join anwendung on anwendung.name = autor.anwendung;
(Syntax corrected based on follow-up comment)
This is a bit like MySQL 5.7's ANY_VALUE(...) pseudo-function for GROUP BY, but in reverse: it says that the values in the DISTINCT ON clause must be unique, and any value is acceptable for the columns not specified.
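For comparison, here is a minimal sketch of the MySQL side of that analogy. It assumes MySQL 5.7 or later and the autor and anwendung tables from the question. ANY_VALUE() states explicitly that any entwickler within each group is acceptable:

```sql
-- MySQL syntax (5.7 or later), using the tables from the question.
-- Which entwickler is returned for each name is not specified.
SELECT ANY_VALUE(autor.entwickler) AS entwickler, anwendung.name
FROM autor
LEFT JOIN anwendung ON anwendung.name = autor.anwendung
GROUP BY anwendung.name;
```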
Unless there's an ORDER BY, there is no guarantee as to which values are selected. You should usually have an ORDER BY for predictability.
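As a sketch of what that could look like, the script below rebuilds the question's data in temporary tables and adds an ORDER BY. The column types and the decision to keep the alphabetically first entwickler are assumptions made for illustration; the original post does not show the schema.

```sql
-- Hypothetical schema, reconstructed from the output shown in the question.
CREATE TEMP TABLE anwendung (name text PRIMARY KEY);
CREATE TEMP TABLE autor (entwickler text, anwendung text);

INSERT INTO anwendung (name) VALUES
  ('Anwendung 1'), ('Anwendung 2'), ('Anwendung 3'), ('Anwendung 4');

INSERT INTO autor (entwickler, anwendung) VALUES
  ('Benutzer 1', 'Anwendung 1'),
  ('Benutzer 2', 'Anwendung 1'),
  ('Benutzer 2', 'Anwendung 2'),
  ('Benutzer 1', 'Anwendung 3'),
  ('Benutzer 1', 'Anwendung 4'),
  ('Benutzer 2', 'Anwendung 4');

-- The ORDER BY must start with the DISTINCT ON expression(s).
-- The remaining sort keys decide which row of each group is kept:
-- here, the row with the smallest entwickler for each name.
SELECT DISTINCT ON (anwendung.name) anwendung.name, autor.entwickler
FROM autor
LEFT JOIN anwendung ON anwendung.name = autor.anwendung
ORDER BY anwendung.name, autor.entwickler;
```

With this sample data the query returns one row per name, matching the result asked for in the question: Benutzer 1 for Anwendung 1, 3 and 4, and Benutzer 2 for Anwendung 2.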
It's also been noted that using an aggregate like min() or max() would work. While this is true, and will lead to reliable and predictable results, unlike using DISTINCT ON or an ambiguous GROUP BY, it has a performance cost due to the need for extra sorting or aggregation, and it only works for ordinal data types.
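A minimal sketch of the aggregate variant, assuming the autor and anwendung tables from the question and an entwickler column of a sortable type such as text:

```sql
-- Every selected column is either grouped or aggregated,
-- so PostgreSQL accepts this GROUP BY.
SELECT anwendung.name, min(autor.entwickler) AS entwickler
FROM autor
LEFT JOIN anwendung ON anwendung.name = autor.anwendung
GROUP BY anwendung.name;
```

Note that with more than one aggregated column, each min() or max() is computed independently, so the values in a result row may come from different source rows.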