How to order distinct tuples in a PostgreSQL query

Question

I'm trying to submit a query in Postgres that only returns distinct tuples. In my sample query, I do not want duplicate entries where an entry exists multiple times for a cluster_id/feed_id combination. If I do a simple:

select distinct on (cluster_info.cluster_id, feed_id) 
   cluster_info.cluster_id, num_docs, feed_id, url_time 
   from url_info 
   join cluster_info on (cluster_info.cluster_id = url_info.cluster_id) 
   where feed_id in (select pot_seeder from potentials) 
   and num_docs > 5 and url_time > '2012-04-16';

I get just that, but I'd also like to group according to num_docs. So, when I do the following:

select distinct on (cluster_info.cluster_id, feed_id) 
   cluster_info.cluster_id, num_docs, feed_id, url_time 
   from url_info join cluster_info 
   on (cluster_info.cluster_id = url_info.cluster_id) 
   where feed_id in (select pot_seeder from potentials) 
   and num_docs > 5 and url_time > '2012-04-16' 
   order by num_docs desc;

I get the following error:

ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: select distinct on (cluster_info.cluster_id, feed_id) cluste...

I think I understand why I'm getting the error (cannot group by tuples unless I explicitly describe the group somehow) but how do I do that? Or if I am incorrect in my interpretation of the error, is there a way to accomplish my initial goal?

Recommended answer

The leftmost ORDER BY items cannot disagree with the items of the DISTINCT clause. I quote the manual about DISTINCT:


The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.

Try:

SELECT *
FROM  (
    SELECT DISTINCT ON (c.cluster_id, feed_id) 
           c.cluster_id, num_docs, feed_id, url_time 
    FROM   url_info u
    JOIN   cluster_info c ON (c.cluster_id = u.cluster_id) 
    WHERE  feed_id IN (SELECT pot_seeder FROM potentials) 
    AND    num_docs > 5
    AND    url_time > '2012-04-16'
    ORDER  BY c.cluster_id, feed_id, num_docs, url_time
           -- first columns match DISTINCT
           -- the rest to pick certain values for dupes
           -- or did you want to pick random values for dupes?
    ) x
ORDER  BY num_docs DESC;
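
To see which row the inner query keeps for each group, here is a minimal sketch on made-up data. The CTE t and its values are hypothetical stand-ins for the joined tables, and sorting num_docs DESC inside each group is an assumption about which duplicate should win; it is not something the question specifies.

```sql
-- Hypothetical sample rows; (1, 10) appears twice on purpose.
WITH t (cluster_id, feed_id, num_docs) AS (
    VALUES (1, 10, 7),
           (1, 10, 9),
           (2, 20, 6),
           (3, 30, 12)
)
SELECT *
FROM  (
    SELECT DISTINCT ON (cluster_id, feed_id)
           cluster_id, feed_id, num_docs
    FROM   t
    ORDER  BY cluster_id, feed_id   -- leading items repeat the DISTINCT ON list
            , num_docs DESC         -- assumed tie-breaker: keep the largest num_docs
    ) x
ORDER  BY num_docs DESC;            -- final presentation order
-- Expected rows: (3, 30, 12), (1, 10, 9), (2, 20, 6)
```

The inner ORDER BY only decides which duplicate survives; the outer ORDER BY is free to sort the reduced result by any column.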

Or use GROUP BY:

SELECT c.cluster_id
     , num_docs
     , feed_id
     , url_time 
FROM   url_info u
JOIN   cluster_info c ON (c.cluster_id = u.cluster_id) 
WHERE  feed_id IN (SELECT pot_seeder FROM potentials) 
AND    num_docs > 5
AND    url_time > '2012-04-16'
GROUP  BY c.cluster_id, feed_id 
ORDER  BY num_docs DESC;

If c.cluster_id and feed_id are the primary key columns of all (both, in this case) tables whose columns appear in the SELECT list, then this just works with PostgreSQL 9.1 or later, because the remaining columns are then functionally dependent on the grouped ones.

Otherwise, you need to add the rest of the columns to GROUP BY, aggregate them, or provide more information.
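
As a sketch of the aggregate option, the query below keeps one row per (cluster_id, feed_id) without relying on primary keys. It assumes that the largest num_docs and the latest url_time per group are the values wanted; the aliases max_num_docs and latest_url_time are made up for illustration.

```sql
SELECT c.cluster_id
     , feed_id
     , max(num_docs) AS max_num_docs     -- assumed choice: largest value per group
     , max(url_time) AS latest_url_time  -- assumed choice: most recent value per group
FROM   url_info u
JOIN   cluster_info c ON c.cluster_id = u.cluster_id
WHERE  feed_id IN (SELECT pot_seeder FROM potentials)
AND    num_docs > 5
AND    url_time > '2012-04-16'
GROUP  BY c.cluster_id, feed_id
ORDER  BY max_num_docs DESC;
```

Note that the two aggregates are computed independently, so max_num_docs and latest_url_time may come from different source rows. If both values must come from the same row, the DISTINCT ON variant is the better fit.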
