Remove duplicate rows based on a field in a SELECT query with PostgreSQL?


Question

Consider the table mdl_files, which contains the following fields: id, contenthash, timecreated, filesize.

This table stores attachment files.

All rows with the same contenthash are considered duplicates, and I want to keep only the oldest row for each hash (or the first one if the dates are equal). How can I do that?

The following query:

SELECT
  id,
  contenthash,
  filesize,
  to_timestamp(timecreated) :: DATE
FROM mdl_files
ORDER BY contenthash;

returns:

2480229 00002e87605311feb82b70473b61e81f0223c774    18178   2016-10-05
2997411 0000bfd20ef84948eee6811ce5bbac03de42ccb0    1293    2017-03-31
1304839 000280169fc78d704a2d4569bfb6f42ea4a1d5ae    8203    2015-11-10
1364656 000280169fc78d704a2d4569bfb6f42ea4a1d5ae    8203    2015-11-17
71568   0003c6aec5835964870902d697c06d21abf76bf7    139439  2013-04-19
2959945 000419c19d77df7285e669614075b47414e3ab2c    398 2017-03-20
3483049 00061dc0bc2452304107ddc75e7ee2908c729905    28618   2017-08-17
3483047 00061dc0bc2452304107ddc75e7ee2908c729905    28618   2017-08-17

I would like to get this result set:

2480229 00002e87605311feb82b70473b61e81f0223c774    18178   2016-10-05
2997411 0000bfd20ef84948eee6811ce5bbac03de42ccb0    1293    2017-03-31
1304839 000280169fc78d704a2d4569bfb6f42ea4a1d5ae    8203    2015-11-10

71568   0003c6aec5835964870902d697c06d21abf76bf7    139439  2013-04-19
2959945 000419c19d77df7285e669614075b47414e3ab2c    398 2017-03-20
3483049 00061dc0bc2452304107ddc75e7ee2908c729905    28618   2017-08-17

I want the following duplicate rows to be removed from the result set:

1364656 000280169fc78d704a2d4569bfb6f42ea4a1d5ae    8203    2015-11-17
3483047 00061dc0bc2452304107ddc75e7ee2908c729905    28618   2017-08-17

Recommended answer

Use DISTINCT ON:

SELECT DISTINCT ON (contenthash)
  id,
  contenthash,
  filesize,
  to_timestamp(timecreated) :: DATE
FROM mdl_files
ORDER BY contenthash, timecreated, id;

DISTINCT ON is a Postgres extension that returns exactly one row for each unique combination of the keys in parentheses. The row kept is the first one in each group according to the ORDER BY clause, so here each contenthash keeps its row with the lowest timecreated, with ties broken by the lowest id. The ORDER BY must begin with the DISTINCT ON expressions, which is why contenthash comes first.
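Because DISTINCT ON exists only in Postgres, it can help to see the same "first row per key" rule written with the standard ROW_NUMBER() window function. The sketch below is not the answer's query but a swapped-in equivalent, run against an in-memory SQLite database through Python's sqlite3 module so it can be tried without a Postgres server (it assumes SQLite 3.25 or newer for window functions). The timecreated epoch values are invented for illustration, since the question only shows the derived dates, and the function name keep_oldest_per_hash is hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (id, contenthash, filesize, timecreated).
# The epoch values are invented; the question only shows the derived dates.
SAMPLE_ROWS = [
    (1304839, "000280169fc78d704a2d4569bfb6f42ea4a1d5ae", 8203, 1447113600),
    (1364656, "000280169fc78d704a2d4569bfb6f42ea4a1d5ae", 8203, 1447718400),
    (71568, "0003c6aec5835964870902d697c06d21abf76bf7", 139439, 1366329600),
    (3483049, "00061dc0bc2452304107ddc75e7ee2908c729905", 28618, 1502928000),
    (3483047, "00061dc0bc2452304107ddc75e7ee2908c729905", 28618, 1502928000),
]

# Standard-SQL counterpart of:
#   SELECT DISTINCT ON (contenthash) ... ORDER BY contenthash, timecreated, id
# Rows are numbered within each contenthash, oldest first, and only row 1 is kept.
QUERY = """
SELECT id, contenthash, filesize, timecreated
FROM (
    SELECT id, contenthash, filesize, timecreated,
           ROW_NUMBER() OVER (
               PARTITION BY contenthash
               ORDER BY timecreated, id
           ) AS rn
    FROM mdl_files
) AS ranked
WHERE rn = 1
ORDER BY contenthash
"""


def keep_oldest_per_hash(rows):
    """Load rows into a throwaway table and return one row per contenthash."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mdl_files ("
            "id INTEGER PRIMARY KEY, contenthash TEXT, "
            "filesize INTEGER, timecreated INTEGER)"
        )
        conn.executemany("INSERT INTO mdl_files VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


kept_ids = [row[0] for row in keep_oldest_per_hash(SAMPLE_ROWS)]
print(kept_ids)
```

In this sample the last two rows share the same timecreated to the second, so the id tiebreaker keeps the lower id, 3483047. If a different row should win a tie, change the trailing ORDER BY columns in either form of the query.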
