How to break ties when comparing columns in SQL

Views: 89 · Published: 2020/8/1 20:01:53 · Tags: sql postgresql duplicates greatest-n-per-group sql-delete

Question

I am trying to delete duplicates in Postgres. I am using this as the base of my query:

DELETE FROM case_file as p
WHERE EXISTS (
    SELECT FROM case_file as p1
    WHERE p1.serial_no = p.serial_no
    AND p1.cfh_status_dt < p.cfh_status_dt
    );

It works well, except that when the cfh_status_dt dates are equal, neither record is removed.

For rows that have the same serial_no and the same date, I would like to keep the one that has a registration_no, if any of them do (this column also contains NULLs).

Is there a way I can do this all in one query, possibly with a CASE expression or another simple comparison?

Recommended answer

DELETE FROM case_file AS p
WHERE  id NOT IN (
   SELECT DISTINCT ON (serial_no) id  -- id = PK
   FROM   case_file 
   ORDER  BY serial_no, cfh_status_dt DESC, registration_no
   );

This keeps the (one) latest row per serial_no, choosing the smallest registration_no if there are multiple candidates.

NULL sorts last in default ascending order, so any row with a non-null registration_no is preferred.

If you want the greatest registration_no instead, while still sorting NULL values last, use:

   ...
   ORDER  BY serial_no, cfh_status_dt DESC, registration_no DESC NULLS LAST

See:

Select first row in each GROUP BY group?
Sort by column ASC, but NULL values first?
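
The NULL ordering described above can be checked with a throwaway query. This is a minimal sketch for PostgreSQL; the three values are hypothetical sample data, not rows from case_file.

```sql
-- Hypothetical sample values, used only to show where NULL lands.
SELECT registration_no
FROM  (VALUES (2), (NULL), (1)) AS t(registration_no)
ORDER  BY registration_no;                  -- default ascending: NULL sorts last

SELECT registration_no
FROM  (VALUES (2), (NULL), (1)) AS t(registration_no)
ORDER  BY registration_no DESC;             -- plain descending: NULL sorts first

SELECT registration_no
FROM  (VALUES (2), (NULL), (1)) AS t(registration_no)
ORDER  BY registration_no DESC NULLS LAST;  -- descending, NULL forced to the end
```

The second query shows why a bare DESC would not work here: it would make a row without a registration_no the preferred survivor.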

If you have no PK (PRIMARY KEY) or other UNIQUE NOT NULL column (or combination of columns) that you can use for this purpose, you can fall back to ctid. See:

How do I (or can I) SELECT DISTINCT on multiple columns?
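
A minimal sketch of the ctid fallback, assuming the same case_file table but with no usable key column. ctid is a PostgreSQL system column that identifies the physical location of a row; it can change when a row is updated or the table is rewritten, so it should only be relied on within a single statement like this one.

```sql
-- Sketch only: the same DELETE, keyed on the system column ctid
-- instead of a primary key.
DELETE FROM case_file
WHERE  ctid NOT IN (
   SELECT DISTINCT ON (serial_no) ctid
   FROM   case_file
   ORDER  BY serial_no, cfh_status_dt DESC, registration_no
   );
```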

NOT IN is typically not the most efficient way. But this deals with duplicates involving NULL values. See:

How to delete duplicate rows without unique identifier
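
If NOT IN turns out to be too slow, one commonly used alternative (not part of the original answer) is to number the rows per serial_no with the row_number() window function and delete everything except the first. This is a sketch under the same assumption as above, namely that id is the primary key of case_file.

```sql
-- Alternative sketch: rank rows within each serial_no group and
-- delete every row that is not ranked first.
DELETE FROM case_file AS p
USING (
   SELECT id
        , row_number() OVER (PARTITION BY serial_no
                             ORDER BY cfh_status_dt DESC, registration_no) AS rn
   FROM   case_file
   ) AS d
WHERE  d.id = p.id
AND    d.rn > 1;   -- rn = 1 is the survivor for its serial_no
```

The ORDER BY inside the window mirrors the one used with DISTINCT ON, so the same row per serial_no survives, except where rows tie on every sort column, in which case either form picks an arbitrary one.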

If there are many duplicates, and you can afford to do so, it can be (much) more efficient to create a new, pristine table of survivors and replace the old table, instead of deleting the majority of rows in the existing table.

Or create a temporary table of survivors, truncate the old table, and insert from the temp table. This way, dependent objects like views or FK constraints can stay in place. See:

How to delete duplicate entries?

The surviving rows are simply:

SELECT DISTINCT ON (serial_no) *
FROM   case_file 
ORDER  BY serial_no, cfh_status_dt DESC, registration_no;
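
Putting the temporary-table variant together, a minimal sketch might look like the following. The name case_file_survivors is hypothetical, and the sketch assumes that case_file has no generated columns (so INSERT ... SELECT * works) and that no other table references it through a foreign key (TRUNCATE refuses to run in that case unless the referencing tables are truncated in the same command).

```sql
BEGIN;

-- Hypothetical temp table holding one survivor per serial_no;
-- it is dropped automatically when the transaction commits.
CREATE TEMP TABLE case_file_survivors ON COMMIT DROP AS
SELECT DISTINCT ON (serial_no) *
FROM   case_file
ORDER  BY serial_no, cfh_status_dt DESC, registration_no;

-- Empties the table; takes an exclusive lock until COMMIT.
TRUNCATE case_file;

-- Refill the original table from the survivors.
INSERT INTO case_file
SELECT * FROM case_file_survivors;

COMMIT;
```

Running everything in one transaction means other sessions never see the table empty, at the cost of blocking them until the commit.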
