How to break ties when comparing columns in SQL

Question

I am trying to delete duplicates in Postgres. I am using this as the base of my query:

DELETE FROM case_file as p
WHERE EXISTS (
    SELECT FROM case_file as p1
    WHERE p1.serial_no = p.serial_no
    AND p1.cfh_status_dt < p.cfh_status_dt
    );

It works well, except that when the cfh_status_dt dates are equal, neither of the records is removed.

For rows that have the same serial_no and the same date, I would like to keep the one that has a registration_no, if any of them does (this column also contains NULLs).

Is there a way to do this in a single query, possibly with a CASE expression or another simple comparison?

Recommended answer

DELETE FROM case_file AS p
WHERE  id NOT IN (
   SELECT DISTINCT ON (serial_no) id  -- id = PK
   FROM   case_file 
   ORDER  BY serial_no, cfh_status_dt DESC, registration_no
   );

This keeps the (one) latest row per serial_no, choosing the smallest registration_no if there are multiple candidates.
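Before running the DELETE, it can help to preview which rows it would remove. The following is a minimal sketch on a hypothetical demo table: the name case_file_demo, the column types, and the sample values are assumptions for illustration, not taken from the question.

```sql
-- Hypothetical demo table; column types are assumptions.
CREATE TEMP TABLE case_file_demo (
   id              int  PRIMARY KEY,
   serial_no       int  NOT NULL,
   cfh_status_dt   date NOT NULL,
   registration_no int             -- may be NULL
);

INSERT INTO case_file_demo VALUES
   (1, 100, '2020-01-01', NULL),   -- older row for serial_no 100
   (2, 100, '2020-02-01', NULL),   -- latest date, no registration_no
   (3, 100, '2020-02-01', 555),    -- latest date, has registration_no: survivor
   (4, 200, '2020-03-01', NULL);   -- only row for serial_no 200: survivor

-- Same condition as the DELETE, written as a SELECT to preview the victims.
SELECT *
FROM   case_file_demo
WHERE  id NOT IN (
   SELECT DISTINCT ON (serial_no) id
   FROM   case_file_demo
   ORDER  BY serial_no, cfh_status_dt DESC, registration_no
   )
ORDER  BY id;
```

With this sample data the preview should list the rows with id 1 and 2: row 3 wins the tie on 2020-02-01 because its registration_no is not NULL, and row 4 has no competitor.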

NULL sorts last in default ascending order. So any row with a not-null registration_no is preferred.

If you want the greatest registration_no instead, while still sorting NULL values last, use:

   ...
   ORDER  BY serial_no, cfh_status_dt DESC, registration_no DESC NULLS LAST
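
The effect of sort direction on NULL placement can be checked in isolation. This is a small sketch using an inline VALUES list (the alias t and column x are made up for the demonstration):

```sql
-- Default ascending order: NULL sorts last.
SELECT x FROM (VALUES (1), (NULL), (2)) AS t(x) ORDER BY x;                  -- 1, 2, NULL

-- Plain DESC reverses that: NULL sorts first.
SELECT x FROM (VALUES (1), (NULL), (2)) AS t(x) ORDER BY x DESC;             -- NULL, 2, 1

-- DESC NULLS LAST: greatest value first, NULL still last.
SELECT x FROM (VALUES (1), (NULL), (2)) AS t(x) ORDER BY x DESC NULLS LAST;  -- 2, 1, NULL
```

This is why a plain DESC on registration_no would not be enough here: it would make a row with a NULL registration_no the preferred survivor.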

See:

  • Select first row in each GROUP BY group?
  • Sort by column ASC, but NULL values first?

If you have no PK (PRIMARY KEY) or other UNIQUE NOT NULL column (or combination of columns) you can use for this purpose, you can fall back to ctid.
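
As a rough sketch of that fallback, the same statement can be written with the system column ctid in place of id. This assumes the table is not being modified concurrently, since ctid identifies the physical location of a row version and is only stable within a single statement like this one:

```sql
-- Variant for tables without a usable key: ctid stands in for the PK.
DELETE FROM case_file
WHERE  ctid NOT IN (
   SELECT DISTINCT ON (serial_no) ctid
   FROM   case_file
   ORDER  BY serial_no, cfh_status_dt DESC, registration_no
   );
```

ctid should not be stored or relied on as a long-term row identifier, because it changes when a row is updated or the table is rewritten.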

NOT IN is typically not the most efficient way. But this approach deals with duplicates involving NULL values.
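
One possible alternative that avoids NOT IN is to number the rows per serial_no with a window function and delete everything after the first. This is a swapped-in technique, not the answer's own query, and it assumes the same PK column id used above; the alias d and the column name rn are made up for the sketch:

```sql
-- Rank rows within each serial_no; rn = 1 is the survivor.
DELETE FROM case_file AS p
USING (
   SELECT id
        , row_number() OVER (PARTITION BY serial_no
                             ORDER BY cfh_status_dt DESC, registration_no) AS rn
   FROM   case_file
   ) AS d
WHERE  d.id = p.id
AND    d.rn > 1;
```

The ORDER BY inside the window matches the tie-breaking rule of the DISTINCT ON query, so NULL values in registration_no still sort last. Whether this is actually faster depends on the table and should be checked with EXPLAIN ANALYZE.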

If there are many duplicates, and you can afford to do so, it can be (much) more efficient to create a new, pristine table of survivors and replace the old table, instead of deleting the majority of rows in the existing table.
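
A minimal sketch of that replacement, under several assumptions: case_file_new is a hypothetical name, id is the PK column, and nothing else (views, FK constraints in other tables) depends on the old table.

```sql
BEGIN;

-- Build the replacement table from the surviving rows only.
-- CREATE TABLE AS copies column names and data types, but not indexes,
-- constraints, defaults, or privileges; those must be recreated separately.
CREATE TABLE case_file_new AS
SELECT DISTINCT ON (serial_no) *
FROM   case_file
ORDER  BY serial_no, cfh_status_dt DESC, registration_no;

-- Recreate the primary key (assumed to be on id).
ALTER TABLE case_file_new ADD PRIMARY KEY (id);

-- Swap the tables. DROP TABLE fails if other objects depend on case_file.
DROP TABLE case_file;
ALTER TABLE case_file_new RENAME TO case_file;

COMMIT;
```

Because DDL is transactional in Postgres, a failure in any step rolls back the whole swap and leaves the original table untouched.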

Or create a temporary table of survivors, truncate the old table, and insert from the temp table. This way, dependent objects like views or FK constraints can stay in place.

The surviving rows are simply:

SELECT DISTINCT ON (serial_no) *
FROM   case_file 
ORDER  BY serial_no, cfh_status_dt DESC, registration_no;
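
Combined with the temporary-table approach, the survivor query could be used roughly like this. The temp table name case_file_survivors is hypothetical, and the sketch assumes no other table has an FK constraint referencing case_file (TRUNCATE would refuse in that case) and no columns that reject explicit inserts:

```sql
BEGIN;

-- Stash the survivors; the temp table disappears at COMMIT.
CREATE TEMP TABLE case_file_survivors ON COMMIT DROP AS
SELECT DISTINCT ON (serial_no) *
FROM   case_file
ORDER  BY serial_no, cfh_status_dt DESC, registration_no;

-- Empty the original table but keep its definition, indexes, and dependents.
TRUNCATE case_file;

-- Put the survivors back.
INSERT INTO case_file
SELECT * FROM case_file_survivors;

COMMIT;
```

TRUNCATE takes an exclusive lock on the table, so other sessions are blocked from case_file until the transaction commits.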
