How to break ties when comparing columns in SQL
Question
I am trying to delete duplicates in Postgres. I am using this as the base of my query:
DELETE FROM case_file AS p
WHERE EXISTS (
   SELECT FROM case_file AS p1
   WHERE p1.serial_no = p.serial_no
   AND   p1.cfh_status_dt < p.cfh_status_dt
   );
It works well, except that when the cfh_status_dt dates are equal, neither of the records is removed.
For rows that have the same serial_no and the same date, I would like to keep the one that has a registration_no (if any does; this column also contains NULL values).
Is there a way I can do this all in one query, possibly with a CASE expression or another simple comparison?
Recommended answer
DELETE FROM case_file AS p
WHERE id NOT IN (
   SELECT DISTINCT ON (serial_no) id  -- id = PK
   FROM   case_file
   ORDER  BY serial_no, cfh_status_dt DESC, registration_no
   );
This keeps the (one) latest row per serial_no, choosing the smallest registration_no if there are multiple candidates.
NULL sorts last in default ascending order. So any row with a not-null registration_no is preferred.
If you want the greatest registration_no instead, while still sorting NULL values last, use:
...
ORDER BY serial_no, cfh_status_dt DESC, registration_no DESC NULLS LAST
See:
- Select first row in each GROUP BY group?
- Sort by column ASC, but NULL values first?
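The sort orders discussed above can be checked in isolation. A minimal sketch, using a made-up inline VALUES list rather than the real case_file table (the sample numbers are assumptions for illustration only):

```sql
-- Hypothetical sample values; not taken from the question's data.

-- Default ascending order: NULL sorts last.
SELECT registration_no
FROM  (VALUES (20), (NULL), (10)) AS t(registration_no)
ORDER  BY registration_no;                       -- 10, 20, NULL

-- Plain DESC flips the default: NULL sorts first.
SELECT registration_no
FROM  (VALUES (20), (NULL), (10)) AS t(registration_no)
ORDER  BY registration_no DESC;                  -- NULL, 20, 10

-- DESC NULLS LAST: greatest value first, NULL still last.
SELECT registration_no
FROM  (VALUES (20), (NULL), (10)) AS t(registration_no)
ORDER  BY registration_no DESC NULLS LAST;       -- 20, 10, NULL
```

Since DISTINCT ON keeps the first row of each group, whichever value sorts first here is the one that survives the DELETE.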
If you have no PK (PRIMARY KEY) or other UNIQUE NOT NULL (combination of) column(s) you can use for this purpose, you can fall back to ctid.
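A sketch of that ctid fallback, assuming the same case_file table and columns as above. It is the same statement with the system column ctid standing in for id:

```sql
-- Sketch only: ctid is the physical row location in PostgreSQL.
-- It can change when a row is updated or the table is rewritten
-- (for example by VACUUM FULL), so rely on it only within one statement.
DELETE FROM case_file
WHERE  ctid NOT IN (
   SELECT DISTINCT ON (serial_no) ctid
   FROM   case_file
   ORDER  BY serial_no, cfh_status_dt DESC, registration_no
   );
```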
NOT IN is typically not the most efficient way. But this deals with duplicates involving NULL values.
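For comparison, one possible alternative that avoids NOT IN is to number the rows per serial_no with a window function and delete everything after the first. This is a different technique from the answer's query, shown only as a sketch; it assumes the same id primary key, and the aliases d and rn are made up for illustration:

```sql
-- Alternative sketch using row_number(); not the query from the answer.
DELETE FROM case_file AS p
USING (
   SELECT id
        , row_number() OVER (PARTITION BY serial_no
                             ORDER BY cfh_status_dt DESC, registration_no) AS rn
   FROM   case_file
   ) AS d
WHERE  d.id = p.id
AND    d.rn > 1;   -- rn = 1 is the row to keep in each serial_no group
```

Whether this is actually faster depends on the table and the share of duplicates, so it is worth comparing both forms with EXPLAIN ANALYZE on real data.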
If there are many duplicates - and you can afford to do so! - it can be (much) more efficient to create a new, pristine table of survivors and replace the old table, instead of deleting the majority of rows in the existing table.
Or create a temporary table of survivors, truncate the old table, and insert from the temp table. This way, dependent objects like views or FK constraints can stay in place.
Surviving rows are simply:
SELECT DISTINCT ON (serial_no) *
FROM case_file
ORDER BY serial_no, cfh_status_dt DESC, registration_no;
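The temporary-table variant described above could look roughly like this. It is a sketch under assumptions: the temp table name case_file_survivors is made up, and the whole thing runs in one transaction so that other sessions never see an empty table:

```sql
BEGIN;

-- Hypothetical temp table holding one surviving row per serial_no.
CREATE TEMP TABLE case_file_survivors ON COMMIT DROP AS
SELECT DISTINCT ON (serial_no) *
FROM   case_file
ORDER  BY serial_no, cfh_status_dt DESC, registration_no;

-- TRUNCATE takes an exclusive lock on case_file and is refused
-- if another table references case_file through a foreign key.
TRUNCATE case_file;

INSERT INTO case_file
SELECT * FROM case_file_survivors;

COMMIT;
```

If anything fails before COMMIT, the transaction rolls back and case_file is left unchanged.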