更新 SQL Server 表中除一条重复记录之外的所有记录 [英] Update all but one of duplicate records in table in SQL Server

查看:28
本文介绍了更新 SQL Server 表中除一条重复记录之外的所有记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个 SQL Server 表,其中一列 (object_id) 中有重复的条目,例如:

I have a SQL Server table that has duplicate entries in one of the columns (object_id) e.g.:

+----+-----------+------------+
| id | object_id | status_val |
+----+-----------+------------+
|  1 |         1 |          0 | 
|  2 |         1 |          0 | 
|  3 |         1 |          0 | 
|  4 |         2 |          0 | 
|  5 |         3 |          0 | 
|  6 |         3 |          0 | 
+----+-----------+------------+

object_id 列中有重复时,我需要更新它们的所有状态,除了一个.所以在上表 object_id 1 和 3 是重复的.所以我想将他们的 status_val 更改为 2,但其中一个条目除外.结果如下:

I need to update all of their statuses except for one when there is duplication in the object_id column. So in the table above object_id 1 and 3 are duplicated. So I would want to change their status_val to 2, except one of the entries. The result would look like:

| id | object_id | status_val |
+----+-----------+------------+
|  1 |         1 |          0 | 
|  2 |         1 |          2 | 
|  3 |         1 |          2 | 
|  4 |         2 |          0 | 
|  5 |         3 |          0 | 
|  6 |         3 |          2 | 
+----+-----------+------------+

复制哪一行的状态更新并不重要.

It doesn't matter which one of the duplicated rows has it's status updated.

任何帮助将不胜感激.

推荐答案

你可以在没有 join 的情况下解决这个问题,这意味着它应该有更好的性能.这个想法是按您的 object_id 对数据进行分组,计算每个 object_id 的行号.这就是分区"的作用.然后您可以更新 row_num > 1 的位置.这将更新除第一个之外的所有重复 object_id!

You may solve this problem without a join, which means it should have better performance. The idea is to group the data by your object_id, counting the row number of each object_id. This is what "partition by" does. Then you can update where the row_num is > 1. This will update all duplicated object_id except the first one!

update t set t.status_val = 'some_status' 
from (
    select *, row_number() over(partition by object_id order by (select null)) row_num  
    from foo
) t 
where row_num > 1 

在 82944 条记录的测试表上,性能是这样的(您的里程可能会有所不同!):表'测试'.扫描计数 5,逻辑读取 82283,物理读取 0,预读读取 0,lob 逻辑读取 0,lob 物理读取 0,lob 预读读取 0.CPU 时间 = 141 毫秒,经过时间 = 150 毫秒.

On a test table of 82944 records, the performance was such (your mileage may vary!): Table 'test'. Scan count 5, logical reads 82283, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. CPU time = 141 ms, elapsed time = 150 ms.

我们当然也可以通过使用内连接来解决这个问题,但是,通常这会导致更多的逻辑读取和更高的 CPU:

We can certainly also solve this problem by using an inner join, however, in general this should lead to more logical reads and higher CPU:

表'测试'.扫描计数10,逻辑读83622,物理读0,预读0,lob逻辑读0,lob物理读0,lob预读0.表工作文件".扫描计数 0,逻辑读 0,物理读 0,预读 0,lob 逻辑读 0,lob 物理读 0,lob 预读 0.表工作台".扫描计数4,逻辑读167426,物理读0,预读0,lob逻辑读0,lob物理读0,lob预读0.CPU 时间 = 342 毫秒,经过时间 = 233 毫秒.

Table 'test'. Scan count 10, logical reads 83622, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 4, logical reads 167426, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. CPU time = 342 ms, elapsed time = 233 ms.

要循环结果并小批量更新:

To loop over the results and update in smaller batches:

declare @rowcount int = 1;
declare @batch_size int = 1000;

while @rowcount > 0 
begin
    update top(@batch_size) t set t.status_val = 'already updated'
    from (
        select *, row_number() over(partition by object_id order by (select null)) row_num  
        from foo
        where status_val <> 'already updated' 
    ) t 
    where row_num > 1 
    set @rowcount = @@rowcount;
end

如果其他并发会话试图访问此表,这将有助于保持锁定.

This will help keep locking down if other concurrent sessions are trying to access this table.

这篇关于更新 SQL Server 表中除一条重复记录之外的所有记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆