更新 SQL Server 表中除一条重复记录之外的所有记录 [英] Update all but one of duplicate records in table in SQL Server

查看：28 发布时间：2021/8/25 19:39:29 sql-server

本文介绍了更新 SQL Server 表中除一条重复记录之外的所有记录的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个 SQL Server 表，其中一列 (object_id) 中有重复的条目，例如:

I have a SQL Server table that has duplicate entries in one of the columns (object_id) e.g.:

+----+-----------+------------+
| id | object_id | status_val |
+----+-----------+------------+
|  1 |         1 |          0 | 
|  2 |         1 |          0 | 
|  3 |         1 |          0 | 
|  4 |         2 |          0 | 
|  5 |         3 |          0 | 
|  6 |         3 |          0 | 
+----+-----------+------------+

当 object_id 列中有重复时，我需要更新它们的所有状态，除了一个.所以在上表 object_id 1 和 3 是重复的.所以我想将他们的 status_val 更改为 2，但其中一个条目除外.结果如下:

I need to update all of their statuses except for one when there is duplication in the object_id column. So in the table above object_id 1 and 3 are duplicated. So I would want to change their status_val to 2, except one of the entries. The result would look like:

| id | object_id | status_val |
+----+-----------+------------+
|  1 |         1 |          0 | 
|  2 |         1 |          2 | 
|  3 |         1 |          2 | 
|  4 |         2 |          0 | 
|  5 |         3 |          0 | 
|  6 |         3 |          2 | 
+----+-----------+------------+

复制哪一行的状态更新并不重要.

It doesn't matter which one of the duplicated rows has it's status updated.

任何帮助将不胜感激.

推荐答案

你可以在没有 join 的情况下解决这个问题，这意味着它应该有更好的性能.这个想法是按您的 object_id 对数据进行分组，计算每个 object_id 的行号.这就是分区"的作用.然后您可以更新 row_num > 1 的位置.这将更新除第一个之外的所有重复 object_id！

You may solve this problem without a join, which means it should have better performance. The idea is to group the data by your object_id, counting the row number of each object_id. This is what "partition by" does. Then you can update where the row_num is > 1. This will update all duplicated object_id except the first one!

update t set t.status_val = 'some_status' 
from (
    select *, row_number() over(partition by object_id order by (select null)) row_num  
    from foo
) t 
where row_num > 1

在 82944 条记录的测试表上，性能是这样的(您的里程可能会有所不同！):表'测试'.扫描计数 5，逻辑读取 82283，物理读取 0，预读读取 0，lob 逻辑读取 0，lob 物理读取 0，lob 预读读取 0.CPU 时间 = 141 毫秒，经过时间 = 150 毫秒.

On a test table of 82944 records, the performance was such (your mileage may vary!): Table 'test'. Scan count 5, logical reads 82283, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. CPU time = 141 ms, elapsed time = 150 ms.

我们当然也可以通过使用内连接来解决这个问题，但是，通常这会导致更多的逻辑读取和更高的 CPU:

We can certainly also solve this problem by using an inner join, however, in general this should lead to more logical reads and higher CPU:

表'测试'.扫描计数10，逻辑读83622，物理读0，预读0，lob逻辑读0，lob物理读0，lob预读0.表工作文件".扫描计数 0，逻辑读 0，物理读 0，预读 0，lob 逻辑读 0，lob 物理读 0，lob 预读 0.表工作台".扫描计数4，逻辑读167426，物理读0，预读0，lob逻辑读0，lob物理读0，lob预读0.CPU 时间 = 342 毫秒，经过时间 = 233 毫秒.

Table 'test'. Scan count 10, logical reads 83622, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 4, logical reads 167426, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. CPU time = 342 ms, elapsed time = 233 ms.

要循环结果并小批量更新:

To loop over the results and update in smaller batches:

declare @rowcount int = 1;
declare @batch_size int = 1000;

while @rowcount > 0 
begin
    update top(@batch_size) t set t.status_val = 'already updated'
    from (
        select *, row_number() over(partition by object_id order by (select null)) row_num  
        from foo
        where status_val <> 'already updated' 
    ) t 
    where row_num > 1 
    set @rowcount = @@rowcount;
end

如果其他并发会话试图访问此表，这将有助于保持锁定.

This will help keep locking down if other concurrent sessions are trying to access this table.

这篇关于更新 SQL Server 表中除一条重复记录之外的所有记录的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

更新 SQL Server 表中除一条重复记录之外的所有记录 [英] Update all but one of duplicate records in table in SQL Server

问题描述

推荐答案

相关文章

数据库最新文章

热门教程

热门工具

登录关闭

更新 SQL Server 表中除一条重复记录之外的所有记录 [英] Update all but one of duplicate records in table in SQL Server

问题描述

推荐答案

相关文章

数据库最新文章

热门教程

热门工具

登录 关闭

登录关闭