How to update a large table with millions of rows in SQL Server?

Question

I have an UPDATE statement that can update more than a million records. I want to update them in batches of 1,000 or 10,000. I tried using @@ROWCOUNT, but I cannot get the desired result.

Just for testing, I picked a table with 14 records and set a row count of 5. The query is supposed to update the records in batches of 5, 5, and 4, but it only updates the first 5 records.

Query 1:

SET ROWCOUNT 5

UPDATE TableName 
SET Value = 'abc1' 
WHERE Parameter1 = 'abc' AND Parameter2 = 123

WHILE @@ROWCOUNT > 0
BEGIN
    SET rowcount 5

    UPDATE TableName 
    SET Value = 'abc1' 
    WHERE Parameter1 = 'abc' AND Parameter2 = 123

    PRINT (@@ROWCOUNT)
END

SET rowcount 0

Query 2:

SET ROWCOUNT  5

WHILE (@@ROWCOUNT > 0)
BEGIN
    BEGIN TRANSACTION

    UPDATE TableName 
    SET Value = 'abc1' 
    WHERE Parameter1 = 'abc' AND Parameter2 = 123

    PRINT (@@ROWCOUNT)

    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION

        BREAK
    END

    COMMIT TRANSACTION
END

SET ROWCOUNT  0

What am I missing here?

Recommended answer

  1. You should not update 10k rows in a single set unless you are certain that the operation is taking Page Locks (because multiple rows per page are part of the UPDATE operation). The issue is that Lock Escalation (from either Row or Page locks to a Table lock) occurs at 5000 locks. So it is safest to keep the batch size just below 5000, in case the operation is using Row Locks.
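
    As a rough way to check which locks a batch actually takes, the sketch below runs one batch inside a transaction, counts the locks held by the current session, and then rolls back. It assumes the TableName, Parameter1, Parameter2, and Value names from the Question, a batch size of 2000 chosen only for illustration, and permission to query sys.dm_tran_locks (typically VIEW SERVER STATE).

    ```sql
    -- Sketch only: run on a test system; the ROLLBACK undoes the update.
    BEGIN TRANSACTION;

    UPDATE TOP (2000) TableName
    SET    Value = 'abc1'
    WHERE  Parameter1 = 'abc'
    AND    Parameter2 = 123;

    -- Count the locks this session holds, grouped by resource and mode.
    -- Many KEY/RID rows suggest row locks; PAGE rows suggest page locks;
    -- an OBJECT row with request_mode = 'X' suggests a table-level lock.
    SELECT resource_type, request_mode, COUNT(*) AS LockCount
    FROM   sys.dm_tran_locks
    WHERE  request_session_id = @@SPID
    GROUP BY resource_type, request_mode;

    ROLLBACK TRANSACTION;

    -- The table's lock-escalation setting (TABLE, AUTO, or DISABLE).
    SELECT name, lock_escalation_desc
    FROM   sys.tables
    WHERE  name = N'TableName';
    ```

    Comparing the lock counts for a few candidate batch sizes is one way to pick a value that stays clear of the escalation threshold.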

  2. You should not be using SET ROWCOUNT to limit the number of rows that will be modified. There are two issues here:

     1. It has been deprecated since SQL Server 2005 was released (11 years ago at the time of writing):

        Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax.

     2. It can affect more than just the statement you are dealing with:

        Setting the SET ROWCOUNT option causes most Transact-SQL statements to stop processing when they have been affected by the specified number of rows. This includes triggers. The ROWCOUNT option does not affect dynamic cursors, but it does limit the rowset of keyset and insensitive cursors. This option should be used with caution.

    Instead, use the TOP () clause.
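
    For comparison with the SET ROWCOUNT 5 approach in the Question, a single limited batch written with TOP might look like the first statement below. TOP in an UPDATE does not guarantee which rows are chosen; if the order matters, one documented pattern is to apply TOP with ORDER BY inside a CTE and update through it. The Id key column in the second statement is a hypothetical name, not something shown in the Question.

    ```sql
    -- Limit one UPDATE to 5 rows without SET ROWCOUNT.
    UPDATE TOP (5) TableName
    SET    Value = 'abc1'
    WHERE  Parameter1 = 'abc'
    AND    Parameter2 = 123;

    -- Deterministic variant: choose the rows with TOP ... ORDER BY in a CTE.
    -- Id is an assumed key column used only for illustration.
    WITH NextBatch AS
    (
        SELECT TOP (5) Value
        FROM   TableName
        WHERE  Parameter1 = 'abc'
        AND    Parameter2 = 123
        AND    Value <> 'abc1'
        ORDER BY Id
    )
    UPDATE NextBatch
    SET    Value = 'abc1';
    ```

    Unlike SET ROWCOUNT, the limit applies only to the statement that carries the TOP clause, so triggers and later statements are unaffected.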

  3. There is no purpose in having an explicit transaction here. It complicates the code, and you have no handling for a ROLLBACK, which is not even needed since each statement is its own transaction (i.e. auto-commit).

  4. Assuming you find a reason to keep the explicit transaction, you still have no TRY / CATCH structure. Please see my answer on DBA.StackExchange for a TRY / CATCH template that handles transactions:

    Are we required to handle transactions in C# code as well as in the stored procedure?
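
    A minimal sketch of that kind of structure follows. It is not the template from the linked answer, only an illustration of the general pattern, and it assumes SQL Server 2012 or later (for THROW) and the table and column names from the Question.

    ```sql
    BEGIN TRY
        BEGIN TRANSACTION;

        UPDATE TOP (2000) TableName
        SET    Value = 'abc1'
        WHERE  Parameter1 = 'abc'
        AND    Parameter2 = 123
        AND    Value <> 'abc1';

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- Undo the open transaction, if any, before reporting the error.
        IF (@@TRANCOUNT > 0)
        BEGIN
            ROLLBACK TRANSACTION;
        END;

        -- Re-raise the original error to the caller.
        THROW;
    END CATCH;
    ```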

  5. I suspect that the real WHERE clause is not shown in the example code in the Question, so relying only on what has been shown, a better model would be:

    DECLARE @Rows INT,
            @BatchSize INT; -- keep below 5000 to be safe
    
    SET @BatchSize = 2000;
    
    SET @Rows = @BatchSize; -- initialize just to enter the loop
    
    BEGIN TRY    
      WHILE (@Rows = @BatchSize)
      BEGIN
          UPDATE TOP (@BatchSize) tab
          SET    tab.Value = 'abc1'
          FROM  TableName tab
          WHERE tab.Parameter1 = 'abc'
          AND   tab.Parameter2 = 123
          AND   tab.Value <> 'abc1' COLLATE Latin1_General_100_BIN2;
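          -- Use a binary Collation (ending in _BIN2, not _BIN) to make sure
          -- that you don't skip differences that compare the same due to
          -- insensitivity of case, accent, etc, or linguistic equivalence.
          -- Rows where Value IS NULL never satisfy <>, so they are skipped;
          -- add an explicit IS NULL check if such rows must also be updated.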
    
          SET @Rows = @@ROWCOUNT;
      END;
    END TRY
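    BEGIN CATCH
      -- Re-raise the caught error to the caller, then stop the batch.
      DECLARE @ErrorMessage NVARCHAR(4000);
      SET @ErrorMessage = ERROR_MESSAGE();
      RAISERROR('%s', 16, 1, @ErrorMessage);
      RETURN;
    END CATCH;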
    

    By testing @Rows against @BatchSize, you can avoid the final UPDATE query in most cases, because the final set typically has fewer rows than @BatchSize, in which case we know that there are no more rows to process (which is what you see in the output shown in your answer). Only when the final set of rows is exactly equal to @BatchSize will this code run a final UPDATE affecting 0 rows.

    I also added a condition to the WHERE clause to prevent rows that have already been updated from being updated again.
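
    To see the loop behave as described on the 14-row test from the Question, the following self-contained sketch builds a temporary table and prints the number of rows changed per batch. The #Demo table, its Id column, the starting value 'old', and the use of sys.all_objects as a row source are assumptions made for the demo. The row count is copied into @Rows immediately after the UPDATE because statements such as PRINT reset @@ROWCOUNT to 0.

    ```sql
    CREATE TABLE #Demo
    (
        Id         INT IDENTITY(1, 1) PRIMARY KEY,
        Parameter1 VARCHAR(10) NOT NULL,
        Parameter2 INT NOT NULL,
        Value      VARCHAR(10) NOT NULL
    );

    -- Load 14 matching rows.
    INSERT INTO #Demo (Parameter1, Parameter2, Value)
    SELECT TOP (14) 'abc', 123, 'old'
    FROM   sys.all_objects;

    DECLARE @Rows INT,
            @BatchSize INT;

    SET @BatchSize = 5;
    SET @Rows = @BatchSize; -- initialize just to enter the loop

    WHILE (@Rows = @BatchSize)
    BEGIN
        UPDATE TOP (@BatchSize) tab
        SET    tab.Value = 'abc1'
        FROM   #Demo tab
        WHERE  tab.Parameter1 = 'abc'
        AND    tab.Parameter2 = 123
        AND    tab.Value <> 'abc1' COLLATE Latin1_General_100_BIN2;

        -- Capture the count before any other statement can reset it.
        SET @Rows = @@ROWCOUNT;

        PRINT 'Rows updated: ' + CONVERT(VARCHAR(10), @Rows);
    END;

    DROP TABLE #Demo;
    ```

    With 14 rows and a batch size of 5, the loop should report 5, 5, and 4 rows and then stop, because 4 is less than @BatchSize.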
