更新 1.2 亿条记录的最快方法 [英] Fastest way to update 120 Million records

查看:29
本文介绍了更新 1.2 亿条记录的最快方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要在 1.2 亿条记录表中初始化一个值为 -1 的新字段.

I need to initialize a new field with the value -1 in a 120 Million record table.

Update table
       set int_field = -1;

在取消之前我让它运行了 5 个小时.

I let it run for 5 hours before canceling it.

我尝试在事务级别设置为读取未提交的情况下运行它,但结果相同.

I tried running it with transaction level set to read uncommitted with the same results.

Recovery Model = Simple.
MS SQL Server 2005

有什么建议可以更快地完成这项工作吗?

Any advice on getting this done faster?

推荐答案

更新 120M 记录表的唯一合理方法是使用 SELECT 语句填充 桌子.执行此操作时必须小心.下面的说明.

The only sane way to update a table of 120M records is with a SELECT statement that populates a second table. You have to take care when doing this. Instructions below.

简单案例

对于没有聚集索引的表,在没有并发 DML 的时间内:

For a table w/out a clustered index, during a time w/out concurrent DML:

  • SELECT *, new_col = 1 INTO clone.BaseTable FROM dbo.BaseTable
  • 在新表上重新创建索引、约束等
  • 使用 ALTER SCHEMA ... TRANSFER 切换旧的和新的.
  • 删除旧表

如果您无法创建克隆架构,则可以使用同一架构中的不同表名.请记住在切换后重命名所有约束和触发器(如果适用).

If you can't create a clone schema, a different table name in the same schema will do. Remember to rename all your constraints and triggers (if applicable) after the switch.

非简单案例

首先,在不同的架构下使用相同的名称重新创建您的BaseTable,例如clone.BaseTable.使用单独的架构将简化稍后的重命名过程.

First, recreate your BaseTable with the same name under a different schema, eg clone.BaseTable. Using a separate schema will simplify the rename process later.

  • 包括聚集索引(如果适用).请记住,主键和唯一约束可能会聚集在一起,但不一定如此.
  • 包括标识列和计算列(如果适用).
  • 包括您的新 INT 列,无论它属于哪里.
  • 请勿包含以下任何内容:
    • 触发器
    • 外键约束
    • 非聚集索引/主键/唯一约束
    • 检查约束或默认约束.默认值没有太大区别,但我们正在努力保持事情最少.
    • Include the clustered index, if applicable. Remember that primary keys and unique constraints may be clustered, but not necessarily so.
    • Include identity columns and computed columns, if applicable.
    • Include your new INT column, wherever it belongs.
    • Do not include any of the following:
      • triggers
      • foreign key constraints
      • non-clustered indexes/primary keys/unique constraints
      • check constraints or default constraints. Defaults don't make much of difference, but we're trying to keep things minimal.

      然后,用 1000 行测试您的插入:

      Then, test your insert w/ 1000 rows:

      -- assuming an IDENTITY column in BaseTable
      SET IDENTITY_INSERT clone.BaseTable ON
      GO
      INSERT clone.BaseTable WITH (TABLOCK) (Col1, Col2, Col3)
      SELECT TOP 1000 Col1, Col2, Col3 = -1
      FROM dbo.BaseTable
      GO
      SET IDENTITY_INSERT clone.BaseTable OFF
      

      检查结果.如果一切都按顺序出现:

      Examine the results. If everything appears in order:

      • 截断克隆表
      • 确保数据库处于大容量日志或简单恢复模式
      • 执行完整插入.

      这将需要一段时间,但不会像更新一样长.完成后,检查克隆表中的数据以确保一切正确.

      This will take a while, but not nearly as long as an update. Once it completes, check the data in the clone table to make sure it everything is correct.

      然后,重新创建所有非聚集主键/唯一约束/索引和外键约束(按此顺序).如果适用,重新创建默认值并检查约束.重新创建所有触发器.在单独的批处理中重新创建每个约束、索引或触发器.例如:

      Then, recreate all non-clustered primary keys/unique constraints/indexes and foreign key constraints (in that order). Recreate default and check constraints, if applicable. Recreate all triggers. Recreate each constraint, index or trigger in a separate batch. eg:

      ALTER TABLE clone.BaseTable ADD CONSTRAINT UQ_BaseTable UNIQUE (Col2)
      GO
      -- next constraint/index/trigger definition here
      

      最后,将 dbo.BaseTable 移动到备份架构,将 clone.BaseTable 移动到 dbo 架构(或您的表应该存在的任何地方).

      Finally, move dbo.BaseTable to a backup schema and clone.BaseTable to the dbo schema (or wherever your table is supposed to live).

      -- -- perform first true-up operation here, if necessary
      -- EXEC clone.BaseTable_TrueUp
      -- GO
      -- -- create a backup schema, if necessary
      -- CREATE SCHEMA backup_20100914
      -- GO
      BEGIN TRY
        BEGIN TRANSACTION
        ALTER SCHEMA backup_20100914 TRANSFER dbo.BaseTable
        -- -- perform second true-up operation here, if necessary
        -- EXEC clone.BaseTable_TrueUp
        ALTER SCHEMA dbo TRANSFER clone.BaseTable
        COMMIT TRANSACTION
      END TRY
      BEGIN CATCH
        SELECT ERROR_MESSAGE() -- add more info here if necessary
        ROLLBACK TRANSACTION
      END CATCH
      GO
      

      如果您需要释放磁盘空间,此时您可以删除原始表,但最好将其保留一段时间.

      If you need to free-up disk space, you may drop your original table at this time, though it may be prudent to keep it around a while longer.

      不用说,这是理想的离线操作.如果有人在执行此操作时修改数据,则必须使用架构开关执行校准操作.我建议在 dbo.BaseTable 上创建一个触发器来将所有 DML 记录到一个单独的表中.在开始插入之前启用此触发器.然后在您执行模式传输的同一事务中,使用日志表执行校准.首先在数据的子集上测试!Delta 很容易搞砸.

      Needless to say, this is ideally an offline operation. If you have people modifying data while you perform this operation, you will have to perform a true-up operation with the schema switch. I recommend creating a trigger on dbo.BaseTable to log all DML to a separate table. Enable this trigger before you start the insert. Then in the same transaction that you perform the schema transfer, use the log table to perform a true-up. Test this first on a subset of the data! Deltas are easy to screw up.

      这篇关于更新 1.2 亿条记录的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆