更新 1.2 亿条记录的最快方法 [英] Fastest way to update 120 Million records
问题描述
我需要在 1.2 亿条记录表中初始化一个值为 -1 的新字段.
I need to initialize a new field with the value -1 in a 120 Million record table.
Update table
set int_field = -1;
在取消之前我让它运行了 5 个小时.
I let it run for 5 hours before canceling it.
我尝试在事务级别设置为读取未提交的情况下运行它,但结果相同.
I tried running it with transaction level set to read uncommitted with the same results.
Recovery Model = Simple.
MS SQL Server 2005
有什么建议可以更快地完成这项工作吗?
Any advice on getting this done faster?
推荐答案
更新 120M 记录表的唯一合理方法是使用 SELECT
语句填充秒 桌子.执行此操作时必须小心.下面的说明.
The only sane way to update a table of 120M records is with a SELECT
statement that populates a second table. You have to take care when doing this. Instructions below.
简单案例
对于没有聚集索引的表,在没有并发 DML 的时间内:
For a table w/out a clustered index, during a time w/out concurrent DML:
SELECT *, new_col = 1 INTO clone.BaseTable FROM dbo.BaseTable
- 在新表上重新创建索引、约束等
- 使用 ALTER SCHEMA ... TRANSFER 切换旧的和新的.
- 删除旧表
如果您无法创建克隆架构,则可以使用同一架构中的不同表名.请记住在切换后重命名所有约束和触发器(如果适用).
If you can't create a clone schema, a different table name in the same schema will do. Remember to rename all your constraints and triggers (if applicable) after the switch.
非简单案例
首先,在不同的架构下使用相同的名称重新创建您的BaseTable
,例如clone.BaseTable
.使用单独的架构将简化稍后的重命名过程.
First, recreate your BaseTable
with the same name under a different schema, eg clone.BaseTable
. Using a separate schema will simplify the rename process later.
- 包括聚集索引(如果适用).请记住,主键和唯一约束可能会聚集在一起,但不一定如此.
- 包括标识列和计算列(如果适用).
- 包括您的新 INT 列,无论它属于哪里.
- 请勿包含以下任何内容:
- 触发器
- 外键约束
- 非聚集索引/主键/唯一约束
- 检查约束或默认约束.默认值没有太大区别,但我们正在努力保持事情最少.
- Include the clustered index, if applicable. Remember that primary keys and unique constraints may be clustered, but not necessarily so.
- Include identity columns and computed columns, if applicable.
- Include your new INT column, wherever it belongs.
- Do not include any of the following:
- triggers
- foreign key constraints
- non-clustered indexes/primary keys/unique constraints
- check constraints or default constraints. Defaults don't make much of difference, but we're trying to keep things minimal.
然后,用 1000 行测试您的插入:
Then, test your insert w/ 1000 rows:
-- assuming an IDENTITY column in BaseTable SET IDENTITY_INSERT clone.BaseTable ON GO INSERT clone.BaseTable WITH (TABLOCK) (Col1, Col2, Col3) SELECT TOP 1000 Col1, Col2, Col3 = -1 FROM dbo.BaseTable GO SET IDENTITY_INSERT clone.BaseTable OFF
检查结果.如果一切都按顺序出现:
Examine the results. If everything appears in order:
- 截断克隆表
- 确保数据库处于大容量日志或简单恢复模式
- 执行完整插入.
这将需要一段时间,但不会像更新一样长.完成后,检查克隆表中的数据以确保一切正确.
This will take a while, but not nearly as long as an update. Once it completes, check the data in the clone table to make sure it everything is correct.
然后,重新创建所有非聚集主键/唯一约束/索引和外键约束(按此顺序).如果适用,重新创建默认值并检查约束.重新创建所有触发器.在单独的批处理中重新创建每个约束、索引或触发器.例如:
Then, recreate all non-clustered primary keys/unique constraints/indexes and foreign key constraints (in that order). Recreate default and check constraints, if applicable. Recreate all triggers. Recreate each constraint, index or trigger in a separate batch. eg:
ALTER TABLE clone.BaseTable ADD CONSTRAINT UQ_BaseTable UNIQUE (Col2) GO -- next constraint/index/trigger definition here
最后,将
dbo.BaseTable
移动到备份架构,将clone.BaseTable
移动到 dbo 架构(或您的表应该存在的任何地方).Finally, move
dbo.BaseTable
to a backup schema andclone.BaseTable
to the dbo schema (or wherever your table is supposed to live).-- -- perform first true-up operation here, if necessary -- EXEC clone.BaseTable_TrueUp -- GO -- -- create a backup schema, if necessary -- CREATE SCHEMA backup_20100914 -- GO BEGIN TRY BEGIN TRANSACTION ALTER SCHEMA backup_20100914 TRANSFER dbo.BaseTable -- -- perform second true-up operation here, if necessary -- EXEC clone.BaseTable_TrueUp ALTER SCHEMA dbo TRANSFER clone.BaseTable COMMIT TRANSACTION END TRY BEGIN CATCH SELECT ERROR_MESSAGE() -- add more info here if necessary ROLLBACK TRANSACTION END CATCH GO
如果您需要释放磁盘空间,此时您可以删除原始表,但最好将其保留一段时间.
If you need to free-up disk space, you may drop your original table at this time, though it may be prudent to keep it around a while longer.
不用说,这是理想的离线操作.如果有人在执行此操作时修改数据,则必须使用架构开关执行校准操作.我建议在
dbo.BaseTable
上创建一个触发器来将所有 DML 记录到一个单独的表中.在开始插入之前启用此触发器.然后在您执行模式传输的同一事务中,使用日志表执行校准.首先在数据的子集上测试!Delta 很容易搞砸.Needless to say, this is ideally an offline operation. If you have people modifying data while you perform this operation, you will have to perform a true-up operation with the schema switch. I recommend creating a trigger on
dbo.BaseTable
to log all DML to a separate table. Enable this trigger before you start the insert. Then in the same transaction that you perform the schema transfer, use the log table to perform a true-up. Test this first on a subset of the data! Deltas are easy to screw up.这篇关于更新 1.2 亿条记录的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!