UPDATE or MERGE of very big tables in SQL Server
Question
I need to perform a daily update of a very large (300M records) and wide table, TABLE1. The source data for the updates is in another table, UTABLE, which has 10%-25% as many rows as TABLE1 but is narrow. Both tables have record_id as the primary key.
Presently, I am recreating TABLE1 using the following approach:
<!-- language: sql -->
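    -- 1) Build the replacement table from the join
    SELECT (required columns) INTO TMP_TABLE1
    FROM TABLE1 T JOIN UTABLE U ON T.record_id = U.record_id;
    -- 2) Drop the old table
    DROP TABLE TABLE1;
    -- 3) Rename the replacement into place
    EXEC sp_rename 'TMP_TABLE1', 'TABLE1';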
However, this takes nearly 40 minutes on my server (60GB of RAM for SQL Server). I want to achieve a 50% performance gain. What other options can I try?
- MERGE and UPDATE: something like the code below is faster only for a very small UTABLE table. At full size, everything just hangs:
<!-- language: SQL -->
    MERGE TABLE1 AS target
    USING UTABLE AS source
    ON target.record_id = source.record_id
    WHEN MATCHED THEN
        UPDATE SET target.columns = source.columns;
- I heard that I can perform a batched MERGE by using ROWCOUNT, but I don't think it can be fast enough for a 300M-row table.
Are there any SQL query hints that would help?
Recommended answer
Actually, I've found some general recommendations for such queries: using SQL MERGE or UPDATE is a clever idea, but it fails when we need to update many records (e.g. 75M) in a big, wide table (e.g. 240M rows).
Looking at the query plan of the query below, we can say that the TABLE SCAN of TABLE1 and the final MERGE take 90% of the time.
    MERGE TABLE1 AS target
    USING UTABLE AS source
    ON target.record_id = source.record_id
    WHEN MATCHED AND (condition) THEN
        UPDATE SET target.columns = source.columns;
So in order to use MERGE we need to:
- Reduce the number of rows we need to update and pass this information to SQL Server correctly. This can be done by making UTABLE smaller or by specifying an additional condition that narrows the part to be merged.
- Make sure that the part to be merged fits in memory; otherwise the query runs much slower. Halving the size of TABLE1 reduced my real query time from 11 hours to 40 minutes.
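One way to apply both points is to merge one slice of the key space at a time, so that each statement touches only a bounded number of rows. The following is a minimal sketch, not the query used above: it assumes record_id is a BIGINT, col_a and col_b are hypothetical stand-ins for the real columns, and the batch size is an arbitrary starting value to tune.

```sql
-- Sketch only: record_id is assumed to be BIGINT; col_a / col_b are hypothetical.
DECLARE @batch_size BIGINT = 5000000;  -- arbitrary; tune so one slice fits in memory
DECLARE @lo BIGINT, @max_id BIGINT;

SELECT @lo = MIN(record_id), @max_id = MAX(record_id) FROM UTABLE;

-- If UTABLE is empty, @lo is NULL and the loop body never runs.
WHILE @lo <= @max_id
BEGIN
    -- Restrict the source to the slice [@lo, @lo + @batch_size) of the key space.
    MERGE TABLE1 AS target
    USING (SELECT record_id, col_a, col_b
           FROM UTABLE
           WHERE record_id >= @lo
             AND record_id <  @lo + @batch_size) AS source
    ON target.record_id = source.record_id
    WHEN MATCHED THEN
        UPDATE SET target.col_a = source.col_a,
                   target.col_b = source.col_b;

    SET @lo = @lo + @batch_size;
END;
```

Each iteration is its own statement, so a failure part-way through leaves earlier slices already updated; wrap the loop in a transaction only if that matters more than log size.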
As Mark mentioned, you can use the UPDATE syntax with a WHERE clause to narrow the part to be merged; this gives the same result. Also, please avoid indexing TABLE1, as this causes additional work to rebuild the index during the MERGE.
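A rough sketch of the UPDATE form follows. The column col_a and the index name IX_TABLE1_col_a are hypothetical, and the "only rows whose value changes" filter is just one possible narrowing condition, not necessarily the one used above. Disabling a nonclustered index for the duration of the update is an assumption about how the indexing advice might be applied; a clustered index should not be disabled this way, because the table becomes inaccessible.

```sql
-- Sketch only: col_a and IX_TABLE1_col_a are hypothetical names.

-- Optionally take a nonclustered index out of play so it is not maintained row by row.
ALTER INDEX IX_TABLE1_col_a ON TABLE1 DISABLE;

UPDATE T
SET    T.col_a = U.col_a
FROM   TABLE1 AS T
JOIN   UTABLE AS U
       ON U.record_id = T.record_id
WHERE  T.col_a <> U.col_a                          -- skip rows that would not change
   OR (T.col_a IS NULL     AND U.col_a IS NOT NULL)
   OR (T.col_a IS NOT NULL AND U.col_a IS NULL);

-- Rebuild once, after all rows have been written.
ALTER INDEX IX_TABLE1_col_a ON TABLE1 REBUILD;
```

Whether the disable/rebuild pair pays off depends on how many rows change; for a small fraction of the table, leaving the index enabled may be cheaper.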