如何使用SQL中的连接来优化更新 [英] How to optimize update using join in SQL

查看:94
本文介绍了如何使用SQL中的连接来优化更新的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在通过加入其他表来更新一个表列。任何表都没有索引。一个表(TRN_FINAL_FCM_DETAILS_MONTHLY)具有100000条记录,而其他表(TRN_PRE_FCM_DETAILS)具有400000条记录。考虑以下查询 -



任何表都没有密钥。

I am updating one table column by joining other table. There is no index on any table. One table (TRN_FINAL_FCM_DETAILS_MONTHLY ) has 100000 records and other (TRN_PRE_FCM_DETAILS ) has 400000 records. Consider following query-

There is no key on any table.

UPDATE A 
SET A.CHANGE_NOCHANGE = @CHANGED 
FROM TRN_PRE_FCM_DETAILS A, TRN_FINAL_FCM_DETAILS_MONTHLY B 
WHERE A.SITEID = B.SITEID 
  AND A.SHARINGCOUNT <> B.SHARINGCOUNT 
  AND A.REQUESTREF = B.REQUESTREF 
  AND A.BILLMONTH = @CURR_BILLMNTH 
  AND B.BILLMONTH = @PREV_BILLMNTH 
  AND A.BILLYEAR = @CURR_BILLYR 
  AND B.BILLYEAR = @PREV_BILLYR



但是这个查询需要2个小时才能执行。如何优化查询性能?我是性能调优的新手。任何帮助将不胜感激,因为我需要尽快解决这个问题。请帮助我们。



我尝试了什么:



我尝试在两个表上进行索引,但对索引知之甚少。我在连接条件中使用的列上尝试了NOnClustered索引。


But this query is taking 2 hours in execute. How can I optimize query performance? I am new in performance tuning. Any help would be appreciated as i need this solution as soon as possible. PLease help me guys.

What I have tried:

I tried indexing on both tables but doesn't know much about indexing. I tried NOnClustered index on columns used in join conditions.

推荐答案

使用MERGE语句

using-merge-in-sql-server-同时更新和删除 [ ^ ]

SQL服务器2008年引进到合并-statement-one-statement-for-insert-update-delete [ ^ ]
Use "MERGE" Statement
using-merge-in-sql-server-to-insert-update-and-delete-at-the-same-time[^]
sql-server-2008-introduction-to-merge-statement-one-statement-for-insert-update-delete[^]


使用MERGE声明

https://www.mssqltips.com/sqlservertip/1 704 / using-merge-in-sql-server-to-insert-update-and-delete-at-the-time-time /
Use "MERGE" Statement
https://www.mssqltips.com/sqlservertip/1704/using-merge-in-sql-server-to-insert-update-and-delete-at-the-same-time/


条件的顺序和类型确实很重要。



您使用的非ANSI连接非常低效,因为它尝试将表A中的所有内容连接到表B中的所有内容。虽然它只是在查看在第一个实例中,索引对于非常大的表和连接仍然非常大。将其更改为ANSI连接将有所帮助,但前提是您的加入条件会利用这​​一点。



使用第一个条件A.col<> B.col将尝试加入除表A中的1个项目以外的所有项目,除了表B中的1个项目。它根本不会改善这种情况。



尝试切换前两个条件,首先使用=。这将减少我的<>所做的比较次数非常好。



在where子句中你可以按原样完成三个where子句查看变量,因为将它们添加到连接中会做的很少,如果有的话,提高查询效率。



这将是结果:

The order, and type, of your conditions does matter.

The non-ANSI join you use is incredibly inefficient because it attempts to join everything from table A to everything from table B. Although it is only looking at indexes in the first instance, the indexes remain very large for very large tables and joins. Changing it to an ANSI join will help, but only if your Join conditions take advantage of this.

Using the first condition A.col <> B.col will attempt to join everything but 1 item from table A to everything but 1 from table B. It doesn't improve the situation very much at all.

Try switching the first two conditions to use the "=" first. That will reduce the number of comparisons made my the "<>" immensely.

The three where clauses you have that look at a variable could be done as they are, in a Where clause, because adding them to the join will do little, if anything, to improve the query efficiency.

This would be the result:
UPDATE A 
SET A.CHANGE_NOCHANGE = @CHANGED 
FROM TRN_PRE_FCM_DETAILS A
INNER JOIN TRN_FINAL_FCM_DETAILS_MONTHLY B ON A.SITEID = B.SITEID AND A.REQUESTREF = B.REQUESTREF AND A.SHARINGCOUNT <> B.SHARINGCOUNT 
WHERE A.BILLMONTH = @CURR_BILLMNTH 
  AND B.BILLMONTH = @PREV_BILLMNTH 
  AND A.BILLYEAR = @CURR_BILLYR 
  AND B.BILLYEAR = @PREV_BILLYR





其次,您可以通过在连接之前调用Where子句来进一步缩短查询时间。这再次减少了连接的索引大小。你可以用CTE做到这一点。根据您的数据设置方式和所涉及的数据类型,这可能会使事情变得更好或更糟:





Secondly, you could reduce the query time further by invoking the Where clause before the join. This again reduces the index size for the join. You can do that with a CTE. Depending on how your data is set up and the data types involved, this could make things much better or much worse:

With preCalcCTE as (
 Select * from TRN_FINAL_FCM_DETAILS_MONTHLY
 where  BILLMONTH = @PREV_BILLMNTH
 and B.BILLYEAR = @PREV_BILLYR)
UPDATE A 
SET A.CHANGE_NOCHANGE = @CHANGED 
FROM TRN_PRE_FCM_DETAILS A
INNER JOIN preCalcCTE B ON A.SITEID = B.SITEID AND A.REQUESTREF = B.REQUESTREF AND A.SHARINGCOUNT <> B.SHARINGCOUNT 
WHERE A.BILLMONTH = @CURR_BILLMNTH 
  AND A.BILLYEAR = @CURR_BILLYR





最后,int到int的比较比varchar更有效。 int总是设置几个字节,而varchar长度可以更大,甚至可以变化。始终在较大的数据类型之前计算更简单的数据类型比较。

索引可以提供帮助,但数据越大,插入和更新所需的时间就越长,因为索引也必须更新。不时检查您的索引是否碎片化是一个好主意。您可以重建索引以提高它们的效率。



检查索引碎片:



Finally, int to int comparison is much more efficient that varchar to varchar. An int is always a set few bytes whereas varchar length can be much larger and can even vary. Always calculate the simpler datatype comparisons before the larger ones.
Indexes can help, but the larger the data, the longer it takes for inserts and updates to occur because the index must also be updated. From time to time it's a good idea to check if your indexes become fragmented. You can rebuild the indexes to make them more efficient.

Check index fragmentation:

SELECT OBJECT_NAME(ind.OBJECT_ID) AS TableName, 
ind.name AS IndexName, indexstats.index_type_desc AS IndexType, 
indexstats.avg_fragmentation_in_percent 
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) indexstats 
INNER JOIN sys.indexes ind  
ON ind.object_id = indexstats.object_id 
AND ind.index_id = indexstats.index_id 
-- You only really need to reindex when over 30% or rebuild when over 50%
--WHERE indexstats.avg_fragmentation_in_percent > 30 
ORDER BY indexstats.avg_fragmentation_in_percent DESC





有几种方法可以重新索引或重建索引。DBCC DBREINDEX(Transact-SQL) [ ^ ]





注意:

大多数情况下(但并非总是如此)SQL实际上会代表您决定查询的哪些部分以什么顺序发生。第一次运行任何查询时,数据库将在计算运行它的最有效方法后实际编译并存储它。随着数据库的增长,sql编译的查询可能不再是最有效的执行。您可以清除查询缓存,以便SQL在下次运行时重新编译任何和所有查询。

清除SQL Server性能测试的缓存 [ ^ ]



重新设计的Conciser或者未来的项目:

我编写我的数据库所以常见的文本(例如ref代码)存储在一个表中,每个表都有一个ID,每个使用ref代码的表引用了ID。这使索引和比较MUCH更有效。



我曾经不得不通过这样做重新设计一个庞大的数据库(这几乎是颠倒的)。花了6个月的时间将其组合在一起并重新设计使用它的应用程序,但即使是我最好的估算,但数据传输将数据大小减少到5%,但也需要超过9k小时才能完成(这超过一年!)。我们决定归档旧系统,只传输没有历史记录的数据。还需要2天。



There are a couple of ways to reindex or rebuild indexes.DBCC DBREINDEX (Transact-SQL)[^]


NB:
Most of the time (but not always) SQL will actually decide on your behalf what parts of the query occur in what order. The first time you run any query the database will actually compile and store it after calculating the most efficient way to run it. As your database grows, the sql compiled query may no longer be the most efficient execution. You can clear the query cache so SQL will recompile any and all queries the next time they are run.
Clearing Cache for SQL Server Performance Testing[^]

Conciser for redesign or future projects:
I write my databases so common text (such as ref codes) are stored in a table on their own with an ID and each table that uses the ref code references the ID. That makes indexing and comparison MUCH more efficient.

I once had to redesign a huge database (that was pretty much upside-down) by doing this. It took 6 months to put it together and redesign the application that used it, but even my very best estimates but a data transfer as reducing data size to 5% but would also take over 9k hours to complete (that's over a year!). We decided to archive the old system and only transfer data with no history. Still took 2 days.


这篇关于如何使用SQL中的连接来优化更新的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆