DELETE query performance
Problem description
Original query
DELETE B FROM
TABLE_BASE B,
TABLE_INC I
WHERE B.ID = I.ID AND B.NUM = I.NUM;
Performance statistics for the above query
+-------------------+---------+-----------+
| Response Time | SumCPU | ImpactCPU |
+-------------------+---------+-----------+
| 00:05:29.190000 | 2852 | 319672 |
+-------------------+---------+-----------+
Optimized query 1
DEL FROM TABLE_BASE WHERE (ID, NUM) IN
(SELECT ID, NUM FROM TABLE_INC);
Statistics for the above query
+-----------------+--------+-----------+
| QryRespTime | SumCPU | ImpactCPU |
+-----------------+--------+-----------+
| 00:00:00.570000 | 15.42 | 49.92 |
+-----------------+--------+-----------+
Optimized query 2
DELETE FROM TABLE_BASE B WHERE EXISTS
(SELECT * FROM TABLE_INC I WHERE B.ID = I.ID AND B.NUM = I.NUM);
Statistics for the above query
+-----------------+--------+-----------+
| QryRespTime | SumCPU | ImpactCPU |
+-----------------+--------+-----------+
| 00:00:00.400000 | 11.96 | 44.93 |
+-----------------+--------+-----------+
My questions:
- How/why do Optimized Query 1 and Query 2 improve performance so significantly?
- What is the best practice for such DELETE queries?
- Should I choose Query 1 or Query 2? Which one is ideal/better/more reliable? I felt Query 1 would be the better choice because it uses SELECT ID, NUM (only the two needed columns) instead of SELECT *, yet Query 2 shows better results.
QUERY 1 EXPLAIN PLAN
This query is optimized using type 2 profile T2_Linux64, profileid 21.
1) First, we lock TEMP_DB.TABLE_BASE for write on a
reserved RowHash to prevent global deadlock.
2) Next, we lock TEMP_DB_T.TABLE_INC for access, and we
lock TEMP_DB.TABLE_BASE for write.
3) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from
TEMP_DB.TABLE_BASE by way of an all-rows scan
with no residual conditions into Spool 2 (all_amps), which is
redistributed by the hash code of (
TEMP_DB.TABLE_BASE.NUM,
TEMP_DB.TABLE_BASE.ID) to all AMPs. Then
we do a SORT to order Spool 2 by row hash. The size of Spool
2 is estimated with low confidence to be 168,480 rows (
5,054,400 bytes). The estimated time for this step is 0.03
seconds.
2) We do an all-AMPs RETRIEVE step from
TEMP_DB_T.TABLE_INC by way of an all-rows scan
with no residual conditions into Spool 3 (all_amps), which is
redistributed by the hash code of (
TEMP_DB_T.TABLE_INC.NUM,
TEMP_DB_T.TABLE_INC.ID) to all AMPs. Then
we do a SORT to order Spool 3 by row hash and the sort key in
spool field1 eliminating duplicate rows. The size of Spool 3
is estimated with high confidence to be 5,640 rows (310,200
bytes). The estimated time for this step is 0.03 seconds.
4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to Spool 3 (Last Use) by way of an
all-rows scan. Spool 2 and Spool 3 are joined using an inclusion
merge join, with a join condition of ("(ID = ID) AND
(NUM = NUM)"). The result goes into Spool 1 (all_amps),
which is redistributed by the hash code of (
TEMP_DB.TABLE_BASE.ROWID) to all AMPs. Then we do
a SORT to order Spool 1 by row hash and the sort key in spool
field1 eliminating duplicate rows. The size of Spool 1 is
estimated with no confidence to be 168,480 rows (3,032,640 bytes).
The estimated time for this step is 1.32 seconds.
5) We do an all-AMPs MERGE DELETE to
TEMP_DB.TABLE_BASE from Spool 1 (Last Use) via the
row id. The size is estimated with no confidence to be 168,480
rows. The estimated time for this step is 42.95 seconds.
6) We spoil the parser's dictionary cache for the table.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
QUERY 2 EXPLAIN PLAN
This query is optimized using type 2 profile T2_Linux64, profileid 21.
1) First, we lock TEMP_DB.TABLE_BASE for write on a reserved RowHash to
prevent global deadlock.
2) Next, we lock TEMP_DB_T.TABLE_INC for access, and we
lock TEMP_DB.TABLE_BASE for write.
3) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from TEMP_DB.TABLE_BASE by way of
an all-rows scan with no residual conditions into Spool 2
(all_amps), which is redistributed by the hash code of (
TEMP_DB.TABLE_BASE.NUM, TEMP_DB.TABLE_BASE.ID) to all AMPs.
Then we do a SORT to order Spool 2 by row hash. The size of
Spool 2 is estimated with low confidence to be 168,480 rows (
5,054,400 bytes). The estimated time for this step is 0.03
seconds.
2) We do an all-AMPs RETRIEVE step from
TEMP_DB_T.TABLE_INC by way of an all-rows scan
with no residual conditions into Spool 3 (all_amps), which is
redistributed by the hash code of (
TEMP_DB_T.TABLE_INC.NUM,
TEMP_DB_T.TABLE_INC.ID) to all AMPs. Then
we do a SORT to order Spool 3 by row hash and the sort key in
spool field1 eliminating duplicate rows. The size of Spool 3
is estimated with high confidence to be 5,640 rows (310,200
bytes). The estimated time for this step is 0.03 seconds.
4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to Spool 3 (Last Use) by way of an
all-rows scan. Spool 2 and Spool 3 are joined using an inclusion
merge join, with a join condition of ("(NUM = NUM) AND
(ID = ID)"). The result goes into Spool 1 (all_amps), which
is redistributed by the hash code of (TEMP_DB.TABLE_BASE.ROWID) to all
AMPs. Then we do a SORT to order Spool 1 by row hash and the sort
key in spool field1 eliminating duplicate rows. The size of Spool
1 is estimated with no confidence to be 168,480 rows (3,032,640
bytes). The estimated time for this step is 1.32 seconds.
5) We do an all-AMPs MERGE DELETE to TEMP_DB.TABLE_BASE from Spool 1 (Last
Use) via the row id. The size is estimated with no confidence to
be 168,480 rows. The estimated time for this step is 42.95
seconds.
6) We spoil the parser's dictionary cache for the table.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
For TABLE_BASE
+----------------+----------+
| table_bytes | skewness |
+----------------+----------+
| 16842085888.00 | 22.78 |
+----------------+----------+
For TABLE_INC
+-------------+----------+
| table_bytes | skewness |
+-------------+----------+
| 5317120.00 | 44.52 |
+-------------+----------+
Recommended answer
What is the relationship between TABLE_BASE and TABLE_INC?

If it's one-to-many, Q1 probably creates a huge spool first, while Q2 & Q3 might apply DISTINCT before the join.
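The one-to-many effect can be sketched with a small SQLite model (the table names and sample data here are hypothetical stand-ins for the Teradata tables): when the delta table holds duplicate (ID, NUM) pairs, a direct join multiplies the rows that must be spooled and de-duplicated, while collapsing the delta to DISTINCT pairs first keeps the intermediate result small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table_base (id INTEGER, num INTEGER)")
cur.execute("CREATE TABLE table_inc  (id INTEGER, num INTEGER)")
cur.executemany("INSERT INTO table_base VALUES (?, ?)", [(1, 10), (2, 20)])
# One-to-many: the same (id, num) pair appears three times in the delta table
cur.executemany("INSERT INTO table_inc VALUES (?, ?)",
                [(1, 10), (1, 10), (1, 10), (2, 20)])

# Rows produced by the raw join -- roughly what a join-style DELETE has to
# spool before de-duplicating the target row ids
join_rows = cur.execute("""
    SELECT COUNT(*) FROM table_base B
    JOIN table_inc I ON B.id = I.id AND B.num = I.num
""").fetchone()[0]

# Rows after collapsing the delta table to distinct pairs first
distinct_rows = cur.execute("""
    SELECT COUNT(*) FROM table_base B
    JOIN (SELECT DISTINCT id, num FROM table_inc) I
      ON B.id = I.id AND B.num = I.num
""").fetchone()[0]

print(join_rows, distinct_rows)  # the raw join produces more rows than the de-duplicated one
```

With more realistic row counts (168,480 base rows, as in the explain plans above), this duplication is what inflates the intermediate spool.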
Regarding IN vs. EXISTS, there should be hardly any difference. Did you check dbc.QryLogStepsV?
If (ID, Num) is the PI of the target table, rewriting to a MERGE DELETE should provide the best performance:
MERGE INTO TABLE_BASE AS tgt
USING TABLE_INC AS src
  ON src.ID = tgt.ID
 AND src.Num = tgt.Num
WHEN MATCHED
THEN DELETE