update x set y = null takes a long time

Question

At work, I have a large table (some 3 million rows, like 40-50 columns). I sometimes need to empty some of the columns and fill them with new data. What I did not expect is that

UPDATE table1 SET y = null

takes much more time than filling the column with data that is generated, for example, in the SQL query from other columns of the same table, or queried from other tables in a subquery. It does not matter whether I go through all table rows at once (as in the update query above) or use a cursor to go through the table row by row (using the PK). It does not matter whether I use the large table at work or create a small test table and fill it with some hundreds of thousands of test rows. Setting the column to null always takes far longer (throughout the tests, I encountered factors of 2 to 10) than updating the column with dynamic data that differs for each row.

What's the reason for this? What does Oracle do when setting a column to null? Or what is my error in reasoning?

Thanks for your help!

P.S.: I am using Oracle 11gR2, and found these results using both PL/SQL Developer and Oracle SQL Developer.

Recommended answer

Summary

I think updating to null is slower because Oracle (incorrectly) tries to take advantage of the way it stores nulls, causing it to frequently re-organize the rows in the block ("heap block compress"), creating a lot of extra UNDO and REDO.

What's special about null?

Oracle Database Concepts:

"Nulls are stored in the database if they fall between columns with data values. In these cases they require 1 byte to store the length of the column (zero).

Trailing nulls in a row require no storage because a new row header signals that the remaining columns in the previous row are null. For example, if the last three columns of a table are null, no information is stored for those columns. In tables with many columns, the columns more likely to contain nulls should be defined last to conserve disk space."
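As an illustration of that advice, here is a minimal sketch of a table laid out with the mostly-null columns last. The table and column names are hypothetical and invented only for this example.

```sql
-- Hypothetical table: mandatory columns first, rarely populated columns last
create table orders_demo (
    order_id     number         not null,
    created_at   date           not null,
    customer_id  number         not null,
    cancel_note  varchar2(200),          -- usually null
    refund_date  date                    -- usually null
);

-- Both trailing columns are null here, so per the quote above
-- nothing is stored for them in this row
insert into orders_demo (order_id, created_at, customer_id)
values (1, sysdate, 42);
commit;
```

If `cancel_note` were null but `refund_date` had a value, the null would fall between columns with data and, per the quote, cost one byte for its zero length.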

Test

Benchmarking updates is very difficult because the true cost of an update cannot be measured just from the update statement. For example, log switches will not happen with every update, and delayed block cleanout will happen later. To accurately test an update, there should be multiple runs, objects should be recreated for each run, and the high and low values should be discarded.

For simplicity the script below does not throw out high and low results, and only tests a table with a single column. But the problem still occurs regardless of the number of columns, their data, and which column is updated.

I used the RunStats utility from http://www.oracle-developer.net/utilities.php to compare the resource consumption of updating-to-a-value with updating-to-a-null.

create table test1(col1 number);

BEGIN
    dbms_output.enable(1000000);

    runstats_pkg.rs_start;

    -- Run 1: recreate the table, load 100,000 rows, update every row to a value
    for i in 1 .. 10 loop
        execute immediate 'drop table test1 purge';
        execute immediate 'create table test1 (col1 number)';
        execute immediate 'insert /*+ append */ into test1 select 1 col1
            from dual connect by level <= 100000';
        commit;
        execute immediate 'update test1 set col1 = 1';
        commit;
    end loop;

    runstats_pkg.rs_pause;
    runstats_pkg.rs_resume;

    -- Run 2: same setup, but update every row to null
    for i in 1 .. 10 loop
        execute immediate 'drop table test1 purge';
        execute immediate 'create table test1 (col1 number)';
        execute immediate 'insert /*+ append */ into test1 select 1 col1
            from dual connect by level <= 100000';
        commit;
        execute immediate 'update test1 set col1 = null';
        commit;
    end loop;

    runstats_pkg.rs_stop();
END;
/
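If RunStats is not installed, a rough alternative is to read the session statistics directly. This is only a sketch, and it assumes the session has SELECT access to `V$MYSTAT` and `V$STATNAME`. Run it before and after each update and subtract the values by hand.

```sql
-- Current-session values for the statistics most relevant to this test
select n.name, s.value
from v$mystat s
join v$statname n on n.statistic# = s.statistic#
where n.name in ('heap block compress',
                 'undo change vector size',
                 'redo size')
order by n.name;
```

The values are cumulative for the session, so only the difference between two snapshots says anything about a single statement.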

Results

There are dozens of differences; these are the four I think are most relevant:

Type  Name                                 Run1         Run2         Diff
----- ---------------------------- ------------ ------------ ------------
TIMER elapsed time (hsecs)                1,269        4,738        3,469
STAT  heap block compress                     1        2,028        2,027
STAT  undo change vector size        55,855,008  181,387,456  125,532,448
STAT  redo size                     133,260,596  581,641,084  448,380,488

Solution?

The only possible solution I can think of is to enable table compression. The trailing-null storage trick doesn't happen for compressed tables. So even though the "heap block compress" number gets even higher for Run2, from 2028 to 23208, I guess it doesn't actually do anything. The redo, undo, and elapsed time between the two runs is almost identical with table compression enabled.

However, there are lots of potential downsides to table compression. Updating to a null will run much faster, but every other update will run at least slightly slower.
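A minimal sketch of trying this on the test table follows. The answer does not say which compression type was used, so basic compression (the plain `COMPRESS` keyword) is an assumption here; other variants may need additional licensing.

```sql
-- Recreate the test table with basic table compression (assumed variant)
drop table test1 purge;
create table test1 (col1 number) compress;

-- Or compress an existing table; MOVE rebuilds the segment and
-- leaves any indexes on the table UNUSABLE until they are rebuilt
alter table test1 move compress;

-- Confirm the setting
select table_name, compression, compress_for
from user_tables
where table_name = 'TEST1';
```

To repeat the benchmark with compression, the `create table` statements inside the test script's loops would need the same `compress` keyword.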
