Why does Oracle use DBMS_STATS.GATHER_TABLE_STATS?


Question

I found documentation saying that Oracle uses these statistics for performance tuning and so on, but I didn't quite understand what the procedure actually does.

Can someone explain it to me in simple words, with a very basic example?

Recommended answer

Most enterprise databases, Oracle included, use a cost-based optimizer to determine the appropriate query plan for a given SQL statement. This means that the optimizer uses information about the data to decide how to execute a query rather than relying on fixed rules (which is what the older rule-based optimizer did).

For example, imagine a table for a simple bug-tracking application:

CREATE TABLE issues (
  issue_id number primary key,
  issue_text clob,
  issue_status varchar2(10)
);

CREATE INDEX idx_issue_status
    ON issues( issue_status );

If I'm a large company, I might have 1 million rows in this table. Of those, 100 have an issue_status of ACTIVE, 10,000 have an issue_status of QUEUED, and 989,900 have an issue_status of COMPLETE. If I want to run a query against the table to find my active issues

SELECT *
  FROM issues
 WHERE issue_status = 'ACTIVE'

the optimizer has a choice. It can either use the index on issue_status and then do a single-row lookup in the table for each matching index entry, or it can do a full scan of the issues table. Which plan is more efficient depends on the data in the table. If Oracle expects the query to return a small fraction of the rows, using the index is more efficient. If Oracle expects the query to return a substantial fraction of the rows, a table scan is more efficient.
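To try this yourself, the sketch below loads the distribution described above and asks Oracle which plan it would pick. It is only an illustration: the generated issue_text values and the two-level row generator are assumptions made for the demo, not part of the original answer, and the plan you see will depend on your Oracle version and on whether statistics exist yet.

```sql
-- Load 1,000,000 rows: 100 ACTIVE, 10,000 QUEUED, 989,900 COMPLETE.
-- The row generator (1000 x 1000 cross join) is just a demo device.
INSERT INTO issues (issue_id, issue_text, issue_status)
WITH gen AS (
  SELECT level AS n FROM dual CONNECT BY level <= 1000
),
ids AS (
  SELECT (a.n - 1) * 1000 + b.n AS id
    FROM gen a CROSS JOIN gen b
)
SELECT id,
       'Issue number ' || id,             -- placeholder text
       CASE
         WHEN id <= 100   THEN 'ACTIVE'
         WHEN id <= 10100 THEN 'QUEUED'
         ELSE 'COMPLETE'
       END
  FROM ids;

COMMIT;

-- Ask the optimizer for its plan without running the query.
EXPLAIN PLAN FOR
SELECT *
  FROM issues
 WHERE issue_status = 'ACTIVE';

-- Show the plan just stored in the plan table.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

In the plan output, an index-driven plan typically appears as an INDEX RANGE SCAN on idx_issue_status, while a table scan appears as TABLE ACCESS FULL on issues.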

DBMS_STATS.GATHER_TABLE_STATS is what gathers the statistics that allow Oracle to make this determination. It tells Oracle that there are roughly 1 million rows in the table, that there are 3 distinct values in the issue_status column, and (when a histogram is gathered on that column) that the data is unevenly distributed. So Oracle knows to use the index for the query that finds all the active issues. But it also knows that when you turn around and look for all the completed issues

SELECT *
  FROM issues
 WHERE issue_status = 'COMPLETE'

it will be more efficient to do a table scan.
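As a minimal sketch of what gathering these statistics might look like for the issues table, the block below calls the procedure and then reads the results back from the data dictionary. The parameter choices are assumptions for illustration: SIZE SKEWONLY asks Oracle to build histograms on columns whose data is skewed, and cascade => TRUE also gathers statistics on the table's indexes. Your site may prefer the defaults.

```sql
-- Gather table, column, and index statistics for ISSUES
-- in the current schema.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'ISSUES',
    method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY',  -- histograms on skewed columns
    cascade    => TRUE                              -- include indexes
  );
END;
/

-- Table-level statistics: row count and when they were gathered.
SELECT num_rows, last_analyzed
  FROM user_tables
 WHERE table_name = 'ISSUES';

-- Column-level statistics: distinct values and histogram type.
SELECT column_name, num_distinct, histogram
  FROM user_tab_col_statistics
 WHERE table_name = 'ISSUES';
```

If a histogram was built on issue_status, the histogram column for it shows something other than NONE, which is what lets the optimizer estimate very different row counts for 'ACTIVE' and 'COMPLETE'.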

Gathering statistics allows the query plans to change over time as the data volumes and data distributions change. When you first install the issue tracker, you'll have very few COMPLETE issues and more ACTIVE and QUEUED issues. Over time, the number of COMPLETE issues will rise much more quickly than the others. As you get more rows in the table and the relative fraction of rows in each status changes, the query plans will change so that, in an ideal world, you always get the most efficient plan possible.
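One way to see whether the statistics have fallen behind the data is to check the dictionary, as in the hedged sketch below. It assumes the issues table from above lives in your own schema. Note that the staleness flag is based on tracked DML activity that Oracle flushes periodically, so it can lag recent changes.

```sql
-- When were the statistics last gathered, and does Oracle
-- consider them stale?
SELECT table_name, num_rows, last_analyzed, stale_stats
  FROM user_tab_statistics
 WHERE table_name = 'ISSUES';

-- The percentage of changed rows at which statistics are
-- treated as stale for this table.
SELECT DBMS_STATS.GET_PREFS('STALE_PERCENT', USER, 'ISSUES') AS stale_percent
  FROM dual;
```

When stale_stats reports YES, running DBMS_STATS.GATHER_TABLE_STATS on the table again refreshes the numbers the optimizer relies on.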
