Why isn't the index used for this query?


Question

I had a query where an index was not used when I thought it could be, so I reproduced it out of curiosity:

Create a test_table with 1,000,000 rows (10 distinct values in col, 500 bytes of data in some_data):

CREATE TABLE test_table AS (
  SELECT MOD(ROWNUM,10) col, LPAD('x', 500, 'x') some_data
  FROM dual
  CONNECT BY ROWNUM <= 1000000
);

Create an index and gather table stats:

CREATE INDEX test_index ON test_table ( col );

EXEC dbms_stats.gather_table_stats( 'MY_SCHEMA', 'TEST_TABLE' );
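Before reading the plan, it can help to confirm that the statistics are actually in place. A minimal sketch, assuming the session is connected as the owner of TEST_TABLE so that the USER_ dictionary views apply (the question itself uses the placeholder schema MY_SCHEMA):

```sql
-- Table-level statistics the optimizer will see.
SELECT table_name, num_rows, blocks, last_analyzed
FROM   user_tables
WHERE  table_name = 'TEST_TABLE';

-- Index-level statistics for the index on col.
SELECT index_name, distinct_keys, num_rows, last_analyzed
FROM   user_indexes
WHERE  table_name = 'TEST_TABLE';
```

If LAST_ANALYZED is NULL for either object, the cost figures in the plan are based on defaults or dynamic sampling rather than gathered statistics.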

Try to get the distinct values of col and their COUNT:

EXPLAIN PLAN FOR
  SELECT col, COUNT(*)
  FROM test_table
  GROUP BY col;

---------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |    10 |    30 | 15816   (1)| 00:03:10 |
|   1 |  HASH GROUP BY     |            |    10 |    30 | 15816   (1)| 00:03:10 |
|   2 |   TABLE ACCESS FULL| TEST_TABLE |   994K|  2914K| 15755   (1)| 00:03:10 |
---------------------------------------------------------------------------------

The index is not used, and providing a hint does not change this.
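The question does not say which hint was tried. As a sketch only, assuming it was an index hint such as INDEX_FFS (my assumption, not stated in the question), the attempt might have looked like this:

```sql
-- Assumption: the hint is not named in the question; INDEX_FFS is one plausible choice.
EXPLAIN PLAN FOR
  SELECT /*+ INDEX_FFS(test_table test_index) */ col, COUNT(*)
  FROM test_table
  GROUP BY col;

-- Show the plan that was just explained.
SELECT * FROM TABLE(dbms_xplan.display);
```

Oracle silently ignores a hint it cannot honour, so an unchanged plan suggests the index is not a valid access path for this query, rather than merely a more expensive one.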

I guess the index can't be used in this case, but why?

Recommended answer

I ran Peter's original stuff and reproduced his results. I then applied dcp's suggestion...

SQL> alter table test_table modify col not null;

Table altered.

SQL> EXEC dbms_stats.gather_table_stats( user, 'TEST_TABLE' , cascade=>true)

PL/SQL procedure successfully completed.

SQL> EXPLAIN PLAN FOR
  2    SELECT col, COUNT(*)
  3    FROM test_table
  4    GROUP BY col;

Explained.

SQL> select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 2099921975

------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |    10 |    30 |   574   (9)| 00:00:07 |
|   1 |  HASH GROUP BY        |            |    10 |    30 |   574   (9)| 00:00:07 |
|   2 |   INDEX FAST FULL SCAN| TEST_INDEX |  1000K|  2929K|   532   (2)| 00:00:07 |
------------------------------------------------------------------------------------

9 rows selected.

SQL>

The reason this matters is that NULL values are not included in a normal B-tree index (a row whose indexed columns are all NULL gets no index entry), but the GROUP BY has to include NULL as a grouping "value" in your query. As long as col is nullable, the index might be missing rows, so it cannot be used to answer the query. Once the optimizer knows there are no NULLs in col, it is free to use the much more efficient index (I was getting an elapsed time of almost 3.55 seconds with the full table scan). This is a classic example of how metadata can influence the optimizer.
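If altering the column is not an option, the same reasoning points to two other ways of making an index usable. This is a sketch that was not part of the test above; the index name test_index_nn is hypothetical, and whether the optimizer actually picks the index still depends on its cost estimates.

```sql
-- See how the dictionary describes the column (NULLABLE is 'Y' or 'N').
SELECT column_name, nullable, num_nulls
FROM   user_tab_columns
WHERE  table_name  = 'TEST_TABLE'
AND    column_name = 'COL';

-- Option 1: exclude NULLs in the query itself, so every row the query
-- needs is guaranteed to be in the index. Note that this drops the NULL
-- group from the result if col ever does contain NULLs.
SELECT col, COUNT(*)
FROM   test_table
WHERE  col IS NOT NULL
GROUP  BY col;

-- Option 2: add a constant as a second key column. An entry is only
-- omitted when ALL key columns are NULL, so with the constant present
-- every row, including those with a NULL col, is indexed.
CREATE INDEX test_index_nn ON test_table (col, 0);
```

Option 2 leaves the original query and its results unchanged, at the price of a slightly larger index.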

Incidentally, this is evidently a 10g or 11g database, because it uses the HASH GROUP BY algorithm instead of the older SORT (GROUP BY) algorithm.
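Rather than inferring the release from the plan, it can be read directly. A minimal sketch, assuming the session has access to V$VERSION:

```sql
-- Lists the database release banner(s) for the current instance.
SELECT banner FROM v$version;
```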
