为什么/何时/如何选择整个聚簇索引扫描而不是全表扫描? [英] Why/when/how is whole clustered index scan chosen rather than full table scan?

查看:165
本文介绍了为什么/何时/如何选择整个聚簇索引扫描而不是全表扫描?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

IMO,请更正我...

聚集索引的叶包含实际表行,因此完整的聚集索引,中间叶,包含比完整表更多的数据(?)< br>
为什么/何时/如何在整个表扫描中选择整个聚簇索引扫描?

IMO, please correct me...
the leaf of clustered index contains the real table row, so full clustered index, with intermediate leaves, contain much more data than the full table(?)
Why/when/how is ever whole clustered index scan chosen over the full table scan?

在SELECT查询中使用的CUSTOMER_ID列上的聚集索引如何在SELECT列表或WHERE条件[1]中不包含它?

更新:

我应该明白,全集群扫描比全表扫描要快,因为每个数据页包含指向下一个和以前的叶节点页,所以扫描不需要使用索引中的更高级页面?

是否有其他原因(不参与查询)聚类索引用于排序?

Update:
Should I understand that full clustered scan is faster than full table scan because "Each data page contains pointers to the next and previous leaf node page so the scan does not need to use the higher level pages in the index"?
Are there any other reasons like (non-participating in query) clustered index is used in sorting?

Update2:

由于事后,连续访问不能提高性能,而通过IAM指针加载表可以并行化。

聚集索引扫描是否意味着连续的页面读取?

聚簇表是否意味着缺少IAM指针(不可能进行全表扫描)?

为什么不能使用聚簇表全表扫描?

我仍​​然不明白如何/为什么聚集索引全扫描可以比全表扫描更好。

这是否意味着有聚簇索引可以结果在性能恶化?

Update2:
As afterthought, consecutive access cannot give performance boost while loading table through IAM pointers can be parallelized.
Does clustered index scan imply consecutive page reading?
Does clustered table imply absence of IAM pointers (impossibility of full table scan)?
Why cannot clustered table be full table scanned?
I still do not understand how/why clustered index full scan can be "better" over full table scan.
Does it mean that having clustered index can result in performance worsening?

问题是关于集群表不是堆(非索引)表。

The question is about clustered table not heap (non-indexed) table.

Update3:

完整聚集索引扫描是否与全表扫描
有什么区别?

Update3:
Is "full clustered index scan" really synonym to "full table scan"?
What are differences?

[1]索引覆盖提升SQL Server查询性能

http://www.devx.com/dbzone/Article/29530

[1] Index Covering Boosts SQL Server Query Performance
http://www.devx.com/dbzone/Article/29530

推荐答案

请先阅读我在无法直接访问集群表中的数据行 - 为什么?下的答案。

Please read my answer under "No direct access to data row in clustered table - why?", first.

聚集索引的叶包含实表行,因此具有中间叶的完整聚类索引包含比完整表(?)更多的数据

看到你正在混淆表与存储结构。在你的问题的上下文,例如。考虑CI的大小而不是表,那么你必须考虑CI减去叶级别(这是数据行)。仅CI的索引部分很小。中间级别(如任何B树)包含部分(不是全部)键条目;它排除最低级别,这是全键条目,它位于行本身中,并且不重复。

See you are mixing up "Table" with storage structures. In the context of your question, eg. thinking about the size of the CI as opposed to the "table", well then you must think about the CI minus the leaf level (which is the data row). The CI, index portion only, is tiny. The intermediate levels (like any B-Tree) contain partial (not full) key entries; it excludes the lowest level, which is the full key entry, which sits in the row itself, and is not duplicated.

表(完整CI)可以是10GB。 CI只能是10MB。有一个可怕的很多,可以从10MB确定,而不必去到100GB。

The table (full CI) may be 10GB. The CI only may be 10MB. There is an awful lot that can be determined from the 10MB without having to go to the 100GB.

为了理解:同一表(CI)上的等效NCI可能22MB;如果您删除了CI,则相同表上的等效NCI可以是21.5MB(假设CI键是合理的,不是宽宽的)。

For understanding: the equivalent NCI on the same table (CI) may be 22MB; the equivalent NCI on the same table if you removed the CI may be 21.5MB (assuming the CI key is reasonable, not fat wide).

何时/如何在整个表扫描中选择整个聚簇索引扫描?

很多时候。同样,上下文是,我们正在谈论CI-负叶水平。对于只使用CI中的列的查询,CI中的这些列的存在(实际上是任何索引)允许查询是覆盖的查询,这意味着它可以完全从索引服务,不需要去到数据行。范围扫描部分键:BETWEEN x AND yY; x <= y;

Quite often. Again the context is, we are talking about the CI-minus-Leaf levels. For queries that use only the columns in the CI, the presence of those columns in the CI (any index actually) allow the query to be a "covered query", which means it can by serviced wholly from the index, no need to go to the data rows. Think range scans on partial keys: BETWEEN x AND yY; x <= y; etc.

(总是有机会,优化器将选择一个表扫描,当认为它应该选择一个索引扫描,bu这是一个不同的故事。)

(There is always the chance that the optimiser will choose a table scan, when you think it should choose an index scan, bu t that is a different story.)

我仍然不明白如何/为什么聚集索引全扫描可以更好比全表扫描。

"I still do not understand how/why clustered index full scan can be "better" over full table scan."

(MS使用的术语不如我在这里的答案精确。)对于可以从10MB CI回答的任何查询,我宁愿通过数据缓存来搅拌10MB,比100GB。对于相同的查询,由CI键上的范围限制,这是10MB的一小部分。

(The terms used by MS are less precise than my answers here.) For any query that can be answered from the 10MB CI, I would much rather churn 10MB through the data cache, than 100GB. For the same queries, bounded by a range on the CI key, that's a fraction of the 10MB.

对于需要全表扫描的查询,您必须读取CI的所有Leaf页面,即100GB。

For queries that requires a "full table scan", well yes, you must read all the Leaf pages of the CI, which is the 100GB.

这篇关于为什么/何时/如何选择整个聚簇索引扫描而不是全表扫描?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆