表扫描和聚簇索引扫描之间有什么区别? [英] What's the difference between a Table Scan and a Clustered Index Scan?

查看:172
本文介绍了表扫描和聚簇索引扫描之间有什么区别?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

由于表扫描聚集索引扫描基本上扫描表中的所有记录,为什么是群集索引扫描应该更好吗?

Since both a Table Scan and a Clustered Index Scan essentially scan all records in the table, why is a Clustered Index Scan supposedly better?

作为一个例子 - 当有很多记录时,以下内容之间的性能差异是什么?:

As an example - what's the performance difference between the following when there are many records?:

declare @temp table(
    SomeColumn varchar(50)
)

insert into @temp
select 'SomeVal'

select * from @temp

-----------------------------

declare @temp table(
    RowID int not null identity(1,1) primary key,
    SomeColumn varchar(50)
)

insert into @temp
select 'SomeVal'

select * from @temp


推荐答案

在没有聚簇索引(堆表)的表中,数据页没有链接在一起 - 因此遍历页面需要查找int o索引分配图

In a table without a clustered index (a heap table), data pages are not linked together - so traversing pages requires a lookup into the Index Allocation Map.

但是,群集表的在双向链表中链接的数据页 - 使顺序扫描更快一些。当然,作为交换,你有处理在 INSERT UPDATE 上保持数据页顺序的开销,和 DELETE 。但是,堆表需要第二次写入IAM。

A clustered table, however, has it's data pages linked in a doubly linked list - making sequential scans a bit faster. Of course, in exchange, you have the overhead of dealing with keeping the data pages in order on INSERT, UPDATE, and DELETE. A heap table, however, requires a second write to the IAM.

如果您的查询有一个 RANGE 运算符(例如: SELECT * FROM TABLE WHERE Id BETEEEN 1和100 ),然后一个聚簇表(保证顺序)会更有效 - 因为它可以使用索引页面找到相关的数据页面。堆必须扫描所有行,因为它不能依赖于排序。

If your query has a RANGE operator (e.g.: SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100), then a clustered table (being in a guaranteed order) would be more efficient - as it could use the index pages to find the relevant data page(s). A heap would have to scan all rows, since it cannot rely on ordering.

当然,聚集索引可以让你做一个CLUSTERED INDEX SEEK,这很漂亮性能最佳...没有索引的堆总是会导致表扫描。

And, of course, a clustered index lets you do a CLUSTERED INDEX SEEK, which is pretty much optimal for performance...a heap with no indexes would always result in a table scan.

所以:


  • 对于选择所有行的示例查询,唯一的区别是聚簇索引维护的双向链表。这应该使您的集群表比具有大量行的堆快一点。

  • For your example query where you select all rows, the only difference is the doubly linked list a clustered index maintains. This should make your clustered table just a tiny bit faster than a heap with a large number of rows.

对于带有的查询WHERE 子句可以(至少部分地)被聚集索引满足,因为排序你会提前出来 - 所以你不必扫描整个表。

For a query with a WHERE clause that can be (at least partially) satisfied by the clustered index, you'll come out ahead because of the ordering - so you won't have to scan the entire table.

对于聚集索引不满意的查询,您甚至可以再次使用,唯一的区别在于顺序扫描的双向链表。在任何一种情况下,你都不是最理想的。

For a query that is not satisified by the clustered index, you're pretty much even...again, the only difference being that doubly linked list for sequential scanning. In either case, you're suboptimal.

对于 INSERT UPDATE DELETE 堆可能赢得也可能不赢。堆不必维护顺序,但需要第二次写入IAM。我认为相对性能差异可以忽略不计,但也依赖于数据。

For INSERT, UPDATE, and DELETE a heap may or may not win. The heap doesn't have to maintain order, but does require a second write to the IAM. I think the relative performance difference would be negligible, but also pretty data dependent.

Microsoft有白皮书,它将聚集索引与堆上的等效非聚集索引进行比较(与我上面讨论的不完全相同,但很接近)。他们的结论基本上是在所有表上放置聚簇索引。我会尽力总结他们的结果(再次注意,他们真的在这里将非聚集索引与聚集索引进行比较 - 但我认为它相对可比):

Microsoft has a whitepaper which compares a clustered index to an equivalent non-clustered index on a heap (not exactly the same as I discussed above, but close). Their conclusion is basically to put a clustered index on all tables. I'll do my best to summarize their results (again, note that they're really comparing a non-clustered index to a clustered index here - but I think it's relatively comparable):


  • INSERT 性能:由于堆需要第二次写入,聚集索引会增加约3%。

  • UPDATE 性能:由于堆需要第二次查找,聚集索引赢得约8%。

  • DELETE 性能:由于需要进行第二次查找,并且需要从IAM进行第二次删除,因此聚簇索引将赢得大约18%。

  • 单个 SELECT 性能:由于堆需要第二次查找,聚集索引赢得约16%。

  • 范围 SELECT 性能:由于堆的随机排序,聚集索引获胜约29%。

  • 并发 INSERT :由于聚簇索引的页面拆分,堆表在负载下赢了30%。

  • INSERT performance: clustered index wins by about 3% due to the second write needed for a heap.
  • UPDATE performance: clustered index wins by about 8% due to the second lookup needed for a heap.
  • DELETE performance: clustered index wins by about 18% due to the second lookup needed and the second delete needed from the IAM for a heap.
  • single SELECT performance: clustered index wins by about 16% due to the second lookup needed for a heap.
  • range SELECT performance: clustered index wins by about 29% due to the random ordering for a heap.
  • concurrent INSERT: heap table wins by 30% under load due to page splits for the clustered index.

这篇关于表扫描和聚簇索引扫描之间有什么区别?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆