Does a clustered index on a column GUARANTEE returning rows sorted by that column?


Question

I am unable to get a clear-cut answer to this contentious question. The MSDN documentation says:

Clustered

  • Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order.

  • The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.

Yet most of the answers I see say no.

Which is it?

Answer

Just to be clear: presumably you are talking about a simple query such as:

select *
from table t;

First, if all the data on the table fits on a single page and there are no other indexes on the table, it is hard for me to imagine a scenario where the result set is not ordered by the primary key. However, this is because I think the most reasonable query plan would require a full-table scan, not because of any requirement -- documented or otherwise -- in SQL or SQL Server. Without an explicit order by, the ordering in the result set is a consequence of the query plan.

That gets to the heart of the issue. When you talk about the ordering of a result set, you are really talking about the query plan. And the assumption of ordering by the primary key really means that you are assuming the query uses a full-table scan. What is ironic is that people make the assumption without actually understanding the "why". Furthermore, people have a tendency to generalize from small examples (okay, this is part of the basis of human intelligence). Unfortunately, they consistently see that result sets from simple queries on small tables are in primary key order, and they generalize to larger tables. The induction step is incorrect in this example.

What can change this? Off-hand, I think that a full table scan would return the data in primary key order if the following conditions are met:

  • Single threaded server.
  • Single file filegroup
  • No competing indexes
  • No table partitions

I'm not saying this is always true. It just seems reasonable that under these circumstances such a query would use a full table scan starting at the beginning of the table.

Even on a small table, you can get surprises. Consider:

select NonPrimaryKeyColumn
from table

The query plan would probably decide to use an index on table(NonPrimaryKeyColumn) rather than doing a full table scan. The results would not be ordered by the primary key (unless by accident). I show this example because indexes can be used for a variety of purposes, not just order by or where filtering.
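
A minimal sketch of that situation, in T-SQL. The table name dbo.Demo, the index name IX_Demo_Name, and the sample rows are hypothetical and only there for illustration. Which index the optimizer actually picks depends on the version, statistics, and table size, so treat the comments as what may happen, not what must:

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE dbo.Demo (
    Id   INT         NOT NULL PRIMARY KEY CLUSTERED,  -- clustered index on Id
    Name VARCHAR(50) NOT NULL
);

-- A narrower index that fully covers a query selecting only Name.
CREATE NONCLUSTERED INDEX IX_Demo_Name ON dbo.Demo (Name);

INSERT INTO dbo.Demo (Id, Name)
VALUES (1, 'zebra'), (2, 'mango'), (3, 'apple');

-- No ORDER BY: the optimizer is free to scan IX_Demo_Name, in which case
-- the rows may come back in Name order rather than Id order.
SELECT Name FROM dbo.Demo;

-- The only way to guarantee Id order is to ask for it.
SELECT Name FROM dbo.Demo ORDER BY Id;
```

The point is not which order the first query happens to return, but that nothing in the table definition promises one.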

If you use a multi-threaded instance of the database and you have reasonably sized tables, you will quickly learn that results without an order by have no guaranteed ordering.

And finally, SQL Server has a pretty smart optimizer. I think there is some reluctance to use order by in a query because users think it will always force a sort. SQL Server works hard to find the best execution plan for the query. If it recognizes that the order by is redundant because of the rest of the plan, then the order by will not result in a sort.
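
One way to check this for yourself is to look at the estimated plan. The sketch below assumes a hypothetical table dbo.SortDemo and a client that understands the GO batch separator (SSMS or sqlcmd). SET SHOWPLAN_TEXT has to be the only statement in its batch, which is why the batches are split up:

```sql
-- Hypothetical table name, for illustration only.
CREATE TABLE dbo.SortDemo (
    Id   INT         NOT NULL PRIMARY KEY CLUSTERED,
    Name VARCHAR(50) NOT NULL
);
GO

-- Return the estimated plan as text instead of executing the query.
SET SHOWPLAN_TEXT ON;
GO

-- ORDER BY on the clustered key: the optimizer can satisfy this with an
-- ordered scan of the clustered index instead of a separate sort step.
SELECT Id, Name FROM dbo.SortDemo ORDER BY Id;
GO

SET SHOWPLAN_TEXT OFF;
GO
```

On a table like this I would expect the plan to show a clustered index scan and no Sort operator, but that is the optimizer's decision, not something to rely on. Either way the order by stays in the query, because it is what makes the ordering a guarantee.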

And, of course, if you want to guarantee the ordering of results, you need order by in the outermost query. Even a query like this:

select *
from (select top 100 t.* from t order by col1) t

does not guarantee that the results are ordered in the final result set. You really need to do:

select *
from (select top 100 t.* from t order by col1) t
order by col1;

to guarantee the results in a particular order. This behavior is documented here.
