sql server 如何对你的数据进行排序? [英] How does sql server sort your data?

查看:23
本文介绍了sql server 如何对你的数据进行排序?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想知道 sql server 如何对其数据进行排序.我注意到,如果我有一个不包含Id"列的表,并且您选择没有ORDER BY"的数据,sql server 不会自动对主列进行排序.

I was wondering how sql server sorts it's data. I noticed that if I have a table that doesn't contain the column "Id" and you select data without "ORDER BY" sql server doesn't automatically sort on the primary column.

有谁知道sql server按照什么规则对数据进行排序?

Does anyone know what rule sql server follows to sort it's data?

推荐答案

虽然想知道如何解释经常看到相同的顺序是件好事,但我想指出这从来都不是一个好主意依赖于由底层数据库引擎的特定实现引起的隐式顺序.换句话说,很高兴知道为什么,但你永远不应该依赖它.对于 MS SQL,唯一能够以特定顺序可靠地传送行的是显式 ORDER BY 子句.

Although it's good to wonder about how it might be explained that you often see the same order, I'd like to point out that it never a good idea to rely on implicit order caused by the particular implementation of the underlying database engine. In other words, its nice to know why, but you should never ever rely on it. For MS SQL, the only thing that reliably delivers the rows in a certain order, is an explicit ORDER BY clause.

不仅不同的 RDMBS-es 行为不同,一个特定的实例也可能由于更新(补丁)而行为不同.不仅如此,甚至 RDBMS 软件的状态也可能会产生影响:暖"数据库的行为与冷"数据库的行为不同,小表的行为与大表的行为不同.

Not only do different RDMBS-es behave differently, one particular instance may behave differently due to an update (patch). Not only that, even the state of the RDBMS software may have an impact: a "warm" database behaves differently than a "cold" one, a small table behaves differently than a large one.

即使你有关于实现的背景信息(例如:有一个聚集索引,因此数据很可能会按聚集索引的顺序返回"),也总是有可能存在另一种机制您不知道这会导致以不同顺序返回行(例如:如果另一个会话只是使用显式 ORDER BY 进行了全表扫描,结果集可能已被缓存;随后的全扫描将尝试从缓存中返回行"; ex2: "a GROUP BY 可以通过对数据进行排序来实现,从而影响返回行的顺序"; ex3: "如果选择的列都在已经缓存在内存中的二级索引中,引擎可能会扫描二级索引而不是表,最有可能按照二级索引的顺序返回行).

Even if you have background information about the implementation (ex: "there is a clustered index, thus it is likely the data will be returned by order of the clustered index"), there is always a possibility that there is another mechanism you don't know about that causes the rows to be returned in a different order (ex1: "if another session just did a full table scan with an explicit ORDER BY the resultset may have been cached; a subsequent full scan will attempt to return the rows from the cache"; ex2: "a GROUP BY may be implemented by sorting the data, thus impacting the order the rows are returned"; ex3: "If the selected columns are all in a secondary index that is already cached in memory, the engine may scan the secondary index instead of the table, most likely returning the rows by order of the secondary index").

这是一个非常简单的测试,它说明了我的一些观点.

Here's a very simple test that illustrates some of my points.

首先,启动 SQL 服务器(我使用的是 2008).创建此表:

First, startup SQL server (I'm using 2008). Create this table:

create table test_order (
    id int not null identity(1,1) primary key
,   name varchar(10) not null 
)

检查表并证明创建了一个集群索引以支持 id 列上的 primary key.例如,在 sql server management studio 中,您可以使用树视图并导航到表下方的索引文件夹.在那里您应该看到一个索引,其名称类似于:PK__test_ord__3213E83F03317E3D (Clustered)

Examine the table and witness that a clusted index was created to support the primary key on the id column. For example, in sql server management studio, you can use the tree view and navigate to the indexes folder beneath your table. There you should see one index, with a name like: PK__test_ord__3213E83F03317E3D (Clustered)

用这个语句插入第一行:

Insert the first row with this statement:

insert into test_order(name)
select RAND()

通过重复此语句 16 次来插入更多行:

Insert more rows by repeating this statement 16 times:

insert into test_order(name)
select RAND()
from   test_order

您现在应该有 65536 行:

You should now have 65536 rows:

select COUNT(*) 
from   test_order

现在,不使用 order by 选择所有行:

Now, select all rows without using an order by:

select *
from   test_order

很可能,结果将按主键的顺序返回(尽管不能保证).这是我得到的结果(确实是按主键的顺序):

Most likely, the results will be returned by order of the primary key (although there is no guarantee). Here's the result I got (which is indeed by order of primary key):

#      id    name
1      1     0.605831
2      2     0.517251
3      3     0.52326
.      .     .......
65536  65536 0.902214

(# 不是列而是结果中行的序数位置)

(the # is not a column but the ordinal position of the row in the result)

现在,在 name 列上创建二级索引:

Now, create a secondary index on the name column:

create index idx_name on test_order(name)

选择所有行,但只检索name列:

Select all rows, but retrieve only the name column:

select name
from   test_order

结果很可能会按照二级索引idx_name的顺序返回,因为查询可以通过只扫描索引来解析(iowidx_name是一个covering索引).这是我得到的结果,确实是按name的顺序.

Most likely the results will be returned by order of the secondary index idx_name, since the query can be resolved by only scanning the index (i.o.w. idx_name is a covering index). Here's the result I got, which is indeed by order of name.

#      name
1      0.0185732
2      0.0185732
.      .........
65536  0.981894

现在,再次选择所有列和所有行:

Now, select all columns and all rows again:

select * 
from test_order

这是我得到的结果:

#      id    name
1      17    0.0185732
2      18    0.0185732
3      19    0.0185732
...    ..    .........

如您所见,与我们第一次运行此查询时完全不同.(看起来行是按二级索引排序的,但我没有解释为什么会这样).

as you can see, quite different from the first time we ran this query. (It looks like the rows are ordered by the secondary index, but I don't have an explanation why that should be so).

无论如何,底线是 - 不要依赖隐式顺序.您可以考虑解释为什么可以观察到特定顺序,但即便如此,如果没有对实现和运行时状态的深入了解,您也无法始终预测它(如后一种情况).

Anyway, the bottom line is - don't rely on implicit order. You can think of explanations why a particular order can be observed, but even then you can't always predict it (like in the latter case) without having intimate knowledge of implementation and runtime state.

这篇关于sql server 如何对你的数据进行排序?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆