Where is the official documentation for T-SQL's "ORDER BY RAND()" and "ORDER BY NEWID()"?

Question

I'm looking for the official T-SQL documentation for "ORDER BY RAND()" and "ORDER BY NEWID()". There are numerous articles describing them, so they must be documented somewhere.

I'm looking for a link to an official SQL Server documentation page like this: http://technet.microsoft.com/en-us/library/ms188385.aspx

CLARIFICATION:

What I'm looking for is the documentation for "order_by_expression" that explains the difference in behavior between a nonnegative integer constant, a function that returns a nonnegative integer, and a function that returns any other value (like RAND() or NEWID()).


ANSWER:

I apologize for the lack of clarity in my original question. As with most programming-related problems, solving it is primarily a matter of figuring out what question you're actually trying to answer.

Thank you everyone.


The answer is in this document: http://www.wiscorp.com/sql200n.zip

Information technology — Database languages — SQL — Part 2: Foundation (SQL/Foundation)

22.2 <direct select statement: multiple rows> includes a <cursor specification>.

At this point we have the first half of the answer:

A SELECT statement is a type of CURSOR, which means that operations can be performed iteratively on each row. Although I haven't found a statement in the docs that explicitly says it, I'm content to assume that the expression in the order_by_expression will be executed for each row.

Now it makes sense what is happening when you use RAND() or NEWID() or CEILING(RAND() + .5) / 2 as opposed to a numeric constant or a column name.
The expression will never be treated like a column number. It will always be a value that is generated for each row which will be used as the basis for determining the order of the rows.

However, for thoroughness, let's continue to the full definition of what an expression can be.

14.3 <cursor specification> includes ORDER BY <sort specification list>.

10.10 <sort specification list> defines:

<sort specification> ::= <sort key> [ <ordering specification> ] [ <null ordering> ]
    <sort key> ::= <value expression>
    <ordering specification> ::= ASC | DESC
    <null ordering> ::= NULLS FIRST | NULLS LAST

Which takes us to:

6.25 <value expression>

Where we find the second half of the answer:

<value expression> ::= 
      <common value expression> 
    | <boolean value expression> 
    | <row value expression>

<common value expression> ::= 
      <numeric value expression> 
    | <string value expression>
    | <datetime value expression>
    | <interval value expression>
    | <user-defined type value expression>
    | <reference value expression>
    | <collection value expression>

    <user-defined type value expression> ::= <value expression primary>
    <reference value expression> ::= <value expression primary>
    <collection value expression> ::= <array value expression> | <multiset value expression>

From here we descend into the numerous possible types of expressions that can be used.

NEWID() returns a uniqueidentifier.
It seems reasonable to assume that uniqueidentifiers are compared numerically, so if expression is NEWID() our <common value expression> will be a <numeric value expression>.

Similarly, RAND() returns a numeric value, and it will also be evaluated as a <numeric value expression>.

So, although I wasn't able to find anything in Microsoft's official documentation that explains what ORDER BY does when its order_by_expression is an expression rather than a constant or column name, it really is documented, as I knew it must be.

Solution

If you're trying to determine why these behave differently, the reason is simple: one is evaluated once, and treated as a runtime constant (RAND()), while the other is evaluated for every single row (NEWID()). Observe this simple example:

SELECT TOP (5) RAND(), NEWID() FROM sys.objects;

Results:

0.240705716465209        8D5D2B55-E5DE-4FF9-BA84-BC82F37B8F3A
0.240705716465209        C4CBF1CA-E6D0-4076-B6A6-5048EA612048
0.240705716465209        9BFAE5BB-B5B9-47DE-B8F9-77AAEFA5F9DB
0.240705716465209        89FFD8A1-AC73-4CEB-A5C0-00A76D040382
0.240705716465209        BCC89923-735E-43B3-9ECA-622A8C98AD7D

Now, if you apply an ORDER BY to the left column, SQL Server says, ok, but every single value is the same, so I'm basically just going to ignore your request and move on to the next ORDER BY column. If there isn't one, then SQL Server will default to returning the rows in whatever order it deems most efficient.

If you apply an order by to the right column, now SQL Server actually has to sort all of the values. This introduces a Sort (or a TopN Sort if TOP is used) operator into the plan, and is likely going to take more CPU (though overall duration may not be substantially affected, depending on the size of the set and other factors).

Let's compare the plans for these two queries:

SELECT RAND() FROM sys.all_columns ORDER BY RAND();

The plan:

There is no sort operator going on, and both of the scans are Ordered = False - this means that SQL Server has not decided to explicitly implement any ordering, but this certainly does not mean that the order will be any different on each execution - it just means that the order is non-deterministic (unless you add a secondary ORDER BY - but even in that case, the RAND() ordering is still ignored because, well, it's the same value on every row).
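The secondary ORDER BY case mentioned above can be sketched as follows. This is only an illustration, and it assumes the `name` column of sys.all_columns as the tiebreaker (my choice, not something from the original answer). Since RAND() yields one value for the whole statement, the rows should simply come back ordered by name:

```sql
-- RAND() is a runtime constant here, so it contributes nothing to the ordering;
-- the effective sort key is the secondary column (name).
SELECT name
FROM sys.all_columns
ORDER BY RAND(), name;
```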

And now NEWID():

SELECT NEWID() FROM sys.all_columns ORDER BY NEWID();

The plan:

There is a new Sort operator there, which means that SQL Server must reorder all the rows to be returned in the order of the generated GUID values on each row. The scans of course are still unordered, but the Sort ultimately applies the order.
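Because NEWID() is evaluated for every row, ordering by it is a common way to pick rows at random. A minimal sketch, assuming sys.objects as a stand-in for whatever table you actually want to sample from:

```sql
-- Each row gets its own GUID, the (TopN) Sort orders by those GUIDs,
-- and TOP keeps the first five, so the selection varies between executions.
SELECT TOP (5) name, object_id
FROM sys.objects
ORDER BY NEWID();
```

Keep in mind that this still generates and sorts a GUID for every row in the source, so the cost grows with the size of the table.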

I don't know that this specific implementation detail is officially documented anywhere, though I did find this article which includes an explicit ORDER BY NEWID(). I doubt you'll find anything official that documents ORDER BY RAND() in any way, because that just doesn't make any sense to do, officially supported or not.

Re: the comment that SQL Server assigns a seed value at random - this should not be interpreted as a seed value **per row** at random. Demonstration:

SELECT MAX(r), MIN(r) FROM 
(
  SELECT RAND() FROM sys.all_columns AS s1 
  CROSS JOIN sys.all_columns AS s2
) AS x(r);

Results:

0.4866202638872        0.4866202638872

On my machine, this took about 15 seconds to run, and the results were always the same for both MIN and MAX. Keep increasing the number of rows returned and the amount of time it takes, and I guarantee you will continue to see the exact same value for RAND() on every row. It is calculated exactly once, and that is not because SQL Server is wise to the fact that I am not returning all of the rows. This also yielded the same result (and it took just under 2 minutes to populate the entire table with 72 million rows):

SELECT RAND() AS r INTO #x 
      FROM sys.all_columns AS s1 
CROSS JOIN sys.all_columns AS s2
CROSS JOIN sys.all_columns AS s3;

SELECT MAX(r), MIN(r) FROM #x;

(In fact the SELECT took almost as long as the initial population. Do not try this on a single-core laptop with 4GB of RAM.)

The result:

0.302690214345828        0.302690214345828
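If what you actually need is a different random number on each row, one commonly used workaround is to give RAND() a per-row seed derived from NEWID(). The sketch below is not part of the original answer; it relies only on the documented facts that RAND accepts an integer seed and that CHECKSUM returns an int:

```sql
-- same_every_row: unseeded RAND() is evaluated once for the whole statement.
-- differs_per_row: NEWID() is evaluated per row, so each row supplies its own seed.
SELECT TOP (5)
    RAND()                  AS same_every_row,
    RAND(CHECKSUM(NEWID())) AS differs_per_row
FROM sys.objects;
```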
