即使 x 远高于所选行,使用 SELECT TOP x 的 SQL 巨大性能差异 [英] SQL massive performance difference using SELECT TOP x even when x is much higher than selected rows

查看:29
本文介绍了即使 x 远高于所选行,使用 SELECT TOP x 的 SQL 巨大性能差异的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在从表值函数中选择一些行,但通过在查询中放置 SELECT TOP 发现了莫名其妙的巨大性能差异.

SELECT col1, col2, col3 etc从 dbo.some_table_functionWHERE col1 = @parameter--按col1排序

需要 5 或 6 分钟以上才能完成.

不过

SELECT TOP 6000 col1, col2, col3 etc从 dbo.some_table_functionWHERE col1 = @parameter--按col1排序

在大约 4 或 5 秒内完成.

如果返回的数据集很大,这不会让我感到惊讶,但所涉及的特定查询返回 200,000 行中的约 5000 行.

因此,在这两种情况下,都会处理整个表,因为 SQL Server 会继续搜索到 6000 行,而这将永远不会到达.为什么会有巨大的差异呢?这是否与 SQL Server 分配空间以预期结果集大小的方式有关(TOP 6000 从而使其要求较低,更容易在内存中分配)?有没有其他人目睹过这样的事情?

谢谢

解决方案

表值函数可以具有非线性执行时间.

让我们考虑这个查询的等效函数:

SELECT (选择总和(mi.value)FROM mytable mi其中 mi.id <= mo.id)FROM mytable mo订购者价值

这个查询(计算正在运行的 SUM)在开始时很快,在结束时很慢,因为在 mo 的每一行上,它应该对所有前面的值求和这需要回绕行源.

每行计算SUM所用的时间随着行数的增加而增加.

如果您使 mytable 足够大(例如,100,000 行,如您的示例所示)并运行此查询,您会发现它需要相当长的时间.

但是,如果您将 TOP 5000 应用于此查询,您会发现它的完成速度比完整表所需时间的 1/20 快得多.>

很可能,您的情况也会发生类似的情况.

更确切地说,我需要查看函数定义.

更新:

SQL Server 可以将谓词推送到函数中.

例如,我刚刚创建了这个TVF:

创建函数 fn_test()退货表作为返回  (选择  *从主);

这些查询:

SELECT *从 fn_test()WHERE 名称 = @name选择前 1000 名 *从 fn_test()WHERE 名称 = @name

产生不同的执行计划(第一个使用集群扫描,第二个使用带有TOP的索引查找)

I'm selecting some rows from a table valued function but have found an inexplicable massive performance difference by putting SELECT TOP in the query.

SELECT   col1, col2, col3 etc
FROM     dbo.some_table_function
WHERE    col1 = @parameter
--ORDER BY col1

is taking upwards of 5 or 6 mins to complete.

However

SELECT   TOP 6000 col1, col2, col3 etc
FROM     dbo.some_table_function
WHERE    col1 = @parameter
--ORDER BY col1

completes in about 4 or 5 seconds.

This wouldn't surprise me if the returned set of data were huge, but the particular query involved returns ~5000 rows out of 200,000.

So in both cases, the whole of the table is processed, as SQL Server continues to the end in search of 6000 rows which it will never get to. Why the massive difference then? Is this something to do with the way SQL Server allocates space in anticipation of the result set size (the TOP 6000 thereby giving it a low requirement which is more easily allocated in memory)? Has anyone else witnessed something like this?

Thanks

解决方案

Table valued functions can have a non-linear execution time.

Let's consider function equivalent for this query:

SELECT  (
        SELECT  SUM(mi.value)
        FROM    mytable mi
        WHERE   mi.id <= mo.id
        )
FROM    mytable mo
ORDER BY
        mo.value

This query (that calculates the running SUM) is fast at the beginning and slow at the end, since on each row from mo it should sum all the preceding values which requires rewinding the rowsource.

Time taken to calculate SUM for each row increases as the row numbers increase.

If you make mytable large enough (say, 100,000 rows, as in your example) and run this query you will see that it takes considerable time.

However, if you apply TOP 5000 to this query you will see that it completes much faster than 1/20 of the time required for the full table.

Most probably, something similar happens in your case too.

To say something more definitely, I need to see the function definition.

Update:

SQL Server can push predicates into the function.

For instance, I just created this TVF:

CREATE FUNCTION fn_test()
RETURNS TABLE
AS
RETURN  (
        SELECT  *
        FROM    master
        );

These queries:

SELECT  *
FROM    fn_test()
WHERE   name = @name

SELECT  TOP 1000 *
FROM    fn_test()
WHERE   name = @name

yield different execution plans (the first one uses clustered scan, the second one uses an index seek with a TOP)

这篇关于即使 x 远高于所选行,使用 SELECT TOP x 的 SQL 巨大性能差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆