SQL ROW_NUMBER() 性能问题 [英] SQL ROW_NUMBER() over performance problem

查看:80
本文介绍了SQL ROW_NUMBER() 性能问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有这个运行良好的 SQL.

I have this SQL that works fine.

希望我的过滤器返回具有最高 UserSessionSequenceID 的最新唯一 SessionGuid.

Want the my filter to return the LATEST unique SessionGuids with the highest UserSessionSequenceID.

问题是性能很差 - 即使我有很好的索引.我该如何重写 - 以省略 ROW_NUMBER 行?

Problem is performance sucks - even though I have good indexes. How can I rewrite this - to omit the ROW_NUMBER line?

SELECT TOP(@resultCount) * FROM 
(
    SELECT
        [UserSessionSequenceID]
        ,[SessionGuid]
        ,[IP]
        ,[Url]
        ,[UrlTitle]
        ,[SiteID]
        ,[BrowserWidth]
        ,[BrowserHeight]
        ,[Browser]
        ,[BrowserVersion]
        ,[Referer]
        ,[Timestamp]
        ,ROW_NUMBER() over (PARTITION BY [SessionGuid] 
                                    ORDER BY UserSessionSequenceID DESC) AS sort 
   FROM [tblSequence]
) AS t     
WHERE ([Timestamp] > DATEADD(mi, -@minutes, GETDATE())) 
  AND (SiteID = @siteID) 
  AND sort = 1
ORDER BY [UserSessionSequenceID] DESC

非常感谢:-)

推荐答案

即使我有很好的索引

even though I have good indexes

无意冒犯,但让我们来判断一下.在询问 SQL Server 性能问题时,始终发布表的精确架构,包括所有索引和基数.

No offense, but let us be the judge of that. Always post the exact schema for your tables, including all indexes and cardinalities, when asking SQL Server performance questions.

例如,让我们考虑以下表结构:

For example, lets consider the following table structure:

create table tblSequence (
 [UserSessionSequenceID] int not null
        ,[SessionGuid] uniqueidentifier not null
        ,[SiteID] int not null
        ,[Timestamp] datetime not null
        , filler varchar(512));
go

create clustered index cdxSequence on tblSequence (SiteID, [Timestamp]);
go

与您的相同,但所有与性能问题无关的字段都聚合到通用填充符中.让我们看看,大约 5 万个会话的 100 万行的性能有多差?让我们用随机数据填充表格,但我们将模拟用户活动"的数量:

which is same as yours, but all fields not relevant to the performance problems are aggregated into a generic filler. Lets see, how bad is the performance on, say, 1M rows for about 50k sessions? Lets fill up the table with random data, but we'll simulate what ammounts to 'user activity':

set nocount on;
declare @i int = 0, @sc int = 1;
declare @SessionGuid uniqueidentifier = newid()
    , @siteID int = 1
    , @Timestamp datetime = dateadd(day, rand()*1000, '20070101')
    , @UserSessionSequenceID int = 0;
begin tran;
while @i<1000000
begin
    insert into tblSequence (
        [UserSessionSequenceID]
        ,[SessionGuid]
        ,[SiteID]
        ,[Timestamp]
        , filler)
    values (
        @UserSessionSequenceID
        , @SessionGuid
        , @siteID
        , @timestamp
        , replicate('X', rand()*512));

    if rand()*100 < 5
    begin
        set @SessionGuid = newid();
        set @siteID = rand() * 10;
        set @Timestamp = dateadd(day, rand()*1000, '20070101');
        set @UserSessionSequenceID = 0;
        set @sc += 1;
    end
    else
    begin
        set @timestamp = dateadd(second, rand()*300, @timestamp);
        set @UserSessionSequenceID += 1;
    end

    set @i += 1;
    if (@i % 1000) = 0
    begin
        raiserror(N'Inserted %i rows, %i sessions', 0, 1, @i, @sc);
        commit;
        begin tran;
    end
end
commit;

这大约需要 1 分钟才能填满.现在让我们查询您提出的相同查询:在过去 Y 分钟内,站点 X 上的任何用户会话的最后一个操作是什么?我必须为@now 使用特定日期而不是 GETDATE() 因为 emy dtaa 是模拟的,而不是真实的,所以我使用的是为 SiteId 1 随机填充的任何最大时间戳:

This takes about 1 minute to fill up. Now lets query the same query you asked: what is the last action of any user session on site X in the last Y minutes? I'll have to use a specific date for @now instead of GETDATE() becaus emy dtaa is simulated, not real, so I'm using whatever max timestamp was filled in randomly for SiteId 1:

set statistics time on;
set statistics io on;

declare @resultCount int = 30;
declare @minutes int = 60*24;
declare @siteID int = 1;
declare @now datetime = '2009-09-26 02:08:27.000';

SELECT TOP(@resultCount) * FROM  
( 
    SELECT 
        [UserSessionSequenceID] 
        ,[SessionGuid] 
        , SiteID
        , Filler
        ,[Timestamp] 
        ,ROW_NUMBER() over (PARTITION BY [SessionGuid]  
                                    ORDER BY UserSessionSequenceID DESC) AS sort  
   FROM [tblSequence] 
   where SiteID = @siteID
   and [Timestamp] > DATEADD(mi, -@minutes, @now)
) AS t      
WHERE sort = 1 
ORDER BY [UserSessionSequenceID] DESC ;

这与您的查询相同,但限制性过滤器被移动到 内部 ROW_NUMBER() 部分子查询.结果回来了:

This is same query as yours, but the restrictive filters are moved inside the ROW_NUMBER() part subquery. The results come back in:

Table 'tblSequence'. Scan count 1, logical reads 12, physical reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 31 ms.

热缓存的响应时间为 31 毫秒,从表的近 60k 页中读取了 12 页.

31 ms response time on a warm cache, 12 pages read out of the nearly 60k pages of the table.

更新

再次阅读原始查询后,我意识到我修改后的查询是不同的.您只需要会话.我仍然认为通过 SiteID 和 Timestmap 过滤掉是获得必要性能的唯一方法,因此解决方案是使用 NOT EXISTS 条件验证候选结果:

After reading again original query I realize my modified query is different. You only need new sessions. I still believe that the filtering out by SiteID and Timestmap is the only way to get the necessary performance, so the solution is to validate the candidate finds with a NOT EXISTS condition:

SELECT TOP(@resultCount) * FROM  
( 
    SELECT 
        [UserSessionSequenceID] 
        ,[SessionGuid] 
        , SiteID
        , Filler
        ,[Timestamp] 
        ,ROW_NUMBER() over (
            PARTITION BY [SessionGuid]  
            ORDER BY UserSessionSequenceID DESC) 
         AS sort  
   FROM [tblSequence] 
   where SiteID = @siteID
   and [Timestamp] > DATEADD(mi, -@minutes, @now)
) AS new
WHERE sort = 1 
and not exists (
    select SessionGuid 
    from tblSequence
    where SiteID = @siteID
    and SessionGuid = new.SessionGuid
    and [TimeStamp] < DATEADD(mi, -@minutes, @now)
)
ORDER BY [UserSessionSequenceID] DESC 

这在我的笔记本电脑上返回,在 40 毫秒内从热缓存中返回超过 40 万个会话的 100 万行:

This returns on my laptop, for 1M rows over 400k sessions in 40 ms from a warm cache:

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0
Table 'tblSequence'. Scan count 2, logical reads 709, physical reads 0

 SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 40 ms.

这篇关于SQL ROW_NUMBER() 性能问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆