Can this SQL statement be refactored to NOT use RANK/PARTITION?


Question

I have the following SQL statement, which works perfectly fine. I was hoping to see how it could be refactored so that it doesn't require RANK/PARTITION, if possible.

SELECT LogEntryId, FileId, CreatedOn, EventTypeId
FROM (SELECT a.LogEntryId, a.FileId, a.CreatedOn,  a.EventTypeId, 
        RANK() OVER (PARTITION BY ClientName ORDER BY a.CreatedOn DESC) AS MostRecentEventRank
    FROM LogEntries a
    WHERE (a.EventTypeId = 2 or a.EventTypeId = 4)) SubQuery
WHERE MostRecentEventRank = 1

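What is it trying to do?

  1. Grab all the records in the table, grouped by ClientName (hence the PARTITION BY) and ordered by most recently created.
  2. Filter only by event type #2 (a connection) or #4 (a disconnection).
  3. Now, for each client name, retrieve the most recent record.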

This in effect grabs the most recent event (a connection or a disconnection) for each unique user in the table.

I do like RANK/PARTITION, but I was hoping to see if it's possible to do this without it.

Recommended answer

Yet another variation: select the clients, then use CROSS APPLY (.. TOP (1) ... ORDER BY ...) to get the relevant entry.

SELECT c.ClientName,r.LogEntryId, r.FileId, r.CreatedOn,  r.EventTypeId
FROM (
 SELECT DISTINCT ClientName
 FROM LogEntries
 WHERE EventTypeId IN (2,4)) as c
CROSS APPLY (
   SELECT TOP (1) a.LogEntryId, a.FileId, a.CreatedOn,  a.EventTypeId
   FROM LogEntries as a
   WHERE a.ClientName = c.ClientName
   AND a.EventTypeId IN (2,4)
   ORDER BY a.CreatedOn DESC) as r;
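
One behavioural difference is worth noting. RANK() gives rank 1 to every row tied for the latest CreatedOn within a client, so the original query can return several rows per client, whereas TOP (1) returns exactly one row and picks arbitrarily among ties. If the ties matter, a minimal sketch that keeps them, assuming the same LogEntries columns as in the question, could look like this:

```sql
-- Same shape as the CROSS APPLY query, but WITH TIES keeps every row that
-- shares the latest CreatedOn for a client, which is what RANK() = 1 returns.
SELECT c.ClientName, r.LogEntryId, r.FileId, r.CreatedOn, r.EventTypeId
FROM (
    SELECT DISTINCT ClientName
    FROM LogEntries
    WHERE EventTypeId IN (2, 4)) AS c
CROSS APPLY (
    SELECT TOP (1) WITH TIES a.LogEntryId, a.FileId, a.CreatedOn, a.EventTypeId
    FROM LogEntries AS a
    WHERE a.ClientName = c.ClientName
      AND a.EventTypeId IN (2, 4)
    ORDER BY a.CreatedOn DESC) AS r;
```

If exactly one row per client is wanted instead, keeping plain TOP (1) and adding a tie-breaker such as a.LogEntryId DESC to the ORDER BY makes the choice deterministic.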

Update

Talking about the performance of a T-SQL query without knowing the schema is nonsense. This query is perfectly optimal on a schema that is properly designed for its needs. Since the access is by ClientName and CreatedOn, even a simplistic schema needs to take this into consideration:

CREATE TABLE LogEntries (
   LogEntryId int identity(1,1),
   FileID int,
   CreatedOn datetime,
   EventTypeID int,
   ClientName varchar(30)
);

create clustered index cdxLogEntries on LogEntries (
    ClientName, CreatedOn DESC);
go

And let's load the table with some 2.4M rows:

declare @i int;
set @i = 0;

while @i < 1000
begin
    insert into LogEntries (FileId, CreatedOn, EventTypeId, ClientName)
    select cast(rand()*100 as int),
        dateadd(minute, -rand()*10000, getdate()),
        cast(rand() * 5 as int),
        'Client' + cast(@i as varchar(10))
        from master..spt_values;
    set @i = @i+1;
end

What time and IO do we get with set statistics io on; set statistics time on; on a warmed-up cache?

(410 row(s) affected)
Table 'LogEntries'. Scan count 411, logical reads 14354, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 1219 ms,  elapsed time = 1932 ms.

1.9 sec to get the data from 2.4M entries on my laptop (which is 4 years old and has 1 GB of RAM). And there is still plenty of room for improvement in the schema design. Separating ClientName into a normalized table, with a trusted foreign key from LogEntries into it, would reduce the time significantly. Proper filtered indexes on EventTypeId IN (2,4) would also contribute. We have not even started exploring the parallelism possibilities.
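
To make those schema suggestions concrete, here is a rough sketch of what they might look like. It was not part of the original answer and has not been benchmarked. The Clients table, the ClientId column, and all constraint and index names are hypothetical, and the script assumes an empty database on SQL Server 2008 or later, which filtered indexes require.

```sql
-- Hypothetical normalized schema: one row per client, referenced from LogEntries.
CREATE TABLE Clients (
    ClientId   int IDENTITY(1,1) NOT NULL CONSTRAINT pkClients PRIMARY KEY,
    ClientName varchar(30) NOT NULL CONSTRAINT uqClientsName UNIQUE
);

CREATE TABLE LogEntries (
    LogEntryId  int IDENTITY(1,1) NOT NULL,
    FileId      int,
    CreatedOn   datetime NOT NULL,
    EventTypeId int NOT NULL,
    -- A foreign key declared at table creation is trusted by default.
    ClientId    int NOT NULL
        CONSTRAINT fkLogEntriesClients REFERENCES Clients (ClientId)
);

CREATE CLUSTERED INDEX cdxLogEntries
    ON LogEntries (ClientId, CreatedOn DESC);

-- Filtered index that holds only connect (2) and disconnect (4) events
-- and covers every column the query reads.
CREATE NONCLUSTERED INDEX ixLogEntriesConnectEvents
    ON LogEntries (ClientId, CreatedOn DESC)
    INCLUDE (LogEntryId, FileId, EventTypeId)
    WHERE EventTypeId IN (2, 4);
GO

-- The DISTINCT scan over LogEntries is replaced by a read of Clients.
-- CROSS APPLY still drops clients that have no matching events.
SELECT c.ClientName, r.LogEntryId, r.FileId, r.CreatedOn, r.EventTypeId
FROM Clients AS c
CROSS APPLY (
    SELECT TOP (1) a.LogEntryId, a.FileId, a.CreatedOn, a.EventTypeId
    FROM LogEntries AS a
    WHERE a.ClientId = c.ClientId
      AND a.EventTypeId IN (2, 4)
    ORDER BY a.CreatedOn DESC) AS r;
```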

This is SQL: performance is obtained on the drawing board of your schema, not in the text editor of your query.
