Optimizing indices for ranking in SQL Server


Question

We've got a wide table that we're currently trying to optimize. The table has 50 columns (stats) that we eventually want to rank in descending order. There are currently well over 5 million rows.

We're looking for ways to optimize this table, both by reducing complexity and by improving read speed. Write speed also matters to us, but read speed is more critical. The ranks of these statistics should be as close to real time as possible; the ideal solution ranks quickly on a per-request basis, because new rows are being added all the time and we want to show ranks for them as soon as possible.

We're currently evaluating whether a vertical table layout would be a.) more performant, and b.) easier to work with.

Because the stats being inserted are not necessarily well defined, it's easier for us if they aren't hard-coded as columns in the table (hence the preference for a vertical table structure).

Here's a look at our current table structure and query:

CREATE TABLE Stats 
(
    Id BIGINT PRIMARY KEY NOT NULL,
    UserId INT,
    Name VARCHAR(32) NOT NULL,
    Value DECIMAL(10,4) DEFAULT ((0)) NOT NULL,
    UpdatedAt DATETIME
);

CREATE INDEX Leaderboard__index ON Stats (Name, Value DESC);

SELECT
    Id,
    Name,
    Value,
    RANK() OVER (PARTITION BY Name ORDER BY Value DESC) AS Rank
FROM 
    Stats
ORDER BY 
    Value DESC

Typically we'd either be fetching the top N rows for a given stat (like a leaderboard), or selecting a single UserId and getting the rank of every stat associated with that UserId.
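
The two access patterns can be sketched as follows. This is only an illustration against the Stats table above: the variables @StatName, @N and @UserId and their values are hypothetical, and the second query swaps RANK() for a correlated count (1 + the number of strictly larger values for the same stat), which yields the same number as RANK() ordered by Value DESC while only touching the stats of one user.

```sql
-- Hypothetical parameters, for illustration only.
DECLARE @StatName VARCHAR(32) = 'score';
DECLARE @N INT = 10;
DECLARE @UserId INT = 42;

-- Pattern 1: leaderboard, top N rows for one stat.
SELECT TOP (@N)
    Id,
    UserId,
    Value,
    RANK() OVER (ORDER BY Value DESC) AS StatRank
FROM Stats
WHERE Name = @StatName
ORDER BY Value DESC;

-- Pattern 2: rank of every stat belonging to one user.
-- Rank = 1 + number of rows for the same stat with a strictly larger value.
SELECT
    s.Name,
    s.Value,
    1 + (SELECT COUNT(*)
         FROM Stats AS o
         WHERE o.Name = s.Name
           AND o.Value > s.Value) AS StatRank
FROM Stats AS s
WHERE s.UserId = @UserId;
```

Both queries filter before ranking, so neither has to rank the whole table. Whether they reach the seconds range still depends on indexing; pattern 2 would probably also want an index that leads on UserId, which is an assumption and not something shown in the table definition above.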

The data is of considerable size: as mentioned above, there are a lot of rows and a lot of columns, so a vertical table structure might be in the range of 250 million rows, and it will continue to grow.

We want to fetch this data as fast as possible on whatever hardware is required. Seconds is our target; we're currently in the minutes range.

In a test of the vertical table structure we inserted over 400,000 rows of data, and the query above takes a little less than 3 minutes (ranking only 10,000 rows was just about 18 seconds faster).

I'd love to hear any suggestions. Thanks for your time!

Recommended answer

The index you have is not useful for your window function, because:

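1. To get the Id column value, SQL Server may end up doing key lookups, or even scanning a different index entirely if the query crosses the tipping point, so your index may not be used at all. (If Id is the clustered index key, as it is by default for a PRIMARY KEY, the nonclustered index already carries it and no lookup is needed.)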

2. You are ordering by Value DESC, which requires a sort that no existing index can satisfy, and that sort may even end up spilling to tempdb.

3. There is one more interesting aspect, fragmentation; see below.

Typically, for a window function to perform well you need a POC index, which means:

P, O: the PARTITION BY and ORDER BY columns should be in the index key
C: covering; the other columns the query selects should be in the INCLUDE clause

So for the query below to work optimally:

SELECT
    Id,
    Name,
    Value,
    RANK() OVER (PARTITION BY Name ORDER BY Value DESC) AS Rank
FROM 
    Stats
ORDER BY 
    Value DESC

you will need the following index:

CREATE INDEX nci_test ON dbo.Stats (Name, Value DESC)
INCLUDE (Id);

There is one more issue with an index created with "Value DESC".

Normally all key values in an index are stored in ascending order by default, but with this index you are asking for them to be stored in reverse, which can cause logical fragmentation, as explained in an answer by Martin Smith. The relevant passage from that answer:

If the index is created with keys descending but new rows are appended with ascending key values then you can end up with every page out of logical order. This can severely impact the size of the IO reads when scanning the table and it is not in cache.
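
One way to see whether this is actually happening is to look at the fragmentation figures SQL Server reports for the table's indexes. The query below is a minimal sketch, assuming the table lives in the dbo schema of the current database; it uses the sys.dm_db_index_physical_stats function in its cheapest 'LIMITED' mode.

```sql
-- Report logical fragmentation for every index on dbo.Stats
-- (the dbo schema is an assumption; adjust to your own).
SELECT
    i.name AS IndexName,
    ps.index_type_desc,
    ps.avg_fragmentation_in_percent,
    ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Stats'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

Running it before and after a batch of inserts should show whether the descending key is what drives avg_fragmentation_in_percent up on this workload.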

The options are few:

1. Rebuild the index on a schedule that suits your write frequency and see whether it helps.

2. Change the query's final ORDER BY to follow the partition column. This removes the need to create the index with the "Value DESC" option, because SQL Server can read an ascending (Name, Value) index backward to produce the rows in Name DESC, Value DESC order:

SELECT
    Id,
    Name,
    Value,
    RANK() OVER (PARTITION BY Name ORDER BY Value DESC) AS Rank
FROM 
    Stats
ORDER BY 
    Name DESC

The query above does not need an index like the one you created. You can change the index as shown below, which also takes care of the fragmentation issue noted above:

CREATE INDEX Leaderboard__index ON Stats (Name, Value)
include(id);
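
If Leaderboard__index already exists with the descending key, the statement above fails because the name is taken. A minimal sketch of two follow-ups, assuming the table lives in the dbo schema: replacing the index definition in place, and the periodic rebuild from option 1.

```sql
-- Replace the existing definition in one step instead of DROP + CREATE.
CREATE INDEX Leaderboard__index ON dbo.Stats (Name, Value)
INCLUDE (Id)
WITH (DROP_EXISTING = ON);

-- Option 1: rebuild on whatever schedule the write volume calls for.
ALTER INDEX Leaderboard__index ON dbo.Stats REBUILD;
```

Both statements rewrite the whole index, so on a table of this size they are probably best run outside peak hours.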

Reference:
Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions
