Create more than one non-clustered index on the same column in SQL Server

Question

What is the strategy for creating indexes?

Is it possible to create more than one non-clustered index on the same column in SQL Server?

What about creating both a clustered and a non-clustered index on the same column?

Sorry, but indexing is very confusing to me.

Is there any way to find out the estimated query execution time in SQL Server?

Solution

The words are rather logical and you'll learn them quite quickly. :)

In layman's terms, SEEK means going straight to the precise location of the matching records. SQL Server does this when the column you're searching on is indexed and your filter (the WHERE condition) is selective enough.

SCAN means reading a larger range of rows, possibly the whole index or table. The query execution planner chooses it when it estimates that fetching the whole range is cheaper than seeking each value individually.
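Here is a minimal sketch of the difference. It uses a hypothetical temporary table `#Demo`, and its name, columns and index are illustrative only, not part of the original answer. Turn on the actual execution plan (Ctrl + M in SSMS) before running it. The operators named in the comments are what you would typically expect, but the optimizer always makes the final choice.

```sql
-- Hypothetical demo table: clustered on ID, nonclustered index on Code.
CREATE TABLE #Demo
(
    ID   INT         NOT NULL PRIMARY KEY CLUSTERED,
    Code INT         NOT NULL,
    Note VARCHAR(50) NULL
);
CREATE NONCLUSTERED INDEX IX_Demo_Code ON #Demo (Code);

-- Equality filter on the indexed column: typically an Index Seek on IX_Demo_Code.
-- ID can be returned from that same index, because every nonclustered index
-- also carries the clustered key.
SELECT ID FROM #Demo WHERE Code = 42;

-- Filter on a column that is in no index at all: the only option is to read
-- every row, i.e. a Clustered Index Scan.
SELECT ID FROM #Demo WHERE Note = 'x';

DROP TABLE #Demo;
```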

And yes, you can have multiple indexes on the same column, and sometimes it's a very good idea. Play around with the indexes and use the execution plans to see what happens. In SSMS, Ctrl + M includes the actual execution plan when you run a query, and Ctrl + L displays the estimated plan without running it. You can even run two versions of the same query in one batch. The execution plan will then show the relative cost of each, which makes optimization quite easy.
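Those SSMS shortcuts have T-SQL equivalents. They also bear on the "estimated query execution time" question: before a query runs, SQL Server estimates a unitless cost, not a time, and it only reports real time after the query has run. The sketch below queries the system view `sys.objects` so that it runs in any database. The `object_id` value is arbitrary.

```sql
-- 1) Estimated plan only: the SELECT is compiled but NOT executed.
--    SET SHOWPLAN_XML must be the only statement in its batch, hence the GO
--    batch separators (understood by SSMS and sqlcmd, not by T-SQL itself).
SET SHOWPLAN_XML ON;
GO
SELECT name FROM sys.objects WHERE object_id = 42;
GO
SET SHOWPLAN_XML OFF;
GO

-- 2) Actual run: CPU time, elapsed time and logical reads for each statement
--    are reported in the Messages tab.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
SELECT name FROM sys.objects WHERE object_id = 42;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

In the XML returned by step 1, the `StatementSubTreeCost` attribute holds the optimizer's estimated cost for the statement. Treat it as a relative number for comparing plans, not as a number of seconds.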

But to expand on these a bit, say you have an address table like so, and it has over 1 billion records:

CREATE TABLE ADDRESS
  (ADDRESS_ID INT NOT NULL
      CONSTRAINT ADDRESS_PK_IDX PRIMARY KEY CLUSTERED
  , PERSON_ID INT -- FOREIGN KEY to a PERSON table (not shown)
  , CITY VARCHAR(256)
  , MARKED_FOR_CHECKUP BIT
  -- , +n^10 different other columns...
  );

CREATE NONCLUSTERED INDEX ADDRESS_PERSON_IDX
  ON ADDRESS (PERSON_ID);

Now, if you want to find all the address information for person 12345, the index on PERSON_ID is perfect. The table has loads of other data on the same row, so it would be inefficient and space-consuming to create a nonclustered index that covers all the other columns as well as PERSON_ID. In this case, SQL Server will perform an index SEEK on the PERSON_ID index. It then uses the result to do a Key Lookup into the clustered index on ADDRESS_ID, and from there it returns the data in all the other columns of that same row.
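Here is a sketch of that lookup against the ADDRESS table above. It assumes ADDRESS_PERSON_IDX is an ordinary nonclustered index on PERSON_ID only. The plan shape in the comments is the typical one for a selective filter, not a guarantee.

```sql
-- Typical plan: an Index Seek on ADDRESS_PERSON_IDX finds the matching rows,
-- then a Key Lookup (by ADDRESS_ID) into the clustered index fetches CITY,
-- MARKED_FOR_CHECKUP and every other column for each of those rows.
SELECT *
FROM ADDRESS
WHERE PERSON_ID = 12345;
```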

However, say you want to search for all the persons in a city, but you don't need any other address information. This time, the most effective approach would be to create an index on CITY and use the INCLUDE option to cover PERSON_ID as well. That way, a single index seek or scan returns all the information you need, without having to go back to the CLUSTERED index to fetch PERSON_ID for each row.
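The covering index described above could look like the following sketch. The index name ADDRESS_CITY_IDX is hypothetical.

```sql
-- Hypothetical index name. CITY is the key column (seekable);
-- PERSON_ID is stored only at the leaf level via INCLUDE.
CREATE NONCLUSTERED INDEX ADDRESS_CITY_IDX
    ON ADDRESS (CITY)
    INCLUDE (PERSON_ID);

-- Covered query: CITY and PERSON_ID are both in ADDRESS_CITY_IDX,
-- so a seek on it answers the query without any Key Lookup.
SELECT PERSON_ID
FROM ADDRESS
WHERE CITY = 'New York';
```

PERSON_ID now appears in two nonclustered indexes: as the key of ADDRESS_PERSON_IDX and as an included column here. That is exactly the "multiple indexes on the same column" case from the question.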

Now, let's say both of those queries are required, but they are still rather heavy because of the 1 billion records. There is also one special query that needs to be really fast. That query wants all persons at addresses that have been MARKED_FOR_CHECKUP and that are in New York (ignore whatever "checkup" means, it doesn't matter). Now you might want to create a third, filtered index on MARKED_FOR_CHECKUP and CITY, with INCLUDE covering PERSON_ID, and with a filter saying CITY = 'New York' and MARKED_FOR_CHECKUP = 1. This index would be insanely fast. It only contains the rows that satisfy those exact conditions, so it has a fraction of the data to go through compared with the other indexes.
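A sketch of that filtered index and the query it serves follows. The index name ADDRESS_NY_CHECKUP_IDX is hypothetical.

```sql
-- Hypothetical index name. Only rows matching the WHERE clause are stored,
-- so the index is tiny compared with the full 1-billion-row table.
CREATE NONCLUSTERED INDEX ADDRESS_NY_CHECKUP_IDX
    ON ADDRESS (MARKED_FOR_CHECKUP, CITY)
    INCLUDE (PERSON_ID)
    WHERE CITY = 'New York' AND MARKED_FOR_CHECKUP = 1;

-- The literals match the index filter, so the optimizer can answer this
-- query from ADDRESS_NY_CHECKUP_IDX alone.
SELECT PERSON_ID
FROM ADDRESS
WHERE CITY = 'New York'
  AND MARKED_FOR_CHECKUP = 1;
```

One caveat applies if the city and the flag are passed as parameters (for example `@City`) rather than written as literals. In that case the optimizer generally cannot use the filtered index in a reusable cached plan, because it cannot prove that every parameter value matches the filter. Queries like this are therefore usually written with literals or with `OPTION (RECOMPILE)`.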

(A disclaimer: bear in mind that the query execution planner is not stupid. It can combine multiple nonclustered indexes to produce the correct results, so the examples above may not be the best ones available. It's very hard to imagine when you would really need three different indexes covering the same column, but I'm sure you get the idea.)

The types of index, their key columns, included columns, sort orders, filters, etc. depend entirely on the situation. You will need covering indexes that serve several different types of queries, as well as customized indexes created specifically for single, important queries. However, each index takes up disk space and has to be maintained by every INSERT, UPDATE and DELETE that touches its columns. Useless indexes are therefore wasteful: they need extra maintenance whenever the data model changes, and they waste time in defragmentation and statistics update operations. So you don't want to just slap an index on everything either.
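To spot indexes that cost more than they return, you can query the index usage DMV. Here is a sketch, to be run in the database of interest. The counters are reset when the SQL Server instance restarts, so judge them over a representative period. Indexes that have not been used at all since the restart do not appear in this view.

```sql
-- Reads (seeks/scans/lookups) vs. writes (updates) per index in the current database.
-- An index with many user_updates but few reads is a candidate for review.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name                   AS index_name,
       s.user_seeks,
       s.user_scans,
       s.user_lookups,
       s.user_updates
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id
WHERE s.database_id = DB_ID()
  AND OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY s.user_updates DESC;
```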

Experiment, learn and work out which works best for your needs.
