Should I index a bit field in SQL Server?


Question

I remember reading at one point that indexing a field with low cardinality (a low number of distinct values) is not really worth doing. I admit I don't know enough about how indexes work to understand why that is.

So what if I have a table with 100 million rows in it, and I am selecting records where a bit field is 1? And let's say that at any point in time, there are only a handful of records where the bit field is 1 (as opposed to 0). Is it worth indexing that bit field or not? Why?

Of course I can just test it and check the execution plan, and I will do that, but I'm also curious about the theory behind it. When does cardinality matter and when does it not?

Recommended answer

Consider what an index is in SQL: an index is really a chunk of storage pointing at other chunks of storage (i.e. pointers to rows). The index is broken into pages so that portions of it can be loaded into and unloaded from memory depending on usage.

When you ask for a set of rows, SQL uses the index to find the rows more quickly than table scanning (looking at every row).
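As a rough illustration of that difference, here is a minimal Python sketch. It is not how SQL Server stores anything: the `rows` table, the `status` column, and both function names are made up for the example, and a sorted list with binary search stands in for a real index.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: (row_id, status) pairs, with status in 1..10.
rows = [(row_id, row_id % 10 + 1) for row_id in range(1000)]


def table_scan(rows, wanted):
    """Look at every row and keep the ones whose status matches."""
    return [row_id for row_id, status in rows if status == wanted]


# A toy "index": (status, row_id) entries kept in sorted order.
index = sorted((status, row_id) for row_id, status in rows)


def index_seek(index, wanted):
    """Binary-search to the first matching entry, then read the matching run."""
    lo = bisect_left(index, (wanted, -1))
    hi = bisect_right(index, (wanted, float("inf")))
    return [row_id for _, row_id in index[lo:hi]]


# Both approaches return the same rows; the seek just touches far fewer entries.
print(table_scan(rows, 3) == index_seek(index, 3))
```

The scan inspects all 1000 rows, while the seek jumps straight to the run of entries for the requested value and reads only those.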

SQL has clustered and non-clustered indexes. My understanding of clustered indexes is that they store the rows themselves in index-key order, so rows with similar index values end up on the same pages. This way, when you ask for all the rows matching an index value, SQL can return those rows from a contiguous run of pages. This is why trying to cluster on a GUID column is a bad idea: random values arrive in no useful order, so there is nothing sensible to cluster.

When you index an integer column, SQL's index maps each indexed value to the set of rows that hold it. If the values range from 1 to 10, the index has 10 distinct values to point from. Depending on how many rows there are, this can be paged differently. If your query looks for the index value "1" and also for rows where Name contains "Fred" (assuming the Name column is not indexed), SQL gets the set of rows matching "1" very quickly, then scans only that set to check the Name condition.

So what SQL is really doing is trying to reduce the working set (number of rows) it has to iterate over.

When you index a bit field (or some other narrow range), you only reduce the working set to the rows matching that value. If only a small number of rows match, that reduces your working set a lot. For a large number of rows with a 50/50 distribution, it might buy you very little performance gain compared with the cost of keeping the index up to date.
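To put numbers on this, here is a small Python sketch using the 100 million rows from the question. The figure of 50 matching rows for "a handful" is an assumption chosen for illustration, and `working_set` is a hypothetical helper, not anything SQL Server exposes.

```python
def working_set(total_rows, matching_rows):
    """Return the rows left to examine after the index lookup, and their share of the table."""
    return matching_rows, matching_rows / total_rows


TOTAL_ROWS = 100_000_000  # table size from the question

# Assumed skewed case: only a handful of rows have the bit set to 1.
handful_rows, handful_fraction = working_set(TOTAL_ROWS, 50)

# Even case: half the rows have the bit set to 1.
even_rows, even_fraction = working_set(TOTAL_ROWS, TOTAL_ROWS // 2)

print(f"handful: {handful_rows} rows ({handful_fraction:.7%} of the table)")
print(f"50/50:   {even_rows} rows ({even_fraction:.0%} of the table)")
```

In the skewed case the index leaves a tiny sliver of the table to read. In the 50/50 case it still leaves 50 million rows, which is why the index may not pay for its own upkeep.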

The reason everyone says to test is because SQL contains a very clever and complex optimizer that may ignore an index if it decides table scanning is faster, or may use a sort, or may organize memory pages however it darn well likes.
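The "test it and read the plan" habit can be tried out in miniature. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in, since it runs anywhere; SQLite's planner is not SQL Server's optimizer and may choose differently on the same data. The `items` table, the `is_flagged` column, and the index name are all invented for the example.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable detail text of each step in SQLite's plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, is_flagged INTEGER NOT NULL, name TEXT)"
)
# 10,000 rows, of which only 5 have the flag set.
conn.executemany(
    "INSERT INTO items (id, is_flagged, name) VALUES (?, ?, ?)",
    ((i, 1 if i % 2000 == 0 else 0, f"item {i}") for i in range(10_000)),
)

query = "SELECT name FROM items WHERE is_flagged = 1"

plan_before = query_plan(conn, query)  # no index yet
conn.execute("CREATE INDEX ix_items_flag ON items (is_flagged)")
plan_after = query_plan(conn, query)   # index now available to the planner

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, so compare the two printed plans rather than relying on a particular string. On SQL Server itself, the equivalent check is to compare the actual execution plans with and without the index.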
