Do indexes on boolean columns help page caching?

Question

I have read that boolean columns don't serve much purpose as search indexes. But my question is: since a clustered index affects the physical arrangement of the records, can't it be used to put records of one type together (in the same pages), so that the pages holding the other type are less likely to be loaded into memory? I will try to explain better. For the table

[BookPages]
ID(int)
Deleted(Boolean)
Text(Varchar)

If the clustered index is on the ID column, sample data would be

1, true,  'the quick..'
2, false, 'hello w..'
3, true,  'stack m..'
4, false, 'just thin...'

This means that the deleted and active records are interleaved, so if we search for record 2

SELECT [Text] FROM [BookPages] WHERE [Deleted] = false AND [ID] = 2

The "leaf" data page may end up holding rows (1, 2), which means we are loading into memory records with the Deleted flag set that we will never be interested in. But if the clustered index were on the columns (Deleted, ID), the data would now be

2, false, 'hello w..'
4, false, 'just thin...'
1, true,  'the quick..'
3, true,  'stack m..'

Now, when we target only the active records, the pages that SQL Server loads will be full of active records only.

So on a database with a long history and a lot of deleted records, we can have better locality for the records that we want, and help with IO.

And across thousands of pages we can make sure that a large chunk of them will never be loaded into memory, and that data will always remain only on disk.

Is this reasoning correct? Could this improve overall performance on large databases?

Answer

Yes, that reasoning is correct. You can in effect partition the data set into two regions, one hot and one cold. Using a bit is just a special case of this technique. You also could use a date column and cluster on that (of course whether that is feasible or not depends on the schema and data).
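To make the hot/cold split concrete, here is a toy model in Python. It is only a sketch, not how SQL Server's storage engine works: it assumes a fixed, hypothetical `ROWS_PER_PAGE`, rows laid out strictly in clustering-key order, and a made-up data set in which 90% of the rows are deleted. It counts how many pages contain at least one active row under each clustering key.

```python
# Toy model: a "page" holds ROWS_PER_PAGE consecutive rows in clustering-key order.
ROWS_PER_PAGE = 4  # hypothetical; real 8 KB pages hold a variable number of rows


def pages_touched(rows, key, wanted):
    """Return the set of page numbers that contain at least one wanted row."""
    ordered = sorted(rows, key=key)
    return {pos // ROWS_PER_PAGE for pos, row in enumerate(ordered) if wanted(row)}


def is_active(row):
    """A row is (id, deleted); active means the deleted flag is False."""
    return not row[1]


# 1000 rows; every 10th row is active, the other 90% are deleted.
rows = [(i, i % 10 != 0) for i in range(1, 1001)]

# Clustered on ID: active rows are scattered among deleted ones.
by_id = pages_touched(rows, key=lambda r: r[0], wanted=is_active)

# Clustered on (Deleted, ID): active rows are packed together at the front.
by_deleted_id = pages_touched(rows, key=lambda r: (r[1], r[0]), wanted=is_active)

print(len(by_id), len(by_deleted_id))
```

In this model the active rows are spread over far more pages when clustering on ID alone than when clustering on (Deleted, ID), which is the locality effect described above.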

Partitioning has a similar effect. Choosing the clustering key is lighter weight and just as good though.

Oftentimes clustering on an auto-incremented number also has good locality because the IDENTITY value correlates with age and age correlates with frequency of usage.

The same optimization does not apply directly to nonclustered indexes. You can use a boolean prefix for them, too, but you need to provide it in a sargable form:

WHERE SomeNCIndexCol = '1234' AND Deleted IN (0, 1)

SQL Server is not smart enough to figure this out by itself. It cannot "skip" the first index level like Oracle can. So we have to provide seek keys manually. (Connect item: https://connect.microsoft.com/SQLServer/feedback/details/695044)
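The effect of supplying both values of the leading column can be sketched with a sorted list standing in for a nonclustered index on (Deleted, SomeNCIndexCol). This is an illustration only: the `index` contents and the `seek` and `scan` helpers are hypothetical, and binary search over a Python list is just a stand-in for a B-tree seek.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index entries (deleted, code), kept sorted by that composite key.
index = sorted((i % 2, f"{i:04d}") for i in range(2000))


def seek(index, deleted, code):
    """Binary-search one (deleted, code) key, as a seek on both key columns would."""
    lo = bisect_left(index, (deleted, code))
    hi = bisect_right(index, (deleted, code))
    return index[lo:hi]


def scan(index, code):
    """Without a value for the leading column, every entry has to be examined."""
    return [entry for entry in index if entry[1] == code]


# "Deleted IN (0, 1)" turns one unseekable predicate into two seeks.
via_seeks = [entry for d in (0, 1) for entry in seek(index, d, "1234")]
via_scan = scan(index, "1234")

print(via_seeks, via_scan)
```

Both approaches return the same entries; the difference is that the two seeks touch only a handful of positions while the scan visits every entry.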

A different concern is write performance. Marking a row as deleted (SET Deleted = 1) now requires a physical delete+insert pair for the CI plus one for each NCI. Primary key changes are not supported by most ORMs so you probably should not set this clustering key as the primary key.
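The reason the update becomes a delete plus an insert is that the flag is now part of the key that determines where the row lives. A minimal sketch, again using a sorted Python list as a hypothetical stand-in for the clustered index keyed on (Deleted, ID):

```python
from bisect import bisect_left, insort

# Rows keyed by (deleted, id), kept in key order like a clustered index.
clustered = [(0, 2), (0, 4), (1, 1), (1, 3)]


def mark_deleted(clustered, row_id):
    """Flip deleted 0 -> 1 for row_id; the key changes, so the entry must move."""
    old_key, new_key = (0, row_id), (1, row_id)
    pos = bisect_left(clustered, old_key)
    if pos == len(clustered) or clustered[pos] != old_key:
        raise KeyError(row_id)
    del clustered[pos]          # physical delete from the old position
    insort(clustered, new_key)  # physical insert at the new position


mark_deleted(clustered, 2)
print(clustered)
```

With the clustering key on ID alone, the same update would change the flag in place without moving the row.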

As a side note, creating an index on a bit column has other use cases as well. If 99% of the rows share one value (all zero or all one), you can definitely use the index to seek to the rare value and do key lookups. You can also use such an index for counting (or grouping on the bit column).
