SQL Server Hash Indexes


Question

When using a CHECKSUM column to artificially create a hash index, is the lookup actually O(1), or is it still O(log n) as it is for a clustered index? I have a table that I will select from based on its ID column, and I need the lookup to be as fast as possible. Is the clustered index the fastest possible option? I am looking for something that will provide O(1) performance.

Answer

Okay, two points.
First, the SQL Server CHECKSUM function is not a good hash function. It returns only a 32-bit int checksum, so it is a poor basis for a hash check because there will be a relatively large number of collisions. Use the HASHBYTES function if you want a real hash function.
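
To get a feel for why a 32-bit value collides so readily, here is a rough Python sketch using the standard birthday-bound approximation. It assumes an ideal, uniformly distributed hash, so it is a best case: a checksum that distributes unevenly can only collide more often. The row count and the helper name are illustrative assumptions, not anything from SQL Server.

```python
import math

def collision_probability(n_rows: int, bits: int) -> float:
    """Birthday-bound approximation of P(at least one collision)
    for n_rows values drawn uniformly from a space of 2**bits."""
    space = 2.0 ** bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-n_rows * (n_rows - 1) / (2.0 * space))

# Hypothetical table of 100,000 rows.
p32 = collision_probability(100_000, 32)    # width of a 32-bit checksum
p256 = collision_probability(100_000, 256)  # width of a SHA-256 digest

print(f"32-bit:  {p32:.2f}")
print(f"256-bit: {p256:.2e}")
```

With only 100,000 rows, a 32-bit value is already more likely than not to contain at least one collision, while a 256-bit digest makes an accidental collision negligible.
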
Second, you are not actually creating a hash index. You are creating a normal B-tree on a hash value, so the lookup is still O(log n), exactly the same as for any other B-tree index on a data type of similar size.
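
The difference can be sketched in Python. This is only an analogy: a sorted list with binary search stands in for an ordered B-tree index, a dict stands in for a true hash index, and key_hash is a hypothetical stand-in for a hashed key column. The point is that hashing the key does not change the number of probes an ordered structure needs.

```python
import hashlib

def key_hash(value: str) -> int:
    """Hypothetical 32-bit hashed key: first 4 bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big")

def binary_search_steps(sorted_keys, target):
    """Return (found, probes) for a plain binary search."""
    lo, hi, steps = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] == target:
            return True, steps
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return False, steps

ids = [f"row-{i}" for i in range(100_000)]
raw_index = sorted(ids)                              # ordered index on the raw key
hashed_index = sorted(key_hash(v) for v in ids)      # ordered index on the hash
true_hash_index = {v: i for i, v in enumerate(ids)}  # real hash table

raw_found, raw_steps = binary_search_steps(raw_index, "row-54321")
hashed_found, hashed_steps = binary_search_steps(hashed_index, key_hash("row-54321"))
row_position = true_hash_index["row-54321"]          # single bucket lookup on average

print(raw_steps, hashed_steps)  # both bounded by about log2(100000), i.e. 17
```

Both ordered structures need up to roughly log2(n) probes whether they store the raw key or its hash. Only the dict, which is a genuine hash table, avoids the logarithmic search.
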
There is a chance that you could gain a little performance by using a checksum or hash of a long varchar value, because fewer bytes have to be compared. However, a string comparison only checks as many bytes as it needs to, which is as far as the first character that doesn't match, and if you do match on the hashed value, you then need to double-check the actual value anyway. So unless you have a lot of very similar strings, you will probably end up comparing MORE bytes by using the hash (or checksum).
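
A small Python sketch of that byte-counting argument follows. It is a simplified cost model, not how SQL Server actually compares values: it ignores collation rules, assumes a 4-byte hash column (HASH_WIDTH), and the sample strings are made up.

```python
HASH_WIDTH = 4  # assumed width in bytes of a 32-bit hashed column

def bytes_compared(a: bytes, b: bytes) -> int:
    """Bytes examined by a comparison that stops at the first mismatch."""
    n = 0
    for x, y in zip(a, b):
        n += 1
        if x != y:
            return n
    return n  # equal, or one value is a prefix of the other

def hashed_cost(a: bytes, b: bytes, hashes_equal: bool) -> int:
    """Bytes examined when the hash is compared first and the
    real value is rechecked only when the hashes match."""
    return HASH_WIDTH + (bytes_compared(a, b) if hashes_equal else 0)

# Dissimilar strings: the direct comparison stops after 1 byte.
direct_different = bytes_compared(b"apple pie recipe", b"zebra crossing")
hashed_different = hashed_cost(b"apple pie recipe", b"zebra crossing", False)

# Very similar strings: this is the case where the hash helps.
a, b = b"x" * 200 + b"A", b"x" * 200 + b"B"
direct_similar = bytes_compared(a, b)
hashed_similar = hashed_cost(a, b, False)

# A real match: the hash is pure overhead on top of the recheck.
hashed_match = hashed_cost(a, a, True)

print(direct_different, hashed_different)  # 1 vs 4
print(direct_similar, hashed_similar)      # 201 vs 4
print(hashed_match)                        # 4 + 201 = 205
```

In this model the hash only wins when the strings share a long common prefix. For dissimilar strings and for true matches it examines more bytes than the direct comparison.
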

In short, I don't think this is a sensible plan, but as with all optimisations you should test it in your specific case and then decide. I would be interested to see your results if you would care to post them. And I don't believe that there is any faster way to locate a row in a disk-based SQL Server table than by using a clustered index.

In case you care, Ingres (by CA) can create hash indexes, which would then achieve O(1) lookups. There may be other RDBMSs out there that also support true hash indexes. SQL Server 2014 and later support them as well, but only on memory-optimized tables.
