在.NET词典中,双哈希如何工作? [英] How double hashing works in case of the .NET Dictionary?

查看:42
本文介绍了在.NET词典中,双哈希如何工作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

前一天,我正在阅读那篇关于CodeProject的文章

我很难理解有关.NET词典的实现的几点(考虑实现

And I got hard times understanding a few points about the implementation of the .NET Dictionary (considering the implementation here without all the optimizations in .NET Core):

注意:如果添加的项目超过表中的最大数量 (即7199369),resize方法将手动搜索下一个质数 大于旧大小两倍的数字.

Note: If will add more items than the maximum number in the table (i.e 7199369), the resize method will manually search the next prime number that is larger than twice the old size.

  • 注意:调整大小时将尺寸加倍的原因 数组是使内部哈希表操作具有渐近性 复杂.质数用于支持 二次哈希.

    Note: The reason that the sizes are being doubled while resizing the array is to make the inner-hash table operations to have asymptotic complexity. The prime numbers are being used to support double-hashing.

  • 所以我试图回想起十年前我和我的好朋友Wikipedia一起学习CS的旧课程:

    So I tried to remember my old CS classes back a decade ago with my good friend wikipedia:

    • Open Addressing
    • Separate Chaining
    • Double Hashing

    但是,除了

    But I still don't really see how first it relates to double hashing (which is a collision resolution technique for open-addressed hash tables) except the fact that the Resize() method double of the entries based on the minimum prime number (taken based on the current/old size), and tbh I don't really see the benefits of "doubling" the size, "asymptotic complexity" (I guess that article meant O(n) when the underlying array (entries) is full and subject to resize).

    首先,如果在使用或不使用质数的情况下将大小加倍,是不是真的不一样?

    First, If you double the size with or without using a prime, is it not really the same?

    第二,对我来说,.NET哈希表在解决冲突时使用一种单独的链接技术.

    Second, to me, the .NET hash table use a separate chaining technique when it comes to collision resolution.

    我想我一定错过了一些事情,我想找一个可以阐明这两点的人.

    I guess I must have missed a few things and I would like to have someone who can shed the light on those two points.

    推荐答案

    我在 Reddit ,所以我要在这里总结一下:

    I got my answer on Reddit, so I am gonna try to summarize here:

    冲突解决技术

    首先,冲突解决方案似乎正在使用单独的链接技术和不是打开寻址技术,因此没有

    First off, it seems that the collision resolution is using Separate Chaining technique and not Open addressing technique and therefore there is no Double Hashing strategy:

    代码如下:

    private struct Entry 
    {
        public int hashCode;    // Lower 31 bits of hash code, -1 if unused
        public int next;        // Index of next entry, -1 if last
        public TKey key;        // Key of entry
        public TValue value;    // Value of entry
    }
    

    只是将所有内容存储在相同的条目数组中,而不是为共享相同的哈希码/索引的所有条目(如列表或每个存储桶)分配一个专用的存储空间.

    It just that instead of having one dedicated storage for all the entries sharing the same hashcode / index like a list or whatnot for every bucket, everything is stored in the same entries array.

    素数

    关于素数,答案就在这里: https://cs.stackexchange.com/a/64191/42745 全部涉及多个:

    About the prime number the answer lies here: https://cs.stackexchange.com/a/64191/42745 it's all about multiple:

    因此,为了最大程度地减少冲突,重要的是减少m和K元素之间的公因数.这怎么做? 取得成就?通过选择m为具有很少因素的数字:a 素数.

    Therefore, to minimize collisions, it is important to reduce the number of common factors between m and the elements of K. How can this be achieved? By choosing m to be a number that has very few factors: a prime number.

    使基本条目数组大小加倍

    通过将阵列的大小增加足够数量的插槽来帮助避免调用过多的调整大小操作(即副本).

    Help to avoid call too many resize operations (i.e. copies) by increasing the size of the array by enough amount of slots.

    看到那个答案: https://stackoverflow.com/a/2369504/4636721

    如果,哈希表无法要求摊销固定时间插入", 例如,调整大小是一个恒定的增量.在这种情况下 调整大小的成本(随散列表的大小而增加) 会使一次插入的费用在 要插入的元素.因为调整大小变得越来越昂贵 随着表的大小,它必须越来越少地发生" 保持摊销成本不变.

    Hash-tables could not claim "amortized constant time insertion" if, for instance, the resizing was by a constant increment. In that case the cost of resizing (which grows with the size of the hash-table) would make the cost of one insertion linear in the total number of elements to insert. Because resizing becomes more and more expensive with the size of the table, it has to happen "less and less often" to keep the amortized cost of insertion constant.

    这篇关于在.NET词典中,双哈希如何工作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆