Why a hashtable's rehash complexity may be quadratic in the worst case


Problem description



    I do not understand why a hashtable's rehash complexity may be quadratic in the worst case, as stated at:

    http://www.cplusplus.com/reference/unordered_set/unordered_multiset/reserve/

    Any help would be appreciated!

    Thanks

    Solution

    Just some basics:

    1. A hash collision occurs when two or more elements take on the same hash. This can cause worst-case O(n) operations.

      I won't really go into this much further, since one can find many explanations of this. Basically all the elements can have the same hash, thus you'll have one big linked-list at that hash containing all your elements (and search on a linked-list is of course O(n)).

      It doesn't have to be a linked list, but most implementations do it this way.

    2. A rehash creates a new hash table with the required size and basically does an insert for each element in the old table (there may be a slightly better way, but I'm sure most implementations don't beat the asymptotic worst-case complexity of simple inserts).
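
    As a rough illustration of point 2, here is a minimal sketch of a naive rehash for a chained hash table of ints; the Bucket alias and rehash_naive function are made up for this example and are not taken from any real library's implementation:

```cpp
#include <cstddef>
#include <forward_list>
#include <functional>
#include <vector>

// Hypothetical chained hash table: one singly linked list per bucket.
using Bucket = std::forward_list<int>;

// Naive rehash: allocate a new bucket array and re-insert every element.
// Each push_front below is O(1), but a std::unordered_multiset must also
// keep equal values grouped together, which is what forces the per-insert
// scan discussed below and leads to the quadratic worst case.
void rehash_naive(std::vector<Bucket>& buckets, std::size_t new_bucket_count) {
    std::vector<Bucket> fresh(new_bucket_count);
    std::hash<int> hasher;
    for (const Bucket& bucket : buckets)
        for (int value : bucket)
            fresh[hasher(value) % new_bucket_count].push_front(value);
    buckets.swap(fresh);
}
```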

    In addition to the above, it all comes down to this statement (from here [1]):

    Elements with equivalent values are grouped together in the same bucket and in such a way that an iterator (see equal_range) can iterate through all of them.

    So all elements with equivalent values need to be grouped together. For this to hold, when doing an insert, you first have to check whether other elements with the same value already exist. Consider the case where all the values take on the same hash. In this case, you'll have to look through the above-mentioned linked list for these elements. So for n insertions you look through 0, then 1, then 2, ..., then n-1 elements, which is 0+1+2+...+n-1 = n*(n-1)/2 = O(n²).
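
    To make that scan concrete, here is a hypothetical sketch (not any library's actual code) of inserting into a single bucket chain while keeping equal values adjacent; when every value hashes to the same bucket and no duplicates exist, each call walks the entire chain before appending, which is exactly the 0 + 1 + ... + n-1 comparisons counted above:

```cpp
#include <forward_list>

// Hypothetical grouped insert into one bucket chain: scan for an existing
// equal element so the new one can be linked right next to it; if none is
// found, the whole chain has been traversed and the value goes at the end.
void insert_grouped(std::forward_list<int>& chain, int value) {
    auto prev = chain.before_begin();
    for (auto it = chain.begin(); it != chain.end(); ++prev, ++it) {
        if (*it == value) {                 // found the group of equal values
            chain.insert_after(it, value);  // keep equal elements adjacent
            return;
        }
    }
    chain.insert_after(prev, value);        // no equal element: append at the end
}
```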

    Can't you optimize this to O(n)? To me it makes sense that you may be able to, but even if so, this doesn't mean that all implementations have to do it this way. When using hash tables it's generally assumed that there won't be too many collisions (even if this assumption is naive), which avoids the worst case and reduces the need for the extra implementation complexity required to keep a rehash from taking O(n²).
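
    The "not too many collisions" assumption can also be broken on purpose. The following sketch uses a made-up ConstantHash functor that sends every key to the same bucket; whether a given standard library then pays the full grouping cost on each insert and rehash is implementation-specific, but this is exactly the all-collide scenario from the derivation above:

```cpp
#include <cstddef>
#include <unordered_set>

// Deliberately pathological hash: every key gets the same hash value,
// so all elements end up in a single bucket chain.
struct ConstantHash {
    std::size_t operator()(int) const noexcept { return 42; }
};

int main() {
    std::unordered_multiset<int, ConstantHash> values;
    for (int i = 0; i < 10000; ++i)
        values.insert(i);  // every element collides; lookups and any
                           // equal-value grouping now scan one long chain
    return 0;
}
```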


    [1]: To all the possible haters, sorry for quoting CPlusPlus instead of CPPReference (for everyone else - CPlusPlus is well-known for being wrong), but I couldn't find this information there (so, of course, it could be wrong, but I'm hoping it isn't, and it does make sense in this case).

