C ++ STL unordered_map如何解决冲突? [英] How does C++ STL unordered_map resolve collisions?

查看:2249
本文介绍了C ++ STL unordered_map如何解决冲突?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

C ++ STL unordered_map如何解决冲突?

How does C++ STL unordered_map resolve collisions?

查看 http://www.cplusplus.com/reference/unordered_map/unordered_map/ ,它说唯一键
容器中没有两个元素可以有等效的键。

Looking at the http://www.cplusplus.com/reference/unordered_map/unordered_map/, it says "Unique keys No two elements in the container can have equivalent keys."

这意味着容器确实正在解决冲突。但是,该页不告诉我它是怎么做的。我知道一些方法来解决冲突,如使用链接列表和/或探测。我想知道的是c ++ STL unordered_map如何解析它。

That should mean that the container is indeed resolving collisions. However, that page does not tell me how it is doing it. I know some ways to resolve collisions like using linked lists and/or probing. What I want to know is how the c++ STL unordered_map is resolving it.

推荐答案

人们似乎意识到。

具体来说,标准要求(§23.2.5/ 9):

Specifically, the standard requires (§23.2.5/9):


无序关联容器的元素被组织成桶。具有相同散列码的键出现在同一个存储桶中。

The elements of an unordered associative container are organized into buckets. Keys with the same hash code appear in the same bucket.

接口包含 bucket_count 在常量时间运行。 (表103)。它还包括一个<_ c $ c> bucket_size ,它必须在时间上按照桶的大小线性运行。

The interface includes a bucket_count that runs in constant time. (table 103). It also includes a bucket_size that has to run in time linear on the size of the bucket.

一个使用碰撞链接的实现。当你使用碰撞链接,满足所有的要求是在容易和微不足道之间。 bucket_count()是数组中元素的数量。 bucket_size()是碰撞链中的元素数。

That's basically describing an implementation that uses collision chaining. When you do use collision chaining, meeting all the requirements is somewhere between easy and trivial. bucket_count() is the number of elements in your array. bucket_size() is the number of elements in the collision chain. Getting them in constant and linear time respectively is simple and straightforward.

相反,如果你使用类似线性探测或双重哈希的方法,这些需求变得几乎不可能满足。具体来说,散列为特定值的所有项目都需要放在同一个桶中,您需要能够在固定时间内计算这些值。

By contrast, if you use something like linear probing or double hashing, those requirements become all but impossible to meet. Specifically, all the items that hashed to a specific value need to land in the same bucket, and you need to be able to count those buckets in constant time.

但是,如果你使用线性探测或双重哈希,找到所有散列为相同值的项目意味着你需要哈希的值,然后通过链的非空项目,在表中找到有多少散列到相同的值。这对于散列为相同值的项目数不是线性的 - 对于散列为相同的冲突值的项目数是线性的。

But, if you use something like linear probing or double hashing, finding all the items that hashed to the same value means you need to hash the value, then walk through the "chain" of non-empty items in your table to find how many of those hashed to the same value. That's not linear on the number of items that hashed to the same value though--it's linear on the number of items that hashed to the same or a colliding value.

仍然可能满足要求,如果你想足够严重:哈希表中的每个插槽不仅包括一个值,而且还包括一个计数。插入项目时,您计算散列,然后增加该项目编号的计数,即使该插槽已被占用。然后,您通过被链接的占用项目找到一个插槽,您可以插入此项目。

It's still possible to meet the requirement if you want to badly enough: have each slot in the hash table include not only a value, but also a count. When you insert an item, you compute the hash, then increment the count for that item number -- even if that slot is already occupied. Then you walk through the "chain" of occupied items to find a slot where you can insert this item. When you need to find a bucket count, you hash a key to get a slot, then return the count for that slot.

假设您在每个插槽中使用该计数,则可以使用该插槽计数。跟踪bucket_size(),你可以很容易地保持一个bucket_count:保持一个表的桶计数器,并且每当你将槽的计数器从0增加到1,你增加表的桶计数器。

Assuming you use that count in each slot to track the bucket_size(), you can keep a bucket_count fairly easily as well: maintain a bucket counter for the table, and every time you increment a slot's counter from 0 to 1, you increment the table's bucket counter.

现在的问题:使线性探测或双重哈希实用的主要原因是,避免项目之间的指针(根据碰撞链接的需要)允许您创建一个在给定的内存量,从而保持负载因子更低。然而,如果你为每个插槽添加一个计数器,你只是抛弃了存储密度的优势,给定数量的插槽将需要大约相同的存储,如果你使用碰撞链。

Now the problem: the main thing that makes linear probing or double hashing practical is that avoiding pointers between items (as needed for collision chaining) allows you to create a table of substantially more slots in a given amount of memory, thus keeping the load factor lower. If, however, you add a counter to each slot, you've just thrown away that advantage in storage density, a given number of slots will require about the same storage as if you used collision chaining instead.

简而言之,虽然这种设计无疑是可能的,但它几乎是所有可能世界中最糟糕的。即使是(很可能)它的发明者,我不能感到特别自豪 - 我确信我永远不会实现它,也不能想象任何人也想这样做。从实践的角度来看,它使我成为一个无用的垃圾。更糟糕的是,仍然有更多的要求,仍然更难以满足 - 在我们遇见他们的时候,我们的设计(这是已经不寻常的线性探测或双重哈希)将几乎是不可识别的(除了,也许,一个真正奇怪的碰撞链实现)。

In short, while this design is undoubtedly possible, it's pretty much the worst of all possible worlds. Even as (quite possibly) its inventor, I can't feel particularly proud--I'm pretty sure I'll never implement it, and can't quite imagine anybody else wanting to do so either. From a practical viewpoint, it strikes me as pretty much a useless piece of garbage. Worse still, there are still more requirements that would be still more difficult to meet--and by the time we met them, our design (which is already unusual for linear probing or double hashing) would be nearly unrecognizable (except, perhaps, as a really strange implementation of collision chaining).

总结:我想我们可以得出结论,所有实际的实现 std :: unordered_set (或 unordered_map )几乎肯定使用碰撞链。虽然使用线性探测或双重哈希可能(只是几乎)可能满足要求,但是这样的实现似乎失去了很多,并且几乎没有回报。

Summary: I think we can conclude that all practical implementations of std::unordered_set (or unordered_map) almost certainly use collision chaining. While it might be (just barely) possible to meet the requirements using linear probing or double hashing, such an implementation seems to lose a great deal and gain nearly nothing in return.

这篇关于C ++ STL unordered_map如何解决冲突?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆