如何实现std :: unordered_map [英] How std::unordered_map is implemented

查看:191
本文介绍了如何实现std :: unordered_map的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

c ++ unordered_map冲突处理,调整大小和重新散列



这是我打开的上一个问题,我看到我对unordered_map的实现有很多困惑。我相信很多其他人都与我混淆。基于我没有阅读标准的知识:


每个unordered_map实现存储一个链接到外部
节点的链表数组桶...不,这是不是最多的
有效的方式来实现哈希映射为最常见的用途。
不幸的是,在规范的
unordered_map一个小的监督,但需要这种行为。所需的行为是
,当插入或删除
时,迭代器必须保持有效
其他元素


希望有人可能解释实现和它如何符合c ++标准定义(在性能要求方面),如果它不是最有效的方式实现散列图数据结构如何改进?

解决方案

标准有效地要求 std :: unordered_set std :: unordered_map 使用打开散列的实现,这意味着一个存储桶数组,每个数组都包含逻辑(通常是实际的)列表的头。这个要求是微妙的:这是默认最大负载因子为1.0的结果,并且保证该表不会被重新散列,除非超越该负载因子:这将是不切实际的没有链接,因为与闭合哈希的冲突变得压倒了,因为负载因子接近1:


23.2.5 / 15: insert 如果(N + n)成员不影响迭代器的有效性, z * B ,其中 N 是插入操作之前容器中的元素数量, n 是插入元素的数量, B 是容器的桶数, z 是容器的最大值负载因子。



位于23.5.4.2/1构造函数的效果中: max_load_factor c $ c>返回 1.0


通过任何空桶,GCC的实现使用迭代器将桶装入一个包含所有值的单链表中:迭代器指向紧邻桶的元素之前的元素,因此如果擦除桶的最后一个值,那么下一个指针可以被重新连接。)



关于您引用的文字:


否,以最有效的方式来实现最常见用途的哈希映射。不幸的是,在unordered_map的规范中有一个小的监督,但是需要这个行为。所需的行为是插入或删除其他元素时,迭代器必须保持有效


没有监督完成是非常仔细和完全与充分意识。这是真的,其他妥协可能已经打击,但开放哈希/链接方法是一个合理的折衷,一般使用,合理优雅地处理与平凡的哈希函数的冲突,不是太浪费与小或大的键/值类型,并处理任意多个 insert / 擦除对,而不会逐渐降低性能许多封闭哈希实现的方式。



作为意识的证据,来自 Matthew Austern的建议


我不知道在通用框架中任何令人满意的开放式寻址。开放寻址提出了一些问题:



•必须区分空缺职位和被占用职位。



•有必要使用默认构造函数将哈希表限制为类型,并提前构造每个数组元素,否则维护一个数组,其中的一些元素是对象,而其他元素是原始内存。 p>

•开放式寻址使得碰撞管理变得困难:如果你插入一个散列码映射到一个已经占用的位置的元素,你需要一个策略告诉你在哪里下一步。这是一个已解决的问题,但最着名的解决方案很复杂。



•当允许擦除元素时,冲突管理特别复杂。 (见Knuth的讨论。)标准库的容器类应该允许擦除。



•开放寻址的冲突管理方案倾向于假设一个固定大小的数组可以容纳N个元素。标准库的容器类应该能够在插入新元素时根据需要增长,直到可用内存的限制。



解决这些问题可能是一个有趣的研究项目,但是在没有C ++语言的实现经验的情况下,标准化开放寻址容器类是不合适的。


特别对于仅插入表,数据小到足以直接存储在桶中,对未使用的桶有一个方便的标记值,好的哈希函数,一个封闭的哈希方法可能大约快一个数量级,使用显着减少的内存,但这不是一般目的。



哈希的完全比较和阐述表设计选项及其影响是SO的主题因为它太宽泛,无法在这里妥善处理。


c++ unordered_map collision handling , resize and rehash

This is a previous question opened by me and I have seen that I am having a lot of confusion about how unordered_map is implemented. I am sure many other people shares that confusion with me. Based on the information I have know without reading the standard:

Every unordered_map implementation stores a linked list to external nodes in the array of buckets... No, that is not at all the most efficient way to implement a hash map for most common uses. Unfortunately, a small "oversight" in the specification of unordered_map all but requires this behavior. The required behavior is that iterators to elements must stay valid when inserting or deleting other elements

I was hoping that someone might explain the implementation and how it corresponds to the c++ standard definition (in terms of performance requirements ) and if it is really not the most efficient way to implement an hash map data structure how it can be improved ?

解决方案

The Standard effectively mandates std::unordered_set and std::unordered_map implementations that use open hashing, which means an array of buckets, each of which holds the head of a logical (and typically actual) list. That requirement is subtle: it's a consequence of the default max load factor being 1.0 and the guarantee that the table will not be rehashed unless grown beyond that load factor: that would be impractical without chaining, as the collisions with closed hashing become overwhelming as the load factor approaches 1:

23.2.5/15: The insert and emplace members shall not affect the validity of iterators if (N+n) < z * B, where N is the number of elements in the container prior to the insert operation, n is the number of elements inserted, B is the container’s bucket count, and z is the container’s maximum load factor.

amongst the Effects of the constructor at 23.5.4.2/1: max_load_factor() returns 1.0.

(To allow optimal iteration without passing over any empty buckets, GCC's implementation fills the buckets with iterators into a single singly-linked list holding all the values: the iterators point to the element immediately before that bucket's elements, so the next pointer there can be rewired if erasing the bucket's last value.)

Regarding the text you quote:

No, that is not at all the most efficient way to implement a hash map for most common uses. Unfortunately, a small "oversight" in the specification of unordered_map all but requires this behavior. The required behavior is that iterators to elements must stay valid when inserting or deleting other elements

There is no "oversight"... what's done was very deliberate and done with full awareness. It's true that other compromises could have been struck, but the open hashing / chaining approach is a reasonable compromise for general use, that copes reasonably elegantly with collisions from mediocre hash functions, isn't too wasteful with small or large key/value types, and handles arbitrarily-many insert/erase pairs without gradually degrading performance the way many closed hashing implementations do.

As evidence of the awareness, from Matthew Austern's proposal here:

I'm not aware of any satisfactory implementation of open addressing in a generic framework. Open addressing presents a number of problems:

• It's necessary to distinguish between a vacant position and an occupied one.

• It's necessary either to restrict the hash table to types with a default constructor, and to construct every array element ahead of time, or else to maintain an array some of whose elements are objects and others of which are raw memory.

• Open addressing makes collision management difficult: if you're inserting an element whose hash code maps to an already-occupied location, you need a policy that tells you where to try next. This is a solved problem, but the best known solutions are complicated.

• Collision management is especially complicated when erasing elements is allowed. (See Knuth for a discussion.) A container class for the standard library ought to allow erasure.

• Collision management schemes for open addressing tend to assume a fixed size array that can hold up to N elements. A container class for the standard library ought to be able to grow as necessary when new elements are inserted, up to the limit of available memory.

Solving these problems could be an interesting research project, but, in the absence of implementation experience in the context of C++, it would be inappropriate to standardize an open-addressing container class.

Specifically for insert-only tables with data small enough to store directly in the buckets, a convenient sentinel value for unused buckets, and a good hash function, a closed hashing approach may be roughly an order of magnitude faster and use dramatically less memory, but that's not general purpose.

A full comparison and elaboration of hash table design options and their implications is off topic for S.O. as it's way too broad to address properly here.

这篇关于如何实现std :: unordered_map的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆