不确定 unordered_map 是如何工作的 [英] Not sure how unordered_map works
问题描述
我有点困惑 unordered_map 是如何工作的,桶是什么以及它们是如何管理的.
I've a little bit confusion how unordered_map works and what buckets are and how thet are managed.
来自这篇博文,unordered_map 是向量的向量.
From this blog post, unordered_map is vector of vectors.
我的问题是:
- 假设桶是内部"向量是否正确?
- 由于每个桶(向量)可以包含多个元素,由哈希表(外部"向量)上的哈希冲突给出,并且由于我们必须扫描这个内部向量(在线性时间内),是否正确假设我们必须在键类型上定义 equal 方法(依赖于哈希运算符)才能找到桶内的键?
- 默认情况下外部向量(哈希表)的大小是多少?
- 默认情况下内部矢量大小是多少?
- 如果一个桶中的元素数量变得太多会发生什么?换句话说,当 rehash 发生时?
对于这些问题,我很抱歉,但我没有找到任何关于此结构如何工作的详细说明(例如在 cppreference.com 上).
Sorry for these questions, but I didn't find any detailed explanation how this structure works (on cppreference.com for example).
推荐答案
std::unordered_map 是标准的 C++ 哈希表.它曾经在 STL 中被称为 hash_map,但是当许多 STL 的接口在 1998 年被合并到 C++ 中,到 2011 年,很多库都有自己的 hash_map,C++ 不得不选择另一个名称(我认为无序"是一个不错的选择;假设哈希表中的顺序是错误的常见来源).
std::unordered_map is the standard C++ hash table. It used to be called hash_map in STL, but missed the boat when many of STL's interfaces were merged into C++ in 1998, and by 2011, so many libraries had their own hash_map, C++ had to pick another name (I think "unordered" was a great choice; assuming order in a hash table is a common source of bugs).
假设桶是内部"向量是否正确?
is it correct to assume that the buckets are the "internal" vectors?
不,它既不正确(与迭代器失效要求不兼容)又危险(在这种假设下,您最终可能会减去指向同一存储桶中元素的指针).
no, it is both incorrect (incompatible with iterator invalidation requirements) and dangerous (under that assumption you may end up subtracting pointers to elements in the same bucket).
在现实生活中,桶是链表;例如
In real life, the buckets are linked lists; e.g.
- LLVM libc++
unordered_map
是一个 unique_ptr 到 __hash_node 的链表数组a>s - GNU libstdc++
unordered_map
是一个 指向_Hash_nodes
- LLVM libc++
unordered_map
is a unique_ptr to an array of linked lists of __hash_nodes - GNU libstdc++
unordered_map
is a pointer to an array of linked lists of _Hash_nodes
假设我们必须在键类型上定义 equal 方法(依赖于散列运算符)以便在存储桶中找到键是否正确?
is it correct to assume that we have to define the equal method on the key type (in addiction to the hash operator) in order to find the key inside the bucket?
是的,在存储桶中定位键正是 std::unordered_map
的第四个模板参数的用途(不需要从字面上调用键类型上的相等方法",当然)
Yes, locating the key in a bucket is exactly what the 4th template parameter of std::unordered_map
is for (does not need to call the "equal method on the key type" literally, of course)
默认的外部向量(哈希表)大小是多少?
what is the external vector (hash table) size by default?
没有外部向量".默认构造的 std::unordered_map
的桶数是 实现定义,您可以使用 bucket_count.
There is no "external vector". The number of buckets for a default-constructed std::unordered_map
is implementation-defined, you can query it with bucket_count.
默认的内部向量大小是多少?
what is the internal vector size by default?
没有内部向量".任何给定桶的大小等于当前放置在桶中的元素数.您可以使用 bucket_size
There is no "internal vector". The size of any given bucket equals the number of elements currently placed in the bucket. You can query it with bucket_size
如果一个桶中的元素数量变得太多会发生什么?换句话说,当 rehash 发生时?
what happens if the number of elements in one bucket becomes too big?bor in other words, when the rehash happens?
如果 one 存储桶中的元素数量太大,则不会发生任何事情.但是如果每个桶的平均元素数(又名 load_factor)超过
Nothing happens if the number of elements in one bucket becomes too big. But if average number of elements per bucket (aka load_factor) exceeds max_load_factor, rehash happens (e.g. on insert)
这篇关于不确定 unordered_map 是如何工作的的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!