引擎盖下的地图 [英] Go's maps under the hood
问题描述
阅读Dave Cheney的运行时程序包后,我发现了底层的地图结构如下:
// A header for a Go map.
type hmap struct {
// Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
// Make sure this stays in sync with the compiler's definition.
count int // # live cells == size of map. Must be first (used by len() builtin)
flags uint8
B uint8 // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
hash0 uint32 // hash seed
buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
nevacuate uintptr // progress counter for evacuation (buckets less than this have been evacuated)
extra *mapextra // optional fields
}
buckets
-是存储区数组,其中索引是键的哈希值的低序位,其中存储区为:
// A bucket for a Go map.
type bmap struct {
// tophash generally contains the top byte of the hash value
// for each key in this bucket. If tophash[0] < minTopHash,
// tophash[0] is a bucket evacuation state instead.
tophash [bucketCnt]uint8
// Followed by bucketCnt keys and then bucketCnt elems.
// NOTE: packing all the keys together and then all the elems together makes the
// code a bit more complicated than alternating key/elem/key/elem/... but it allows
// us to eliminate padding which would be needed for, e.g., map[int64]int8.
// Followed by an overflow pointer.
}
..好吧,这只是uint8
的数组,其中每个项目都是键的哈希值的第一个字节.键值对存储为key/key value/value
(每个存储桶八对).但是到底在哪里?考虑到映射可能包含(几乎)任何类型的值.应该在内存中放置某种指针,以存储值数组,但是bmap
没有此类信息.
并且由于键的散列位于存储桶中的有序数组中,为什么每次我在map上循环时它的顺序都不同?
- 为什么它们无序?
因为这赋予了运行时更大的自由来实现地图类型.尽管我们知道Go的(当前)实现是一个哈希图,但是语言规范允许使用任何地图实现(例如哈希图,树图等).也不必记住顺序,这使运行时可以更有效地完成工作,并且使用更少的时间记忆.
Adrian的评论很好地总结了很少需要订单,而始终保持订单是浪费.当您确实需要订购时,可以使用提供订购的数据结构.有关示例,请参见映射订单范围循环.
并且由于键的散列位于存储桶中的有序数组中,为什么每次我遍历地图时它的顺序都不同?
Go的作者有意使地图的迭代顺序随机化(因此我们凡人不会依赖固定的顺序).有关更多信息,请参见在Golang中,为什么在地图上进行迭代
另请参阅相关内容:为什么不能去按插入顺序迭代地图?
- 实际值存储在内存中的什么地方?
位置"由hmap.buckets
指定.这是一个指针值,它指向内存中的一个数组,该数组保存存储桶.
buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
因此,hmap.buckets
指向包含存储桶的连续内存段.存储桶由bmap
建模",但这不是其实际的内存布局.存储桶的开头是一个数组,其中存储有存储在存储桶中的键的顶部哈希字节(tophash [bucketCnt]uint8
),该数组紧随其后是存储桶的bucketCnt
键,然后是存储区的bucketCnt
值桶.最后是一个溢出指针.
像这样的 conceptual 类型的存储桶,它可以可视化"键和值在内存中的位置:
type conceptualBucket struct {
tophash [bucketCnt]uint8
keys [bucketCnt]keyType
values [bucketCnt]valueType
overflowPtr uintptr
}
注意:bucketCnt
是一个编译时间常数,为8
,它是存储桶可以容纳的最大键/元素对数.
当然,这种图片"是不准确的,但是它给出了存储键和值的位置/方式的想法.
After reading Dave Cheney's blogpost about Go's maps there is still few things unclear to me.
TLDR:
- Why are they unordered?
- Where actual values are stored in memory?
After digging in runtime package I found out that underlying map structure is following:
// A header for a Go map.
type hmap struct {
// Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
// Make sure this stays in sync with the compiler's definition.
count int // # live cells == size of map. Must be first (used by len() builtin)
flags uint8
B uint8 // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
hash0 uint32 // hash seed
buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
nevacuate uintptr // progress counter for evacuation (buckets less than this have been evacuated)
extra *mapextra // optional fields
}
buckets
- is array of buckets where indexes is low-order bits of key's hash, where the bucket is:
// A bucket for a Go map.
type bmap struct {
// tophash generally contains the top byte of the hash value
// for each key in this bucket. If tophash[0] < minTopHash,
// tophash[0] is a bucket evacuation state instead.
tophash [bucketCnt]uint8
// Followed by bucketCnt keys and then bucketCnt elems.
// NOTE: packing all the keys together and then all the elems together makes the
// code a bit more complicated than alternating key/elem/key/elem/... but it allows
// us to eliminate padding which would be needed for, e.g., map[int64]int8.
// Followed by an overflow pointer.
}
..well it's just array of uint8
where every item is first byte of key's hash. And key-value pairs are stores as key/key value/value
(eight pairs per bucket). But where exactly? Considering that map may contain value of (almost) any type. There should be some kind of pointer to place in memory where array of values stored, but bmap
doesn't have such info.
And since key's hashes are located in ordered array inside bucket, why it's order different every time I looping over map?
- Why are they unordered?
Because this gives greater freedom to the runtime to implement the map type. Although we know Go's (current) implementation is a hashmap, the language specification allows to use any map implementation like hash map, tree map etc. Also not having to remember the order, this allows the runtime to do its job more effectively and using less memory.
Adrian's comment nicely summarizes that order is rarely needed, and it would be a waste to always maintain order. When you do need order, you may use a data structure that provides the ordering. For examples, see Map in order range loop.
And since key's hashes are located in ordered array inside bucket, why it's order different every time I looping over map?
The Go authors intentionally made map's iteration order randomized (so we mortals don't get dependent on a fixed order). For more, see In Golang, why are iterations over maps random?
Also see related: Why can't Go iterate maps in insertion order?
- Where actual values are stored in memory?
The "where" is specified by hmap.buckets
. This is a pointer value, it points to an array in memory, an array holding the buckets.
buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
So hmap.buckets
points to a contiguous memory segment holding buckets. A bucket is "modeled" by bmap
, but this is not its actual memory layout. A bucket starts with an array holding top hash bytes of keys being in the bucket (tophash [bucketCnt]uint8
), and this array is followed by bucketCnt
keys of the bucket, which is then followed by bucketCnt
values of the bucket. Lastly there is an overflow pointer.
Think of the bucket like this conceptual type, which "visualizes" where keys and values are located in memory:
type conceptualBucket struct {
tophash [bucketCnt]uint8
keys [bucketCnt]keyType
values [bucketCnt]valueType
overflowPtr uintptr
}
Note: bucketCnt
is a compile time constant being 8
, it is the maximum number of key/elem pairs a bucket can hold.
Of course this "picture" is inaccurate, but it gives the idea where / how keys and values are stored.
这篇关于引擎盖下的地图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!