引擎盖下的地图 [英] Go's maps under the hood

查看:95
本文介绍了引擎盖下的地图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

阅读Dave Cheney的运行时程序包后,我发现了底层的地图结构如下:

 // A header for a Go map.
type hmap struct {
    // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
    // Make sure this stays in sync with the compiler's definition.
    count     int // # live cells == size of map.  Must be first (used by len() builtin)
    flags     uint8
    B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
    hash0     uint32 // hash seed

    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

    extra *mapextra // optional fields
}
 

buckets-是存储区数组,其中索引是键的哈希值的低序位,其中存储区为:

 // A bucket for a Go map.
type bmap struct {
    // tophash generally contains the top byte of the hash value
    // for each key in this bucket. If tophash[0] < minTopHash,
    // tophash[0] is a bucket evacuation state instead.
    tophash [bucketCnt]uint8
    // Followed by bucketCnt keys and then bucketCnt elems.
    // NOTE: packing all the keys together and then all the elems together makes the
    // code a bit more complicated than alternating key/elem/key/elem/... but it allows
    // us to eliminate padding which would be needed for, e.g., map[int64]int8.
    // Followed by an overflow pointer.
}
 

..好吧,这只是uint8的数组,其中每个项目都是键的哈希值的第一个字节.键值对存储为key/key value/value(每个存储桶八对).但是到底在哪里?考虑到映射可能包含(几乎)任何类型的值.应该在内存中放置某种指针,以存储值数组,但是bmap没有此类信息.

并且由于键的散列位于存储桶中的有序数组中,为什么每次我在map上循环时它的顺序都不同?

解决方案

  • 为什么它们无序?

因为这赋予了运行时更大的自由来实现地图类型.尽管我们知道Go的(当前)实现是一个哈希图,但是语言规范允许使用任何地图实现(例如哈希图,树图等).也不必记住顺序,这使运行时可以更有效地完成工作,并且使用更少的时间记忆.

Adrian的评论很好地总结了很少需要订单,而始终保持订单是浪费.当您确实需要订购时,可以使用提供订购的数据结构.有关示例,请参见映射订单范围循环.

并且由于键的散列位于存储桶中的有序数组中,为什么每次我遍历地图时它的顺序都不同?

Go的作者有意使地图的迭代顺序随机化(因此我们凡人不会依赖固定的顺序).有关更多信息,请参见在Golang中,为什么在地图上进行迭代

另请参阅相关内容:为什么不能去按插入顺序迭代地图?

  • 实际值存储在内存中的什么地方?

位置"由hmap.buckets指定.这是一个指针值,它指向内存中的一个数组,该数组保存存储桶.

buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.

因此,hmap.buckets指向包含存储桶的连续内存段.存储桶由bmap建模",但这不是其实际的内存布局.存储桶的开头是一个数组,其中存储有存储在存储桶中的键的顶部哈希字节(tophash [bucketCnt]uint8),该数组紧随其后是存储桶的bucketCnt键,然后是存储区的bucketCnt值桶.最后是一个溢出指针.

像这样的 conceptual 类型的存储桶,它可以可视化"键和值在内存中的位置:

type conceptualBucket struct {
    tophash     [bucketCnt]uint8
    keys        [bucketCnt]keyType
    values      [bucketCnt]valueType
    overflowPtr uintptr
}

注意:bucketCnt是一个编译时间常数,为8,它是存储桶可以容纳的最大键/元素对数.

当然,这种图片"是不准确的,但是它给出了存储键和值的位置/方式的想法.

After reading Dave Cheney's blogpost about Go's maps there is still few things unclear to me.

TLDR:

  • Why are they unordered?
  • Where actual values are stored in memory?

After digging in runtime package I found out that underlying map structure is following:

// A header for a Go map.
type hmap struct {
    // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
    // Make sure this stays in sync with the compiler's definition.
    count     int // # live cells == size of map.  Must be first (used by len() builtin)
    flags     uint8
    B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
    hash0     uint32 // hash seed

    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

    extra *mapextra // optional fields
}

buckets - is array of buckets where indexes is low-order bits of key's hash, where the bucket is:

// A bucket for a Go map.
type bmap struct {
    // tophash generally contains the top byte of the hash value
    // for each key in this bucket. If tophash[0] < minTopHash,
    // tophash[0] is a bucket evacuation state instead.
    tophash [bucketCnt]uint8
    // Followed by bucketCnt keys and then bucketCnt elems.
    // NOTE: packing all the keys together and then all the elems together makes the
    // code a bit more complicated than alternating key/elem/key/elem/... but it allows
    // us to eliminate padding which would be needed for, e.g., map[int64]int8.
    // Followed by an overflow pointer.
}

..well it's just array of uint8 where every item is first byte of key's hash. And key-value pairs are stores as key/key value/value (eight pairs per bucket). But where exactly? Considering that map may contain value of (almost) any type. There should be some kind of pointer to place in memory where array of values stored, but bmap doesn't have such info.

And since key's hashes are located in ordered array inside bucket, why it's order different every time I looping over map?

解决方案

  • Why are they unordered?

Because this gives greater freedom to the runtime to implement the map type. Although we know Go's (current) implementation is a hashmap, the language specification allows to use any map implementation like hash map, tree map etc. Also not having to remember the order, this allows the runtime to do its job more effectively and using less memory.

Adrian's comment nicely summarizes that order is rarely needed, and it would be a waste to always maintain order. When you do need order, you may use a data structure that provides the ordering. For examples, see Map in order range loop.

And since key's hashes are located in ordered array inside bucket, why it's order different every time I looping over map?

The Go authors intentionally made map's iteration order randomized (so we mortals don't get dependent on a fixed order). For more, see In Golang, why are iterations over maps random?

Also see related: Why can't Go iterate maps in insertion order?

  • Where actual values are stored in memory?

The "where" is specified by hmap.buckets. This is a pointer value, it points to an array in memory, an array holding the buckets.

buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.

So hmap.buckets points to a contiguous memory segment holding buckets. A bucket is "modeled" by bmap, but this is not its actual memory layout. A bucket starts with an array holding top hash bytes of keys being in the bucket (tophash [bucketCnt]uint8), and this array is followed by bucketCnt keys of the bucket, which is then followed by bucketCnt values of the bucket. Lastly there is an overflow pointer.

Think of the bucket like this conceptual type, which "visualizes" where keys and values are located in memory:

type conceptualBucket struct {
    tophash     [bucketCnt]uint8
    keys        [bucketCnt]keyType
    values      [bucketCnt]valueType
    overflowPtr uintptr
}

Note: bucketCnt is a compile time constant being 8, it is the maximum number of key/elem pairs a bucket can hold.

Of course this "picture" is inaccurate, but it gives the idea where / how keys and values are stored.

这篇关于引擎盖下的地图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆