Garbage collection and correct usage of pointers in Go

Question

I come from a Python/Ruby/JavaScript background. I understand how pointers work, however, I'm not completely sure how to leverage them in the following situation.

Let's pretend we have a fictitious web API that searches some image database and returns JSON describing what's displayed in each image that was found:

[
    {
        "url": "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg",
        "description": "Ocean islands",
        "tags": [
            {"name":"ocean", "rank":1},
            {"name":"water", "rank":2},
            {"name":"blue", "rank":3},
            {"name":"forest", "rank":4}
        ]
    },

    ...

    {
        "url": "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg",
        "description": "Bridge over river",
        "tags": [
            {"name":"bridge", "rank":1},
            {"name":"river", "rank":2},
            {"name":"water", "rank":3},
            {"name":"forest", "rank":4}
        ]
    }
]

My goal is to create a data structure in Go that will map each tag to a list of image URLs that would look like this:

{
    "ocean": [
        "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg"
    ],
    "water": [
        "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg",
        "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
    ],
    "blue": [
        "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg"
    ],
    "forest":[
        "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg", 
        "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
    ],
    "bridge": [
        "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
    ],
    "river":[
        "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
    ]
}

As you can see, each image URL can belong to multiple tags at the same time. If I have thousands of images and even more tags, this data structure can grow very large if image URL strings are copied by value for each tag. This is where I want to leverage pointers.

I can represent the JSON API response with two structs in Go; func searchImages() mimics the fake API:

package main

import "fmt"


type Image struct {
    URL string
    Description string
    Tags []*Tag
}

type Tag struct {
    Name string
    Rank int
}

// this function mimics json.NewDecoder(resp.Body).Decode(&parsedJSON)
func searchImages() []*Image {
    parsedJSON := []*Image{
        &Image {
            URL: "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg",
            Description: "Ocean islands",
            Tags: []*Tag{
                &Tag{"ocean", 1},
                &Tag{"water", 2},
                &Tag{"blue", 3},
                &Tag{"forest", 4},
            }, 
        },
        &Image {
            URL: "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg",
            Description: "Bridge over river",
            Tags: []*Tag{
                &Tag{"bridge", 1},
                &Tag{"river", 2},
                &Tag{"water", 3},
                &Tag{"forest", 4},
            }, 
        },
    }
    return parsedJSON
}

Now the less optimal mapping function that results in a very large in-memory data structure can look like this:

func main() {
    result := searchImages()

    tagToUrlMap := make(map[string][]string)

    for _, image := range result {
        for _, tag := range image.Tags {
            // fmt.Println(image.URL, tag.Name)
            tagToUrlMap[tag.Name] = append(tagToUrlMap[tag.Name], image.URL)
        }
    }

    fmt.Println(tagToUrlMap)
}

I can modify it to use pointers to the Image struct URL field instead of copying it by value:

    // Version 1

    tagToUrlMap := make(map[string][]*string)

    for _, image := range result {
        for _, tag := range image.Tags {
            // fmt.Println(image.URL, tag.Name)
            tagToUrlMap[tag.Name] = append(tagToUrlMap[tag.Name], &image.URL)
        }
    }

It works, and my first question is: what happens to the result data structure after I build the mapping in this way? Will the Image URL string fields be left in memory somehow while the rest of the result is garbage collected? Or will the result data structure stay in memory until the end of the program because something points to its members?

Another way to do this would be to copy the URL to an intermediate variable and use a pointer to it instead:

    // Version 2

    tagToUrlMap := make(map[string][]*string)

    for _, image := range result {
        imageUrl := image.URL // a new variable in every iteration
        for _, tag := range image.Tags {
            // fmt.Println(image.URL, tag.Name)    
            tagToUrlMap[tag.Name] = append(tagToUrlMap[tag.Name], &imageUrl)
        }
    }

Is this better? Will the result data structure be garbage collected correctly?

Or perhaps I should use a pointer to string in the Image struct instead?

type Image struct {
    URL *string
    Description string
    Tags []*Tag
}

Is there a better way to do this? I would also appreciate any resources on Go that describe various uses of pointers in depth. Thanks!

https://play.golang.org/p/VcKWUYLIpH7

UPDATE: My main concern is optimal memory consumption and not generating unwanted garbage. My goal is to use as little memory as possible.

Solution

First some background. string values in Go are represented by a small struct-like data structure reflect.StringHeader:

type StringHeader struct {
        Data uintptr
        Len  int
}

So passing or copying a string value only passes or copies this small struct value, which is just 2 words regardless of the length of the string. On 64-bit architectures that is only 16 bytes, even if the string has a thousand characters.

So string values basically already act as pointers. Introducing another pointer such as *string just complicates usage, and you won't gain any noticeable memory. For the sake of memory optimization, forget about using *string.

As for your first question: what happens to the result data structure after I build the mapping in this way? Will the Image URL string fields be left in memory somehow while the rest of the result is garbage collected? Or will the result data structure stay in memory until the end of the program because something points to its members?

If you have a pointer value pointing to a field of a struct value, the whole struct is kept in memory; it can't be garbage collected. And since the kept struct still references everything its other fields point to (its Description string and, in Version 1, the Tags slice with the Tag values it points to), all of that stays reachable too. Note that although it would be possible in principle to release the memory reserved for the struct's other fields, the current Go runtime and garbage collector do not do so. So to achieve optimal memory usage, forget about storing addresses of struct fields (unless you also need the complete struct values; and even then, storing field addresses and slice/array element addresses always requires care).

The reason is that the memory for a struct value is allocated as one contiguous segment, so keeping only a single referenced field alive would heavily fragment the available / free memory and make optimal memory management even harder and less efficient. Defragmenting such areas would also require copying the referenced field's memory area, which would require "live-changing" pointer values (changing memory addresses).
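To make the "contiguous segment" point concrete, here is a small sketch that reuses the question's Image and Tag types and checks that &img.URL is simply an address inside the memory block of *img, at the field's offset. The garbage collector sees such an interior pointer as a reference into that block, so the whole block stays alive. The helper fieldInside is a hypothetical name for this illustration, and the address arithmetic is the same wherever the struct happens to be allocated.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Tag struct {
	Name string
	Rank int
}

type Image struct {
	URL         string
	Description string
	Tags        []*Tag
}

// fieldInside reports whether p points into the memory occupied by *img.
// An interior pointer like &img.URL keeps the entire Image allocation
// (and everything reachable from it, such as Tags) alive.
func fieldInside(img *Image, p *string) bool {
	start := uintptr(unsafe.Pointer(img))
	end := start + unsafe.Sizeof(*img)
	addr := uintptr(unsafe.Pointer(p))
	return addr >= start && addr < end
}

func main() {
	img := &Image{
		URL:  "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg",
		Tags: []*Tag{{"ocean", 1}},
	}

	fmt.Println(fieldInside(img, &img.URL), fieldInside(img, &img.Description)) // true true

	// Offsets of each field within the single Image allocation.
	fmt.Println(unsafe.Offsetof(img.URL), unsafe.Offsetof(img.Description), unsafe.Offsetof(img.Tags))
}
```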

So while using pointers to string values may save you a tiny amount of memory, the added complexity and the additional indirection make it not worth it.
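A back-of-the-envelope sketch of how small that saving is. It assumes every *string for a given URL points to one shared string variable (as in the question's Version 2), counts only headers and pointers (the URL bytes are shared in both layouts), and ignores allocator rounding and per-allocation overhead. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Sizes on the current platform (16 and 8 bytes on 64-bit architectures).
var (
	stringHeaderSize = unsafe.Sizeof("")              // a string value: pointer + length
	pointerSize      = unsafe.Sizeof((*string)(nil)) // a *string value
)

// bytesAsValues is the cost of referencing one URL from n tag slices
// when the slices hold string values ([]string).
func bytesAsValues(n int) uintptr {
	return uintptr(n) * stringHeaderSize
}

// bytesAsPointers is the cost when the slices hold *string values that
// all point to one shared string variable ([]*string).
func bytesAsPointers(n int) uintptr {
	return uintptr(n)*pointerSize + stringHeaderSize
}

func main() {
	for _, n := range []int{1, 2, 10, 100} {
		fmt.Printf("n=%3d  []string: %5d bytes  []*string: %5d bytes\n",
			n, bytesAsValues(n), bytesAsPointers(n))
	}
}
```

On a 64-bit platform each *string saves 8 bytes per reference, but the shared variable costs 16 bytes, so pointers only break even at 2 references and pull ahead slowly after that, while every access pays for an extra indirection.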

So what to do then?

"Optimal" solution

So the cleanest way is to keep using string values.

And there is one more optimization we didn't talk about earlier.

You get your results by unmarshaling a JSON API response. This means that if the same URL or tag value is included multiple times in the JSON response, different string values will be created for them.

What does this mean? If the same URL appears twice in the JSON response, after unmarshaling you will have 2 distinct string values containing 2 different pointers to 2 different allocated byte sequences (whose content is otherwise identical). The encoding/json package does not do string interning.

Here's a little app that proves this:

var s []string
err := json.Unmarshal([]byte(`["abc", "abc", "abc"]`), &s)
if err != nil {
    panic(err)
}

for i := range s {
    hdr := (*reflect.StringHeader)(unsafe.Pointer(&s[i]))
    fmt.Println(hdr.Data)
}

Output of the above (try it on the Go Playground; the exact addresses will differ between runs and platforms):

273760312
273760315
273760320

We see 3 different pointers. They could have been the same, since string values are immutable.

The json package does not detect repeated string values because detection adds memory and computational overhead, which is generally unwanted. But in our case we are aiming for optimal memory usage, so an "initial", additional computation is well worth the big memory gain.

So let's do our own string interning. How to do that?

After unmarshaling the JSON result, while building the tagToUrlMap map, let's keep track of the string values we have come across, and if a string value has been seen before, just use that earlier value (its string descriptor).

Here's a very simple string interner implementation:

var cache = map[string]string{}

func interned(s string) string {
    if s2, ok := cache[s]; ok {
        return s2
    }
    // New string, store it
    cache[s] = s
    return s
}

Let's test this "interner" in the example code above:

var s []string
err := json.Unmarshal([]byte(`["abc", "abc", "abc"]`), &s)
if err != nil {
    panic(err)
}

for i := range s {
    hdr := (*reflect.StringHeader)(unsafe.Pointer(&s[i]))
    fmt.Println(hdr.Data, s[i])
}

for i := range s {
    s[i] = interned(s[i])
}

for i := range s {
    hdr := (*reflect.StringHeader)(unsafe.Pointer(&s[i]))
    fmt.Println(hdr.Data, s[i])
}

Output of the above (try it on the Go Playground):

273760312 abc
273760315 abc
273760320 abc
273760312 abc
273760312 abc
273760312 abc

Wonderful! As we can see, after using our interned() function, only a single instance of the "abc" string is used in our data structure (actually the first occurrence). This means all other instances (given no one else uses them) can be, and will be, properly garbage collected (by the garbage collector, some time in the future).

One thing not to forget here: the string interner uses a cache map which stores all previously encountered string values. So to let those strings go, you should "clear" this cache map too, most simply by assigning nil to it.

Without further ado, let's see our solution:

result := searchImages()

tagToUrlMap := make(map[string][]string)

for _, image := range result {
    imageURL := interned(image.URL)

    for _, tag := range image.Tags {
        tagName := interned(tag.Name)
        tagToUrlMap[tagName] = append(tagToUrlMap[tagName], imageURL)
    }
}

// Clear the interner cache:
cache = nil

To verify the results:

enc := json.NewEncoder(os.Stdout)
enc.SetIndent("", "  ")
if err := enc.Encode(tagToUrlMap); err != nil {
    panic(err)
}

Output is (try it on the Go Playground):

{
  "blue": [
    "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg"
  ],
  "bridge": [
    "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
  ],
  "forest": [
    "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg",
    "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
  ],
  "ocean": [
    "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg"
  ],
  "river": [
    "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
  ],
  "water": [
    "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg",
    "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
  ]
}

Further memory optimizations:

We used the builtin append() function to add new image URLs to tags. append() may (and usually does) allocate bigger slices than needed, leaving room for future growth. After our "build" process, we can go through the tagToUrlMap map and "trim" those slices to the minimum needed.

This is how it could be done:

for tagName, urls := range tagToUrlMap {
    if cap(urls) > len(urls) {
        urls2 := make([]string, len(urls))
        copy(urls2, urls)
        tagToUrlMap[tagName] = urls2
    }
}
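A small sketch to observe the effect: it fills a map with append the same way as above and prints length and capacity before and after trimming. The capacities append chooses are a runtime implementation detail and may differ between Go versions; the helper name trimSlices is hypothetical.

```go
package main

import "fmt"

// trimSlices replaces every slice whose capacity exceeds its length with an
// exact-size copy, so the larger backing arrays can be garbage collected.
// Assigning to existing keys while ranging over a map is allowed in Go.
func trimSlices(m map[string][]string) {
	for k, v := range m {
		if cap(v) > len(v) {
			exact := make([]string, len(v))
			copy(exact, v)
			m[k] = exact
		}
	}
}

func main() {
	m := make(map[string][]string)
	for i := 0; i < 5; i++ {
		m["water"] = append(m["water"], fmt.Sprintf("url-%d", i))
	}
	fmt.Println("before:", len(m["water"]), cap(m["water"]))

	trimSlices(m)
	fmt.Println("after: ", len(m["water"]), cap(m["water"])) // after:  5 5
}
```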
