Go 1.3 Garbage collector not releasing server memory back to system


Question

We wrote the simplest possible TCP server (with minor logging) to examine the memory footprint (see tcp-server.go below).

The server simply accepts connections and does nothing. It is being run on an Ubuntu 12.04.4 LTS server (kernel 3.2.0-61-generic) with Go version go1.3 linux/amd64.

The attached benchmarking program (pulse.go) creates, in this example, 10k connections, disconnects them after 30 seconds, repeats this cycle three times, and then continuously repeats small pulses of 1k connections/disconnections. The command used to test was ./pulse -big=10000 -bs=30.

The first attached graph is obtained by recording runtime.ReadMemStats when the number of clients has changed by a multiple of 500, and the second graph is the RES memory size seen by "top" for the server process.

The server starts with a negligible 1.6KB of memory. Then the memory is set by the "big" pulses of 10k connections at ~60MB (as seen by top), or at about 16MB "SystemMemory" as seen by ReadMemStats. As expected, when the 10K pulses end, the in-use memory drops, and eventually the program starts releasing memory back to the OS, as evidenced by the grey "Released Memory" line.

The problem is that the System Memory (and correspondingly, the RES memory seen by "top") never drops significantly (although it drops a little, as seen in the second graph).

We would expect that after the 10K pulses end, memory would continue to be released until the RES size is the minimum needed for handling each 1k pulse (which is 8m RES as seen by "top" and 2MB in-use reported by runtime.ReadMemStats). Instead, the RES stays at about 56MB and in-use never drops from its highest value of 60MB at all.

We want to ensure scalability for irregular traffic with occasional spikes, as well as be able to run multiple servers on the same box that have spikes at different times. Is there a way to effectively ensure that as much memory is released back to the system as possible in a reasonable time frame?

Code https://gist.github.com/eugene-bulkin/e8d690b4db144f468bc5:

server.go:

package main

import (
  "net"
  "log"
  "runtime"
  "sync"
)
var m sync.Mutex
var num_clients = 0
var cycle = 0

func printMem() {
  var ms runtime.MemStats
  runtime.ReadMemStats(&ms)
  log.Printf("Cycle #%3d: %5d clients | System: %8d Inuse: %8d Released: %8d Objects: %6d
", cycle, num_clients, ms.HeapSys, ms.HeapInuse, ms.HeapReleased, ms.HeapObjects)
}

func handleConnection(conn net.Conn) {
  //log.Println("Accepted connection:", conn.RemoteAddr())
  m.Lock()
  num_clients++
  if num_clients % 500 == 0 {
    printMem()
  }
  m.Unlock()
  buffer := make([]byte, 256)
  for {
    _, err := conn.Read(buffer)
    if err != nil {
      //log.Println("Lost connection:", conn.RemoteAddr())
      err := conn.Close()
      if err != nil {
        log.Println("Connection close error:", err)
      }
      m.Lock()
      num_clients--
      if num_clients % 500 == 0 {
        printMem()
      }
      if num_clients == 0 {
        cycle++
      }
      m.Unlock()
      break
    }
  }
}

func main() {
  printMem()
  cycle++
  listener, err := net.Listen("tcp", ":3033")
  if err != nil {
    log.Fatal("Could not listen.")
  }
  for {
    conn, err := listener.Accept()
    if err != nil {
      log.Println("Could not listen to client:", err)
      continue
    }
    go handleConnection(conn)
  }
}

pulse.go:

package main

import (
  "flag"
  "net"
  "sync"
  "log"
  "time"
)

var (
  numBig = flag.Int("big", 4000, "Number of connections in big pulse")
  bigIters = flag.Int("i", 3, "Number of iterations of big pulse")
  bigSep = flag.Int("bs", 5, "Number of seconds between big pulses")
  numSmall = flag.Int("small", 1000, "Number of connections in small pulse")
  smallSep = flag.Int("ss", 20, "Number of seconds between small pulses")
  linger = flag.Int("l", 4, "How long connections should linger before being disconnected")
)

var m sync.Mutex

var active_conns = 0
var connections = make(map[net.Conn] bool)

func pulse(n int, linger int) {
  var wg sync.WaitGroup

  log.Printf("Connecting %d client(s)...
", n)
  for i := 0; i < n; i++ {
    wg.Add(1)
    go func() {
      m.Lock()
      defer m.Unlock()
      defer wg.Done()
      active_conns++
      conn, err := net.Dial("tcp", ":3033")
      if err != nil {
        log.Panicln("Unable to connect: ", err)
        return
      }
      connections[conn] = true
    }()
  }
  wg.Wait()
  if len(connections) != n {
    log.Fatalf("Unable to connect all %d client(s).
", n)
  }
  log.Printf("Connected %d client(s).
", n)
  time.Sleep(time.Duration(linger) * time.Second)
  for conn := range connections {
    active_conns--
    err := conn.Close()
    if err != nil {
      log.Panicln("Unable to close connection:", err)
      conn = nil
      continue
    }
    delete(connections, conn)
    conn = nil
  }
  if len(connections) > 0 {
    log.Fatalf("Unable to disconnect all %d client(s) [%d remain].
", n, len(connections))
  }
  log.Printf("Disconnected %d client(s).
", n)
}

func main() {
  flag.Parse()
  for i := 0; i < *bigIters; i++ {
    pulse(*numBig, *linger)
    time.Sleep(time.Duration(*bigSep) * time.Second)
  }
  for {
    pulse(*numSmall, *linger)
    time.Sleep(time.Duration(*smallSep) * time.Second)
  }
}

Answer

First, note that Go, itself, doesn't always shrink its own memory space:

https://groups.google.com/forum/#!topic/Golang-Nuts/vfmd6zaRQVs

The heap is freed, and you can check this using runtime.ReadMemStats(), but the process's virtual address space does not shrink -- i.e., your program will not return memory to the operating system. On Unix-based platforms we use a system call to tell the operating system that it can reclaim unused parts of the heap; this facility is not available on Windows platforms.

But you're not on Windows, right?

Well, this thread is less definitive, but it says:

https://groups.google.com/forum/#!topic/golang-nuts/MC2hWpuT7Xc

As I understand it, memory is returned to the OS about 5 minutes after it has been marked as free by the GC. And the GC runs at most every two minutes, if not triggered by an increase in memory use. So the worst case would be about 7 minutes before the memory is freed.

In this case, I think that the slice is not marked as freed, but in use, so it would never be returned to the OS.

It's possible you weren't waiting long enough for the GC sweep followed by the OS return sweep, which could be up to 7 minutes after the final "big" pulse. You can explicitly force this with runtime/debug.FreeOSMemory(), but keep in mind that it won't do anything unless the GC has been run.

(Note that you can force garbage collection with runtime.GC(), though obviously you need to be careful how often you use it; you may be able to sync it with sudden downward spikes in connections.)
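For illustration, here is a minimal, hypothetical sketch (not part of the original post) of what forcing that release could look like; debug.FreeOSMemory lives in the runtime/debug package and itself forces a garbage collection before attempting to return unused heap memory to the OS:

package main

import (
  "log"
  "runtime/debug"
)

func main() {
  // In the question's server this could be invoked, for example, once
  // num_clients drops back to zero after a big pulse.
  debug.FreeOSMemory() // forces a GC, then returns as much unused heap as possible to the OS
  log.Println("asked the runtime to return freed memory to the OS")
}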

As a slight aside, I can't find an explicit source for this (other than the second thread I posted where someone mentions the same thing), but I recall it being mentioned several times that not all of the memory Go uses is "real" memory. If it's allocated by the runtime but not actually in use by the program, the OS actually has use of the memory regardless of what top or MemStats says, so the amount of memory the program is "really" using is often very overreported.
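One rough way to see that distinction in numbers is to compare HeapSys against HeapReleased in runtime.MemStats. This is only a sketch of one interpretation, assuming HeapReleased counts heap bytes already handed back to the OS:

package main

import (
  "log"
  "runtime"
)

func main() {
  var ms runtime.MemStats
  runtime.ReadMemStats(&ms)
  // HeapSys counts all heap memory obtained from the OS, including pages the
  // runtime has already released; subtracting HeapReleased gives a rough
  // estimate of what the OS is still likely to charge against the process.
  retained := ms.HeapSys - ms.HeapReleased
  log.Printf("HeapSys: %d HeapReleased: %d retained (approx): %d", ms.HeapSys, ms.HeapReleased, retained)
}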

As Kostix notes in the comments, and in support of JimB's answer, this question was crossposted on golang-nuts and we got a rather definitive answer from Dmitri Vyukov:

https://groups.google.com/forum/#!topic/golang-nuts/0WSOKnHGBZE/discussion

I don't think there is a solution today. Most of the memory seems to be occupied by goroutine stacks, and we don't release that memory to the OS. It will be somewhat better in the next release.

So what I outlined above only applies to heap variables; memory on a goroutine stack will never be released. How exactly this interacts with my last point, that not all of the reported system memory is "real" memory, remains to be seen.
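If you want to see how much of the footprint is goroutine stacks rather than heap, one option is to extend the question's printMem helper with the stack fields of runtime.MemStats. A sketch (same cycle and num_clients variables as in server.go):

func printMem() {
  var ms runtime.MemStats
  runtime.ReadMemStats(&ms)
  // StackSys/StackInuse show the memory held by goroutine stacks, which,
  // per the answer above, the Go 1.3 runtime does not return to the OS.
  log.Printf("Cycle #%3d: %5d clients | HeapSys: %8d HeapInuse: %8d HeapReleased: %8d | StackSys: %8d StackInuse: %8d\n",
    cycle, num_clients, ms.HeapSys, ms.HeapInuse, ms.HeapReleased, ms.StackSys, ms.StackInuse)
}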
