How do I find what is using memory in a Python process in a production system?


Problem description


My production system occasionally exhibits a memory leak I have not been able to reproduce in a development environment. I've used a Python memory profiler (specifically, Heapy) with some success in the development environment, but it can't help me with things I can't reproduce, and I'm reluctant to instrument our production system with Heapy because it takes a while to do its thing and its threaded remote interface does not work well in our server.

What I think I want is a way to dump a snapshot of the production Python process (or at least gc.get_objects), and then analyze it offline to see where it is using memory. How do I get a core dump of a Python process like this? Once I have one, how do I do something useful with it?

Solution

I will expand on Brett's answer from my recent experience. The Dozer package is well maintained, and despite advancements like the addition of tracemalloc to the stdlib in Python 3.4, its gc.get_objects counting chart is my go-to tool for tackling memory leaks. Below I use dozer > 0.7, which had not been released at the time of writing (well, because I contributed a couple of fixes there recently).
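
For completeness, since tracemalloc came up: a minimal sketch of the stdlib approach, which needs tracing to be started inside the process (ideally early) and is not what I use below.

import tracemalloc

tracemalloc.start()  # ideally as early as possible in the process' life

# ...let the suspected leak accumulate, then take two snapshots some time apart
snapshot1 = tracemalloc.take_snapshot()
# ...wait while more memory leaks...
snapshot2 = tracemalloc.take_snapshot()

# Show which source lines allocated the most new memory between the snapshots
for stat in snapshot2.compare_to(snapshot1, 'lineno')[:10]:
    print(stat)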

Example

Let's look at a non-trivial memory leak. I'll use Celery 4.4 here and will eventually uncover a feature that causes the leak (and because it's a bug/feature kind of thing, it can be called mere misconfiguration caused by ignorance). So there's a Python 3.6 venv where I pip install 'celery<4.5', and I have the following module.

demo.py

import time

import celery 


redis_dsn = 'redis://localhost'
app = celery.Celery('demo', broker=redis_dsn, backend=redis_dsn)

@app.task
def subtask():
    pass

@app.task
def task():
    for i in range(10_000):
        subtask.delay()
        time.sleep(0.01)


if __name__ == '__main__':
    task.delay().get()

Basically a task which schedules a bunch of subtasks. What can go wrong?

I'll use procpath to analyse the Celery node's memory consumption: pip install procpath. I have 4 terminals:

  1. procpath record -d celery.sqlite -i1 "$..children[?('celery' in @.cmdline)]" to record the Celery node's process tree stats
  2. docker run --rm -it -p 6379:6379 redis to run Redis which will serve as Celery broker and result backend
  3. celery -A demo worker --concurrency 2 to run the node with 2 workers
  4. python demo.py to finally run the example

(4) will finish in under 2 minutes.

Then I use sqliteviz (pre-built version) to visualise what procpath has recorded. I drop celery.sqlite there and use this query (stat_rss comes from /proc/[pid]/stat and is measured in memory pages, so with the usual 4 KiB page size dividing by 256 converts it to MiB):

SELECT datetime(ts, 'unixepoch', 'localtime') ts, stat_pid, stat_rss / 256.0 rss
FROM record 

And in sqliteviz I create a line chart trace with X=ts, Y=rss, and add a split transform By=stat_pid. The resulting chart is:

This shape is likely pretty familiar to anyone who has fought memory leaks.
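
If you'd rather chart the recording offline than in the browser, the same query works from plain Python as well; a rough sketch, assuming matplotlib is installed (it is not part of the setup above):

import sqlite3

import matplotlib.pyplot as plt

conn = sqlite3.connect('celery.sqlite')
rows = conn.execute(
    'SELECT ts, stat_pid, stat_rss / 256.0 FROM record ORDER BY ts'
).fetchall()

# Build one RSS series per PID and plot them as separate lines
series = {}
for ts, pid, rss in rows:
    xs, ys = series.setdefault(pid, ([], []))
    xs.append(ts)
    ys.append(rss)

for pid, (xs, ys) in series.items():
    plt.plot(xs, ys, label=f'PID {pid}')
plt.xlabel('timestamp, s')
plt.ylabel('RSS, MiB')
plt.legend()
plt.show()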

Finding leaking objects

Now it's time for dozer. I'll show the non-instrumented case (if you control the application's code, you can instrument it in a similar way; see the sketch right after this list). To inject the Dozer server into the target process I'll use Pyrasite. There are two things to know about it:

  • To run it, ptrace has to be configured as "classic ptrace permissions": echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope, which may be a security risk
  • There is a non-zero chance that your target Python process will crash
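
For the instrumented case, Dozer is ordinary WSGI middleware, so instead of injecting anything you can wrap your own WSGI application with it at startup; a rough sketch, assuming an existing wsgi_app object in your code:

import dozer

# Wrap the existing WSGI application; Dozer's UI is then served under its
# configurable path prefix (cf. the path argument in the injection snippet below)
wsgi_app = dozer.Dozer(wsgi_app)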

With those Pyrasite caveats in mind, I:

  • pip install https://github.com/mgedmin/dozer/archive/3ca74bd8.zip (that's the to-be-released 0.8 I mentioned above)
  • pip install pillow (which dozer uses for charting)
  • pip install pyrasite

After that I can get a Python shell in the target process:

pyrasite-shell 26572

And inject the following, which will run Dozer's WSGI application using the stdlib's wsgiref server.

import threading
import wsgiref.simple_server

import dozer


def run_dozer():
    # Dozer is WSGI middleware; with app=None and path='/' it serves only its own UI
    app = dozer.Dozer(app=None, path='/')
    with wsgiref.simple_server.make_server('', 8000, app) as httpd:
        print('Serving Dozer on port 8000...')
        httpd.serve_forever()

# Serve in a daemon thread so the shell (and the target process) isn't blocked
threading.Thread(target=run_dozer, daemon=True).start()

Opening http://localhost:8000 in a browser, you should see something like:

After that I run python demo.py from (4) again and wait for it to finish. Then in Dozer I set "Floor" to 5000, and here's what I see:

Two types related to Celery grow as the subtasks are scheduled:

  • celery.result.AsyncResult
  • vine.promises.promise

weakref.WeakMethod has the same shape and numbers and must be caused by the same thing.
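
Dozer's chart is built on a periodic gc.get_objects census, so the numbers can be sanity-checked by hand from the Pyrasite shell; a quick sketch:

import gc
from collections import Counter

# Count live (GC-tracked) objects by fully qualified type name
counts = Counter(
    f'{type(o).__module__}.{type(o).__qualname__}' for o in gc.get_objects()
)
for name, count in counts.most_common(10):
    print(count, name)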

Finding root cause

At this point, from the leaking types and the trends, it may already be clear what's going on in your case. If it's not, Dozer has a "TRACE" link per type, which allows tracing (e.g. seeing an object's attributes) the chosen object's referrers (gc.get_referrers) and referents (gc.get_referents), and continuing the process, traversing the graph further.
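
The same kind of walk can be done manually with the gc module, which is what the TRACE pages automate; a rough sketch, assuming you simply grab the first AsyncResult the garbage collector knows about:

import gc

import celery.result

# Pick an arbitrary instance of the leaking type and look one step
# up (who references it) and down (what it references) the object graph
leaked = next(
    o for o in gc.get_objects() if isinstance(o, celery.result.AsyncResult)
)
print(gc.get_referrers(leaked))
print(gc.get_referents(leaked))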

But a picture says a thousand words, right? So I'll show how to use objgraph to render a chosen object's dependency graph.

  • pip install objgraph
  • apt-get install graphviz

Then:

  • I run python demo.py from (4) again
  • in Dozer I set floor=0, filter=AsyncResult
  • and click "TRACE" which should yield

Then, in the Pyrasite shell, run:

objgraph.show_backrefs([objgraph.at(140254427663376)], filename='backref.png')

The PNG file should contain:

Basically there's some Context object containing a list called _children, which in turn contains many instances of celery.result.AsyncResult: they leak. Setting Filter=celery.*context in Dozer, here's what I see:

So the culprit is celery.app.task.Context. Searching for that type would certainly lead you to the Celery task page. Quickly searching for "children" there, here's what it says:

trail = True

If enabled the request will keep track of subtasks started by this task, and this information will be sent with the result (result.children).

The trail is disabled by setting trail=False, like:

@app.task(trail=False)
def task():
    for i in range(10_000):
        subtask.delay()
        time.sleep(0.01)

Then restarting the Celery node from (3) and running python demo.py from (4) yet again shows this memory consumption:

Problem solved!
