Memory leak when running python script from C++


Problem description


The following minimal example of calling a python function from C++ has a memory leak on my system:

script.py:

import tensorflow
def foo(param):
    return "something"

main.cpp:

#include "python3.5/Python.h"

#include <iostream>
#include <string>

int main()
{
    Py_Initialize();

    PyRun_SimpleString("import sys");
    PyRun_SimpleString("if not hasattr(sys,'argv'): sys.argv = ['']");
    PyRun_SimpleString("sys.path.append('./')");

    PyObject* moduleName = PyUnicode_FromString("script");
    PyObject* pModule = PyImport_Import(moduleName);
    PyObject* fooFunc = PyObject_GetAttrString(pModule, "foo");
    PyObject* param = PyUnicode_FromString("dummy");
    PyObject* args = PyTuple_Pack(1, param);
    PyObject* result = PyObject_CallObject(fooFunc, args);

    Py_CLEAR(result);
    Py_CLEAR(args);
    Py_CLEAR(param);
    Py_CLEAR(fooFunc);
    Py_CLEAR(pModule);
    Py_CLEAR(moduleName);

    Py_Finalize();
}

compiled with

g++ -std=c++11 main.cpp $(python3-config --cflags) $(python3-config --ldflags) -o main

and run with valgrind

valgrind --leak-check=yes ./main

produces the following summary

LEAK SUMMARY:
==24155==    definitely lost: 161,840 bytes in 103 blocks
==24155==    indirectly lost: 33 bytes in 2 blocks
==24155==      possibly lost: 184,791 bytes in 132 blocks
==24155==    still reachable: 14,067,324 bytes in 130,118 blocks
==24155==                       of which reachable via heuristic:
==24155==                         stdstring          : 2,273,096 bytes in 43,865 blocks
==24155==         suppressed: 0 bytes in 0 blocks

I'm using Linux Mint 18.2 Sonya, g++ 5.4.0, Python 3.5.2 and TensorFlow 1.4.1.

Removing import tensorflow makes the leak disappear. Is this a bug in TensorFlow or did I do something wrong? (I expect the latter to be true.)


Additionally when I create a Keras layer in Python

#script.py
from keras.layers import Input
def foo(param):
    a = Input(shape=(32,))
    return "str"

and run the call to Python from C++ repeatedly

//main.cpp

#include "python3.5/Python.h"

#include <iostream>
#include <string>

int main()
{
    Py_Initialize();

    PyRun_SimpleString("import sys");
    PyRun_SimpleString("if not hasattr(sys,'argv'): sys.argv = ['']");
    PyRun_SimpleString("sys.path.append('./')");

    PyObject* moduleName = PyUnicode_FromString("script");
    PyObject* pModule = PyImport_Import(moduleName);

    for (int i = 0; i < 10000000; ++i)
    {
        std::cout << i << std::endl;
        PyObject* fooFunc = PyObject_GetAttrString(pModule, "foo");
        PyObject* param = PyUnicode_FromString("dummy");
        PyObject* args = PyTuple_Pack(1, param);
        PyObject* result = PyObject_CallObject(fooFunc, args);

        Py_CLEAR(result);
        Py_CLEAR(args);
        Py_CLEAR(param);
        Py_CLEAR(fooFunc);
    }

    Py_CLEAR(pModule);
    Py_CLEAR(moduleName);

    Py_Finalize();
}

the memory consumption of the application continuously grows ad infinitum during runtime.

So I guess there is something fundamentally wrong with the way I call the python function from C++, but what is it?

Solution

There are two different types of "memory leak" in your question.

Valgrind is telling you about the first type of memory leak. However, it is pretty common for python modules to "leak" memory: mostly globals which are allocated/initialized when the module is loaded. And because a module is loaded only once in Python, it's not a big problem.

A well-known example is numpy's PyArray_API: it must be initialized via _import_array, is never deleted, and stays in memory until the python interpreter is shut down.

So it is a "memory leak" by design; you can argue whether that is good design or not, but at the end of the day there is nothing you can do about it.
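This kind of "leak by design" can be mimicked in pure Python. The following sketch (using a hypothetical module name `fake_ext`; this is an analogue of the pattern, not numpy's actual mechanism) shows a module-level global that is allocated once at import time and stays reachable through sys.modules until the interpreter shuts down:

```python
import sys
import types

# Build a module whose top level allocates a global table, the way an
# extension module initializes its global state on first import.
mod = types.ModuleType("fake_ext")
exec("CACHE = [bytes(1024) for _ in range(10)]", mod.__dict__)
sys.modules["fake_ext"] = mod

# The cache stays reachable via sys.modules for the whole process
# lifetime; importing the module again reuses it instead of allocating
# a second copy, so this is a one-time cost, not a growing leak.
import fake_ext
assert fake_ext is mod
assert len(fake_ext.CACHE) == 10
```

A leak checker run at process exit would report such a cache as "still reachable", exactly like the blocks in the valgrind summary above.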

I don't have enough insight into the tensorflow module to pinpoint the places where such memory leaks happen, but I'm pretty sure it's nothing you should worry about.


The second "memory leak" is more subtle.

You can get a lead when you compare the valgrind output for 10^4 and 10^5 iterations of the loop: there will be almost no difference! There is, however, a difference in the peak memory consumption.

Unlike C++, Python has a garbage collector, so you cannot know exactly when an object is destructed. CPython uses reference counting: when a reference count reaches 0, the object is destroyed. However, when there is a cycle of references (e.g. object A holds a reference to object B and object B holds a reference to object A) it is not so simple: the garbage collector needs to iterate through all objects to find such no-longer-used cycles.
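The difference between plain reference counting and cycle collection can be observed directly in pure Python. A minimal sketch, using weakref to detect the moment of destruction:

```python
import gc
import weakref

class Node:
    pass

gc.disable()  # make the moment of collection explicit for this demo

# Acyclic case: dropping the last reference destroys the object at once.
n = Node()
r = weakref.ref(n)
del n
assert r() is None  # refcount reached 0, no garbage collector needed

# Cyclic case: A references B and B references A.
a, b = Node(), Node()
a.other, b.other = b, a
ra = weakref.ref(a)
del a, b
assert ra() is not None  # the cycle keeps both refcounts above 0
gc.collect()             # the collector finds and breaks the cycle
assert ra() is None

gc.enable()
```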

One could think that keras.layers.Input has such a cycle somewhere (and it does), but this is not the reason for this "memory leak", which can also be observed in pure python.

We use the objgraph package to inspect the references. Let's run the following python script:

#pure.py
from keras.layers import Input
import gc
import sys
import objgraph


def foo(param):
    a = Input(shape=(1280,))
    return "str"

###  MAIN :

print("Counts at the beginning:")
objgraph.show_most_common_types()
objgraph.show_growth(limit=7) 

for i in range(int(sys.argv[1])):
   foo(" ")

gc.collect()# just to be sure

print("\n\n\n Counts at the end")
objgraph.show_most_common_types()
objgraph.show_growth(limit=7)

import random
objgraph.show_chain(
   objgraph.find_backref_chain(
        random.choice(objgraph.by_type('Tensor')), #take some random tensor
         objgraph.is_proper_module),
    filename='chain.png') 

and run it:

python pure.py 1000

We can see the following: at the end there are exactly 1000 Tensors, which means none of our created objects got disposed!

If we take a look at the chain which keeps a tensor object alive (created with objgraph.show_chain), we see:

that there is a tensorflow Graph object where all tensors are registered; they stay there until the session is closed.

So far the theory. However, neither:

#close session and free resources:
import keras
keras.backend.get_session().close()#free all resources

print("\n\n\n Counts after session.close():")
objgraph.show_most_common_types()

nor the solution proposed here:

with tf.Graph().as_default(), tf.Session() as sess:
   for step in range(int(sys.argv[1])):
     foo(" ")

works for the current tensorflow version. This is probably a bug.


In a nutshell: you are doing nothing wrong in your C++ code; there are no memory leaks you are responsible for. In fact, you would see exactly the same memory consumption if you called the function foo from a pure python script over and over again.

All created Tensors are registered in a Graph object and aren't automatically released; you must release them by closing the backend session - which, however, doesn't work due to a bug in the current tensorflow version 1.4.0.
