How do I use subprocesses to force Python to release memory?

Question

I was reading up on Python Memory Management and would like to reduce the memory footprint of my application. It was suggested that subprocesses would go a long way in mitigating the problem, but I'm having trouble conceptualizing what needs to be done. Could someone please provide a simple example of how to turn this...

import copy

def my_function():
    x = list(range(1000000))
    y = copy.deepcopy(x)
    del x
    return y

@subprocess_witchcraft
def my_function_dispatcher(*args):
    return my_function()

...into a real subprocessed function that doesn't store an extra "free-list"?

Does this "free-list" concept apply to Python C extensions as well?

Answer

The important thing about the optimization suggestion is to make sure that my_function() is only invoked in a subprocess. The deepcopy and del are irrelevant: once you create five million distinct integers in a process, holding onto all of them at the same time, it's game over. Even when you stop referring to those objects, Python will not release them to the operating system; it keeps references to five million empty integer-object-sized fields in a limbo where they await reuse by the next function that wants to create five million integers. This is the free list mentioned in the other answer, and it buys blindingly fast allocation and deallocation of ints and floats. In fairness to Python, this is not a memory leak, since the memory is definitely made available for further allocations. However, that memory will not be returned to the system until the process ends, nor will it be reused for anything other than allocating numbers of the same type.

Most programs don't have this problem because most programs do not create pathologically huge lists of numbers, free them, and then expect to reuse that memory for other objects. Programs using numpy are also safe because numpy stores the numeric data of its arrays in a tightly packed native format. For programs that do follow this usage pattern, the way to mitigate the problem is by not creating a large number of integers at the same time in the first place, at least not in the process which needs to return memory to the system. It is unclear what exact use case you have, but a real-world solution will likely require more than a "magic decorator".
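When the numbers are only consumed once, one way to sidestep the problem entirely is to never materialize the list: a generator expression keeps a single integer alive at a time, so the giant free list is never populated. A minimal sketch (the function names are invented for illustration, and `sum` stands in for whatever processing you actually do):

```python
import random
import sys

def total_memory_hungry(size):
    # builds the full list: 'size' integer objects alive at once
    lst = [random.randrange(sys.maxsize) for _ in range(size)]
    return sum(lst)

def total_streaming(size):
    # generator expression: only one integer object alive at a time,
    # so no multi-million-entry free list is ever left behind
    return sum(random.randrange(sys.maxsize) for _ in range(size))
```

Both functions compute the same total, but the streaming version's peak memory stays flat regardless of `size`.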

This is where subprocesses come in: if the list of numbers is created in another process, then all the memory associated with the list, including but not limited to the storage of the ints, is both freed and returned to the system by the mere act of terminating the subprocess. Of course, you must design your program so that the list can be both created and processed in the subprocess, without requiring the transfer of all those numbers. The subprocess can receive the information needed to create the data set, and can send back the information obtained from processing the list.

To illustrate the principle, let's upgrade your example so that the whole list actually needs to exist - say we're benchmarking sorting algorithms. We want to create a huge list of integers, sort it, and reliably free the memory associated with the list, so that the next benchmark can allocate memory for its own needs without worrying about running out of RAM. To spawn the subprocess and communicate with it, the example uses the multiprocessing module:

# To run this, save it to a file that looks like a valid Python module, e.g.
# "foo.py" - multiprocessing requires being able to import the main module.
# Then run it with "python foo.py".

import multiprocessing
import os
import random
import sys
import time

def create_list(size):
    # utility function for clarity - runs in the subprocess
    maxint = sys.maxsize
    randrange = random.randrange
    return [randrange(maxint) for i in range(size)]

def run_test(state):
    # this function is run in a separate process
    size = state['list_size']
    print('creating a list with %d random elements - this can take a while...' % size)
    sys.stdout.flush()
    lst = create_list(size)
    print('done')
    t0 = time.time()
    lst.sort()
    t1 = time.time()
    state['time'] = t1 - t0

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    state = manager.dict(list_size=5*1000*1000)  # shared state
    p = multiprocessing.Process(target=run_test, args=(state,))
    p.start()
    p.join()
    print('time to sort: %.3f' % state['time'])
    print('my PID is %d, sleeping for a minute...' % os.getpid())
    time.sleep(60)
    # at this point you can inspect the running process to see that it
    # does not consume excess memory
Bonus Answer

It is hard to provide an answer to the bonus question, since the question is unclear. The "free list concept" is exactly that, a concept: an implementation strategy that needs to be explicitly coded on top of the regular Python allocator. Most Python types do not use that allocation strategy; for example, it is not used for instances of classes created with the class statement. Implementing a free list is not hard, but it is fairly advanced and rarely undertaken without good reason. If some extension authors have chosen to use a free list for one of their types, it can be expected that they are aware of the tradeoff a free list offers: gaining extra-fast allocation/deallocation at the cost of some additional space (for the objects on the free list, and for the free list itself) and the inability to reuse the memory for something else.
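To make that tradeoff concrete, here is a toy free list in pure Python (purely illustrative: CPython's real free lists live in C inside the type's allocator, and the class and buffer size here are invented for the example):

```python
class PooledBuffer:
    # toy free list: released objects are parked on a class-level stack
    # instead of being deallocated, so the next acquire is just a pop
    _free = []

    def __init__(self):
        self.data = bytearray(4096)

    @classmethod
    def acquire(cls):
        if cls._free:
            return cls._free.pop()   # fast path: recycle a dead object
        return cls()                 # slow path: real allocation

    def release(self):
        # the memory is NOT returned to the allocator - exactly the
        # behavior the answer describes for the int/float free lists
        PooledBuffer._free.append(self)
```

The pool only ever grows: like the int free list, releasing an object makes it available for reuse by this type, but never hands the memory back to the allocator.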
