How do I use subprocesses to force Python to release memory?

Question

I was reading up on Python Memory Management and would like to reduce the memory footprint of my application. It was suggested that subprocesses would go a long way in mitigating the problem, but I'm having trouble conceptualizing what needs to be done. Could someone please provide a simple example of how to turn this...

import copy

def my_function():
    x = list(range(1000000))
    y = copy.deepcopy(x)
    del x
    return y

@subprocess_witchcraft   # hypothetical decorator from the question
def my_function_dispatcher(*args):
    return my_function()

...into a real subprocessed function that doesn't store an extra "free-list"?

Does this "free-list" concept apply to python c-extensions as well?

Answer

The important thing about the optimization suggestion is to make sure that my_function() is only invoked in a subprocess. The deepcopy and del are irrelevant — once you create five million distinct integers in a process, holding onto all of them at the same time, it's game over. Even if you stop referring to those objects, Python will free them by keeping references to five million empty integer-object-sized fields in a limbo where they await reuse for the next function that wants to create five million integers. This is the free list mentioned in the other answer, and it buys blindingly fast allocation and deallocation of ints and floats. It is only fair to Python to note that this is not a memory leak since the memory is definitely made available for further allocations. However, that memory will not get returned to the system until the process ends, nor will it be reused for anything other than allocating numbers of the same type.

Most programs don't have this problem because most programs do not create pathologically huge lists of numbers, free them, and then expect to reuse that memory for other objects. Programs using numpy are also safe because numpy stores the numeric data of its arrays in a tightly packed native format. For programs that do follow this usage pattern, the way to mitigate the problem is to avoid creating a large number of integers at the same time in the first place, at least not in the process that needs to return memory to the system. It is unclear what exact use case you have, but a real-world solution will likely require more than a "magic decorator".
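The packed-storage point is easy to see with the stdlib array module, which, like numpy, keeps numbers in one contiguous native buffer rather than as separate Python objects. A small sketch (the sizes in the comments assume a 64-bit CPython):

```python
import array
import sys

n = 1000000

# array stores one native 64-bit integer per element (type code 'q'),
# so the payload is exactly 8 bytes * n, with no per-element objects.
packed = array.array('q', range(n))
print(packed.itemsize * len(packed))   # 8000000 bytes of payload

# A plain list stores n pointers, and every int is additionally a
# separate heap object, so the same data costs several times more.
unpacked = list(range(n))
print(sys.getsizeof(unpacked))         # ~8 MB for the pointer array alone
```

Freeing the array hands one big block back to the allocator instead of feeding a million small objects to a per-type free list.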

This is where subprocesses come in: if the list of numbers is created in another process, then all the memory associated with the list, including but not limited to the storage of the ints, is both freed and returned to the system by the mere act of terminating the subprocess. Of course, you must design your program so that the list can be both created and processed in the subprocess, without requiring the transfer of all those numbers. The subprocess can receive the information needed to create the data set, and can send back the information obtained from processing the list.
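As a rough sketch of what the question's hypothetical @subprocess_witchcraft could look like (the names run_in_subprocess and summarize are illustrative, not part of any library): each call runs the wrapped function in a single-use worker process, so whatever free lists the call grows vanish when the worker exits, and only the small return value is pickled back to the parent.

```python
import multiprocessing

def run_in_subprocess(func):
    # Hypothetical stand-in for @subprocess_witchcraft: every call runs
    # func in a throwaway worker process; the result travels back via
    # pickle, and the worker's memory returns to the OS when it exits.
    def wrapper(*args, **kwargs):
        with multiprocessing.Pool(processes=1) as pool:
            return pool.apply(func, args, kwargs)
    return wrapper

def summarize(n):
    # builds a large list in the worker, returns only one number
    return sum(list(range(n)))

if __name__ == '__main__':
    # the million-element list never exists in the parent process
    print(run_in_subprocess(summarize)(1000000))
```

The wrapper is applied at the call site rather than with @ syntax because multiprocessing pickles functions by name; decorating summarize at definition time would rebind that module-level name to the wrapper and confuse the worker.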

To illustrate the principle, let's upgrade your example so that the whole list actually needs to exist - say we're benchmarking sorting algorithms. We want to create a huge list of integers, sort it, and reliably free the memory associated with the list, so that the next benchmark can allocate memory for its own needs without worrying about running out of RAM. To spawn the subprocess and communicate with it, this uses the multiprocessing module:

# To run this, save it to a file that looks like a valid Python module, e.g.
# "foo.py" - multiprocessing requires being able to import the main module.
# Then run it with "python foo.py".

import multiprocessing, random, sys, os, time

def create_list(size):
    # utility function for clarity - runs in the subprocess
    maxint = sys.maxsize
    randrange = random.randrange
    return [randrange(maxint) for i in range(size)]

def run_test(state):
    # this function is run in a separate process
    size = state['list_size']
    print('creating a list with %d random elements - this can take a while...'
          % size, end=' ', flush=True)
    lst = create_list(size)
    print('done')
    t0 = time.time()
    lst.sort()
    t1 = time.time()
    state['time'] = t1 - t0

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    state = manager.dict(list_size=5*1000*1000)  # shared state
    p = multiprocessing.Process(target=run_test, args=(state,))
    p.start()
    p.join()
    print('time to sort: %.3f' % state['time'])
    print('my PID is %d, sleeping for a minute...' % os.getpid())
    time.sleep(60)
    # at this point you can inspect the running process to see that it
    # does not consume excess memory


Bonus Answer

It is hard to provide an answer to the bonus question, since the question is unclear. The "free list concept" is exactly that, a concept: an implementation strategy that needs to be explicitly coded on top of the regular Python allocator. Most Python types do not use that allocation strategy; for example, it is not used for instances of classes created with the class statement. Implementing a free list is not hard, but it is fairly advanced and rarely undertaken without good reason. If an extension author has chosen to use a free list for one of their types, it can be expected that they are aware of the tradeoff a free list offers: extra-fast allocation/deallocation at the cost of some additional space (for the objects on the free list, and the free list itself) and the inability to reuse that memory for anything else.
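To make the concept concrete, here is a toy free list written in pure Python; a C extension would implement the same strategy in C, parking freed structs on a private list instead of handing them back to the allocator. All names here are illustrative:

```python
class Node:
    __slots__ = ('value', 'next')

class FreeListPool:
    # Toy illustration of the free-list strategy: released nodes are
    # kept on a private singly linked list and handed back out on the
    # next allocation instead of going through the general allocator.
    def __init__(self):
        self._free = None    # head of the free list
        self.reused = 0      # how many allocations were recycled

    def alloc(self, value):
        node = self._free
        if node is not None:         # pop a recycled node
            self._free = node.next
            self.reused += 1
        else:                        # free list empty: really allocate
            node = Node()
        node.value = value
        node.next = None
        return node

    def free(self, node):
        # the memory is never released, only parked for reuse
        node.next = self._free
        self._free = node

pool = FreeListPool()
a = pool.alloc(1)
pool.free(a)
b = pool.alloc(2)   # recycles the very node that held 1
```

Note the tradeoff the answer describes: alloc after free is just a pointer pop, but nodes parked on the free list can never be used for anything except more Nodes.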
