Python multiprocessing memory usage

Question

I have written a program that can be summarized as follows:

import multiprocessing

def loadHugeData():
    # load it
    return data

def processHugeData(data, res_queue):
    for item in data:
        # process it
        res_queue.put(result)
    res_queue.put("END")          # sentinel telling the writer to stop

def writeOutput(outFile, res_queue):
    with open(outFile, 'w') as f:
        res = res_queue.get()
        while res != 'END':
            f.write(res)
            res = res_queue.get()

res_queue = multiprocessing.Queue()

if __name__ == '__main__':
    data = loadHugeData()
    p = multiprocessing.Process(target=writeOutput, args=(outFile, res_queue))
    p.start()
    processHugeData(data, res_queue)
    p.join()

The real code (especially writeOutput()) is a lot more complicated. writeOutput() only uses the values it takes as its arguments (meaning it does not reference data).

Basically it loads a huge dataset into memory and processes it. Writing of the output is delegated to a sub-process (it actually writes into multiple files, and this takes a lot of time). So each time one data item gets processed, it is sent to the sub-process through res_queue, which in turn writes the result into files as needed.

The sub-process does not need to access, read or modify the data loaded by loadHugeData() in any way. It only needs to use what the main process sends it through res_queue. And this leads me to my problem and question.

It seems to me that the sub-process gets its own copy of the huge dataset (when checking memory usage with top). Is this true? And if so, how can I avoid it (essentially using double the memory)?

I am using Python 2.6 and the program is running on Linux.

Answer

The multiprocessing module is effectively based on the fork system call, which creates a copy of the current process. Since you are loading the huge data before you fork (or create the multiprocessing.Process), the child process inherits a copy of the data.

However, if the operating system you are running on implements COW (copy-on-write), there will actually be only one copy of the data in physical memory unless you modify it in either the parent or the child process (both will share the same physical memory pages, albeit in different virtual address spaces); and even then, additional memory will only be allocated for the changes (in page-size increments).
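
One way to check whether the pages are really still shared, rather than trusting top (whose RES column counts copy-on-write pages in both processes, so totals can look doubled even while the memory is physically shared), is to sum the Private_* lines of /proc/&lt;pid&gt;/smaps. A minimal, Linux-only sketch; the helper name private_kb is my own:

import os

def private_kb(pid):
    """Sum Private_Clean + Private_Dirty (in kB) from /proc/<pid>/smaps."""
    total = 0
    with open('/proc/%d/smaps' % pid) as f:
        for line in f:
            if line.startswith('Private_'):
                total += int(line.split()[1])
    return total

# Pages this process actually owns, excluding memory still shared via COW.
print('truly private memory: %d kB' % private_kb(os.getpid()))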

You can avoid this situation by starting the multiprocessing.Process before you load your huge data. Then the additional memory allocated when you load the data in the parent will not be reflected in the child process.
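
Applied to the code in the question, that reordering might look roughly like this (a sketch only; loadHugeData(), processHugeData(), writeOutput() and outFile are the placeholders from above):

import multiprocessing

if __name__ == '__main__':
    res_queue = multiprocessing.Queue()

    # Fork the writer before the huge dataset exists, so the child never
    # inherits a view of it.
    p = multiprocessing.Process(target=writeOutput, args=(outFile, res_queue))
    p.start()

    data = loadHugeData()          # allocated only in the parent, after the fork
    processHugeData(data, res_queue)
    p.join()

On Linux with Python 2.6, multiprocessing forks the child, so only what already exists at p.start() time is visible to it.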

Reflecting @Janne Karila's comment here, since it is so relevant: "Note also that every Python object contains a reference count that is modified whenever the object is accessed. So, just reading a data structure can cause COW to copy."
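
A rough, Linux-only way to observe this (the sizes, names and output here are purely illustrative): the child below never modifies the list, yet iterating over it writes to every element's reference count, which dirties the shared pages and forces them to be copied.

import multiprocessing

def private_kb():
    """Sum the Private_* lines of /proc/self/smaps, in kB (Linux only)."""
    total = 0
    with open('/proc/self/smaps') as f:
        for line in f:
            if line.startswith('Private_'):
                total += int(line.split()[1])
    return total

def reader(data):
    before = private_kb()
    for item in data:              # read-only access, but it touches refcounts
        pass
    print('child private memory grew from %d kB to %d kB' % (before, private_kb()))

if __name__ == '__main__':
    data = [str(i) for i in range(2 * 10**6)]   # a sizeable list of small objects
    p = multiprocessing.Process(target=reader, args=(data,))
    p.start()
    p.join()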
