multiprocessing global variable memory copying

Views: 97  Posted: 2020/5/8 19:23:08  Tags: python memory multiprocessing

Question

I am running a program that first loads 20 GB of data into memory. I then run N (> 1000) independent tasks, each of which may use (read-only) part of that 20 GB. I am now trying to run those tasks with multiprocessing. However, as this answer says, all global variables are copied for each process. In my case I do not have enough memory to run more than 4 tasks at once, because I only have 96 GB of RAM. Is there a solution to this kind of problem that lets me use all my cores without consuming too much memory?

Recommended answer

On Linux, forked processes have a copy-on-write view of the parent's address space: a page of memory is physically copied only when one of the processes writes to it. Forking is lightweight, and the same program runs in both the parent and the child, except that the child takes a different execution path. As a small example:

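import os

var = "unchanged"
pid = os.fork()  # returns the child's pid in the parent, 0 in the child
if pid:
    # parent: report, then wait for the child to finish
    print('parent:', os.getpid(), var)
    os.waitpid(pid, 0)
else:
    # child: sees the parent's value, then rebinds only its own copy
    print('child:', os.getpid(), var)
    var = "changed"

# show parent and child views (both processes reach this line)
print(os.getpid(), var)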

Result

parent: 22642 unchanged
child: 22643 unchanged
22643 changed
22642 unchanged

Applying this to multiprocessing: in this example I load the data into a global variable before the pool is created. Since Python pickles whatever is sent to the process pool, I make sure it pickles something small, such as an index, and have the worker read the global data itself.

import multiprocessing as mp
import os

# loaded once in the parent; forked workers see it without pickling
my_big_data = "well, bigger than this"

def worker(index):
    """get char in big data"""
    return my_big_data[index]

if __name__ == "__main__":
    # only the small index is pickled and sent to each worker
    with mp.Pool(os.cpu_count()) as pool:
        for c in pool.imap_unordered(worker, range(len(my_big_data)), chunksize=1):
            print(c)
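
The same pattern can be sketched with the fork start method requested explicitly, so the code does not depend on the platform default. This is only an illustration under stated assumptions: `BIG_DATA`, `run_tasks`, and the `(start, stop)` task format are hypothetical names chosen for the example, and a small list stands in for the 20 GB dataset.

```python
import multiprocessing as mp

# Hypothetical stand-in for the large read-only dataset. It is created at
# import time so it already exists in the parent when the workers are forked.
BIG_DATA = list(range(1_000_000))


def worker(bounds):
    """Sum one slice of BIG_DATA; only the (start, stop) tuple is pickled."""
    start, stop = bounds
    return sum(BIG_DATA[start:stop])


def run_tasks(task_bounds, processes=2):
    """Run the tasks in a pool whose workers are created with fork."""
    # Assumption: a platform that supports fork (e.g. Linux).
    # get_context raises ValueError where fork is unavailable.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes) as pool:
        return pool.map(worker, task_bounds)


if __name__ == "__main__":
    bounds = [(i, i + 250_000) for i in range(0, 1_000_000, 250_000)]
    print(run_tasks(bounds))
```

One caveat: CPython updates an object's reference count even when the object is only read, so pages holding many small Python objects may still end up copied in the workers over time. Data held in a few large buffers tends to benefit most from copy-on-write.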

Windows does not have a fork-and-exec model for running programs. It has to start a new instance of the Python interpreter and clone all relevant data to the child. This is a heavy lift!
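
Whether the copy-on-write behaviour applies therefore depends on how multiprocessing starts its workers on a given machine. A minimal sketch for checking this is shown here; `start_method_report` is a hypothetical helper name, while `get_start_method` and `get_all_start_methods` are standard `multiprocessing` functions.

```python
import multiprocessing as mp


def start_method_report():
    """Return the active start method and the ones this platform offers."""
    return {
        "current": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    report = start_method_report()
    print(report)
    if report["current"] != "fork":
        # With spawn or forkserver the workers do not inherit the parent's
        # already-loaded globals through copy-on-write.
        print("workers will not share the parent's globals copy-on-write")
```

On Windows only `spawn` is available, so each worker re-imports the main module and rebuilds any module-level data for itself.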
