Why do processes spawned by the multiprocessing module not duplicate memory?


Problem description


My impression of Python multiprocessing was that when you create a new process with multiprocessing.Process(), it makes a complete copy of your current program in memory and continues working from there. With that in mind, I'm confused by the behaviour of the following script.


WARNING: This script will allocate a large amount of memory! Run it with caution!

import multiprocessing
import numpy as np
from time import sleep

# Declare a dictionary globally
bigDict = {}

def sharedMemory():
    # Using numpy, store roughly 1 GB of random data
    # (1000 arrays x 125000 float64 values x 8 bytes each)
    for i in range(1000):
        bigDict[i] = np.random.random(125000)
    bigDict[0] = "Known information"

    # In System Monitor, 1 GB of memory is being used
    sleep(5)

    # Start 4 processes - each should get a copy of the 1 GB dict
    for _ in range(4):
        p = multiprocessing.Process(target=workerProcess)
        p.start()

    print("Done")

def workerProcess():
    # Sleep - only 1 GB of memory is being used, not the expected 4 GB
    sleep(5)

    # Each process has access to the dictionary, even though the memory is shared
    print(multiprocessing.current_process().pid, bigDict[0])

if __name__ == "__main__":
    sharedMemory()


The above program illustrates my confusion: it seems that the dict is automatically shared between the processes. I thought I had to use a multiprocessing Manager to get that behaviour. Could someone explain what is going on?

Answer


On Linux, forking a process doesn't immediately double the amount of memory in use. Instead, the new process's page table is set up to point to the same physical memory as the old process's, and a page is actually copied only when one of the processes attempts to write to it (copy-on-write, COW). The result is that the two processes appear to have separate memory, but additional physical memory is allocated only once one of them actually touches a page.
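The copy-on-write semantics can be observed directly with a small script. This is a minimal sketch, not part of the original question: it assumes the "fork" start method (the Linux default), which it requests explicitly via multiprocessing.get_context, and the names worker, data, and queue are illustrative. The child sees the parent's data without any copying, and a write in the child modifies only the child's private COW copy, leaving the parent's dict untouched.

```python
import multiprocessing

# Module-level state inherited by forked children. On Linux the
# child's page table initially points at the parent's physical
# pages, so nothing is copied at fork time.
data = {"key": "set in parent"}

def worker(queue):
    # Reading inherited data costs no extra physical memory.
    queue.put(data["key"])
    # Writing triggers copy-on-write for the touched pages only;
    # the parent's dict is unaffected.
    data["key"] = "changed in child"
    queue.put(data["key"])

if __name__ == "__main__":
    # Request fork explicitly so the example matches the Linux
    # behaviour described above (the spawn method would re-import
    # the module instead of inheriting the parent's memory).
    ctx = multiprocessing.get_context("fork")
    queue = ctx.Queue()
    p = ctx.Process(target=worker, args=(queue,))
    p.start()
    print(queue.get())   # "set in parent" - child sees parent's data
    print(queue.get())   # "changed in child" - child's private copy
    p.join()
    print(data["key"])   # "set in parent" - parent is unchanged
```

Note that on macOS and Windows the default start method is spawn, where the child re-imports the module instead of inheriting memory, so none of this page sharing applies there.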
