Checking fork behaviour in Python multiprocessing on Linux systems

Question

I have to access a set of large, non-picklable Python objects from many processes. Therefore, I would like to ensure that these objects are not copied completely.

According to comments in this and this post, objects are not copied (on Unix systems) unless they are changed. However, referencing an object changes its reference count, so the memory holding that count will in turn be copied.

Is this correct so far? Since my concern is the size of my large objects, I do not have a problem if small parts of these objects are copied.
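The reference-count effect mentioned above can be observed directly in CPython. The following is a minimal sketch, not part of the original post: `refcount_delta` is a hypothetical helper name, and it assumes CPython, where the count is stored in the object's own header, so taking a new reference writes to the object's memory even though its contents are untouched.

```python
import sys


def refcount_delta():
    """Return the reference count of a list before and after aliasing it.

    Hypothetical helper for illustration; assumes CPython reference counting.
    """
    big = list(range(1000))
    before = sys.getrefcount(big)
    alias = big  # new reference: only the count in the object header changes
    after = sys.getrefcount(big)
    del alias
    return before, after


if __name__ == "__main__":
    before, after = refcount_delta()
    print("refcount before:", before, "after:", after)
```

Only the small object header is written here, so under copy-on-write the page holding that header would be the part affected, not the whole list.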

To ensure that I understood everything correctly and that nothing unexpected happens, I implemented a small test program:

from multiprocessing import Pool

def f(arg):
    print(l, id(l), object.__repr__(l))
    l[arg] = -1
    print(l, id(l), object.__repr__(l))

def test(n):
    global l
    l = list(range(n))
    with Pool() as pool: 
        pool.map(f, range(n))
    print(l, id(l), object.__repr__(l))

if __name__ == '__main__':
    test(5) 

In the first line of f, I would expect id(l) to return the same number in all function calls, since the list is not changed before the id check.

On the other hand, in the third line of f, id(l) should return a different number in each function call, since the list is changed in the second line.

However, the program output puzzles me.

[0, 1, 2, 3, 4] 139778408436488 <list object at 0x7f20b261d308>
[-1, 1, 2, 3, 4] 139778408436488 <list object at 0x7f20b261d308>
[0, 1, 2, 3, 4] 139778408436488 <list object at 0x7f20b261d308>
[0, -1, 2, 3, 4] 139778408436488 <list object at 0x7f20b261d308>
[0, 1, 2, 3, 4] 139778408436488 <list object at 0x7f20b261d308>
[0, 1, -1, 3, 4] 139778408436488 <list object at 0x7f20b261d308>
[0, 1, 2, 3, 4] 139778408436488 <list object at 0x7f20b261d308>
[0, 1, 2, -1, 4] 139778408436488 <list object at 0x7f20b261d308>
[0, 1, 2, 3, 4] 139778408436488 <list object at 0x7f20b261d308>
[0, 1, 2, 3, -1] 139778408436488 <list object at 0x7f20b261d308>
[0, 1, 2, 3, 4] 139778408436488

The id is the same in all calls and on all lines of f. This is the case even though the parent's list remains unchanged at the end (as expected), which implies that the list has been copied.

How can I see whether an object has been copied or not?

Recommended answer

Your confusion seems to be caused by a misunderstanding of how processes and fork work. Each process has its own address space, so two processes can use the same addresses without conflict. This also means a process can't access the memory of another process unless the same memory is mapped into both processes.
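To illustrate the last point, here is a small sketch (not from the original answer) that contrasts ordinary process memory with memory deliberately mapped into both processes. The function name `shared_mapping_demo` is hypothetical. It assumes a Unix system, where `mmap.mmap(-1, size)` creates an anonymous mapping that is shared by default and is inherited across `os.fork`.

```python
import mmap
import os


def shared_mapping_demo():
    """Show that only explicitly shared memory is visible across processes.

    Hypothetical helper for illustration; Unix only.
    """
    shared = mmap.mmap(-1, mmap.PAGESIZE)  # anonymous mapping, shared on Unix
    private = bytearray(b"parent")         # ordinary per-process memory
    shared[:6] = b"parent"

    pid = os.fork()
    if pid == 0:
        # Child: write to both buffers, then exit without running cleanup.
        shared[:6] = b"child!"
        private[:] = b"child!"
        os._exit(0)

    os.waitpid(pid, 0)
    result = (bytes(shared[:6]), bytes(private))
    shared.close()
    return result


if __name__ == "__main__":
    print(shared_mapping_demo())
```

The child's write to the shared mapping is visible in the parent, while its write to the ordinary bytearray is not.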

When a process invokes the fork system call, the operating system creates a new child process that's a clone of the parent process. This clone, like any other process, has its own address space distinct from its parent's. However, the contents of the address space are an exact copy of the parent's. This used to be accomplished by copying the memory of the parent process into new memory allocated for the child. This means that once the child and parent resume executing after the fork, any modifications either process makes to its own memory don't affect the other.
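The semantics described above can be checked with a bare `os.fork` call. This is a minimal sketch under stated assumptions rather than anything from the original answer: `fork_and_modify` is a hypothetical name, the code is Unix-only, and the child reports back through a pipe because it cannot write into the parent's memory.

```python
import os


def fork_and_modify(data):
    """Fork, mutate `data` in the child, and return what each process sees.

    Hypothetical helper for illustration; Unix only.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: modify its own copy, then report id and contents via the pipe.
        os.close(read_fd)
        data[0] = -1
        os.write(write_fd, f"{id(data)} {data}".encode())
        os.close(write_fd)
        os._exit(0)

    # Parent: read the child's report, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        report = reader.read()
    os.waitpid(pid, 0)
    child_id, child_contents = report.split(" ", 1)
    return int(child_id), child_contents, id(data), list(data)


if __name__ == "__main__":
    print(fork_and_modify([0, 1, 2]))
```

The child reports the same `id` as the parent, because the address is the same in both address spaces, yet the parent's list is unchanged after the child has modified its own.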

However, copying the entire address space of a process is an expensive operation, and is usually a waste. Most of the time the new process immediately executes a new program, which results in the child's address space being replaced completely. So instead, modern Unix-like operating systems use a "copy-on-write" fork implementation. Instead of copying the memory of the parent process, the parent's memory is mapped into the child so they can share the same memory. However, the old semantics are still maintained. If either the child or the parent modifies the shared memory, then the modified page is copied so that the two processes no longer share that page of memory.

When the multiprocessing module calls your f function, it does so in a child process that was created by using the fork system call. Since this child process is a clone of the parent, it also has a global variable named l, which refers to a list that has the same ID (address) and same contents in both processes. That is, until you modify the list referred to by l in the child process. The ID doesn't (and can't) change, but the child's version of the list is no longer the same as the parent's. The contents of the parent's list are unaffected by the modification made by the child.
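The same behaviour can be reproduced with multiprocessing while requesting the fork start method explicitly, which matters because the default start method differs across platforms and Python versions. The sketch below is an illustration, not the original poster's program: the names `data`, `worker` and `run` are hypothetical, and it uses `Process` objects with a queue instead of a `Pool` so that each child is a fresh clone of the parent.

```python
import multiprocessing as mp

data = []  # hypothetical module-level list, inherited by forked children


def worker(index, queue):
    # Runs in a forked child: changes only the child's version of the list.
    data[index] = -1
    queue.put((index, id(data), list(data)))


def run(n=3):
    global data
    data = list(range(n))
    ctx = mp.get_context("fork")  # request fork explicitly (Unix only)
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(i, queue)) for i in range(n)]
    for proc in procs:
        proc.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    reports = sorted(queue.get() for _ in procs)
    for proc in procs:
        proc.join()
    return id(data), list(data), reports


if __name__ == "__main__":
    parent_id, parent_data, reports = run()
    print("parent:", parent_id, parent_data)
    for index, child_id, child_data in reports:
        print("child", index, child_id, child_data)
```

Every child reports the parent's `id`, each child sees only its own modification, and the parent's list stays as it was.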

Note that the behaviour described in the previous paragraph is the same whether fork uses copy-on-write or not. As far as the multiprocessing module and Python in general are concerned, that's just an implementation detail. The effective result is the same regardless. This means you can't really test in a Python program which fork implementation is used.
