为什么在"__main__"中导入模块不允许多进程使用模块? [英] Why does importing module in '__main__' not allow multiprocessig to use module?

查看:128
本文介绍了为什么在"__main__"中导入模块不允许多进程使用模块?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经通过将导入移动到顶部声明来解决了我的问题,但是这让我感到困惑:为什么我不能在multiprocessing的目标函数中使用在'__main__'中导入的模块? /p>

例如:

import os
import multiprocessing as mp

def run(in_file, out_dir, out_q):
    arcpy.RaterToPolygon_conversion(in_file, out_dir, "NO_SIMPIFY", "Value")
    status = str("Done with "+os.path.basename(in_file))
    out_q.put(status, block=False)

if __name__ == '__main__':
    raw_input("Program may hang, press Enter to import ArcPy...")
    import arcpy

    q = mp.Queue()
    _file = path/to/file
    _dir = path/to/dir
    # There are actually lots of files in a loop to build
    # processes but I just do one for context here
    p = mp.Process(target=run, args=(_file, _dir, q))
    p.start()

# I do stuff with Queue below to status user

当您在IDLE中运行此命令时,它根本不会出错...只是继续进行Queue检查(这很好,所以不是问题).问题是,当您在CMD终端(操作系统或Python)中运行此命令时,会产生未定义arcpy的错误!

只是一个有趣的话题.

解决方案

在类似Unix的系统和Windows中,情况有所不同.在unixy系统上,multiprocessing使用fork创建共享父存储空间的写时复制视图的子进程.子级会看到父级的导入,包括父级在if __name__ == "__main__":下导入的所有内容.

在Windows上,没有fork,必须执行一个新进程.但是简单地重新运行父进程是行不通的-它会再次运行整个程序.相反,multiprocessing运行其自己的python程序,该程序导入父主脚本,然后对希望对子进程足够的父对象空间视图进行腌制/解开.

该程序是子进程的__main__,父脚本的__main__无法运行.就像其他模块一样,导入了主脚本.原因很简单:运行父级__main__只会再次运行完整的父级程序,而mp必须避免.

这里是一个测试,以显示发生了什么事情.一个名为testmp.py的主模块,以及一个由第一个模块导入的第二个模块test2.py.

testmp.py

import os
import multiprocessing as mp

print("importing test2")
import test2

def worker():
    print('worker pid: {}, module name: {}, file name: {}'.format(os.getpid(), 
        __name__, __file__))

if __name__ == "__main__":
    print('main pid: {}, module name: {}, file name: {}'.format(os.getpid(), 
        __name__, __file__))
    print("running process")
    proc = mp.Process(target=worker)
    proc.start()
    proc.join()

test2.py

import os

print('test2 pid: {}, module name: {}, file name: {}'.format(os.getpid(),
        __name__, __file__))

在Linux上运行时,一次导入test2,并且worker在主模块中运行.

importing test2
test2 pid: 17840, module name: test2, file name: /media/td/USB20FD/tmp/test2.py
main pid: 17840, module name: __main__, file name: testmp.py
running process
worker pid: 17841, module name: __main__, file name: testmp.py

在Windows下,请注意两次导入test2"被打印-testmp.py运行了两次.但是"main pid"仅被打印一次-它的__main__没有运行.这是因为multiprocessing在导入过程中将模块名称更改为__mp_main__.

E:\tmp>py testmp.py
importing test2
test2 pid: 7536, module name: test2, file name: E:\tmp\test2.py
main pid: 7536, module name: __main__, file name: testmp.py
running process
importing test2
test2 pid: 7544, module name: test2, file name: E:\tmp\test2.py
worker pid: 7544, module name: __mp_main__, file name: E:\tmp\testmp.py

I've already solved my problem by moving the import to the top declarations, but it left me wondering: Why cant I use a module that was imported in '__main__' in functions that are the targets of multiprocessing?

For example:

import os
import multiprocessing as mp

def run(in_file, out_dir, out_q):
    arcpy.RaterToPolygon_conversion(in_file, out_dir, "NO_SIMPIFY", "Value")
    status = str("Done with "+os.path.basename(in_file))
    out_q.put(status, block=False)

if __name__ == '__main__':
    raw_input("Program may hang, press Enter to import ArcPy...")
    import arcpy

    q = mp.Queue()
    _file = path/to/file
    _dir = path/to/dir
    # There are actually lots of files in a loop to build
    # processes but I just do one for context here
    p = mp.Process(target=run, args=(_file, _dir, q))
    p.start()

# I do stuff with Queue below to status user

When you run this in IDLE it doesn't error at all...just keeps doing a Queue check (which is good so not the problem). The problem is that when you run this in the CMD terminal (either OS or Python) it produces the error that arcpy is not defined!

Just a curious topic.

解决方案

The situation is different in unix-like systems and Windows. On the unixy systems, multiprocessing uses fork to create child processes that share a copy-on-write view of the parent memory space. The child sees the imports from the parent, including anything the parent imported under if __name__ == "__main__":.

On windows, there is no fork, a new process has to be executed. But simply rerunning the parent process doesn't work - it would run the whole program again. Instead, multiprocessing runs its own python program that imports the parent main script and then pickles/unpickles a view of the parent object space that is, hopefully, sufficient for the child process.

That program is the __main__ for the child process and the __main__ of the parent script doesn't run. The main script was just imported like any other module. The reason is simple: running the parent __main__ would just run the full parent program again, which mp must avoid.

Here is a test to show what is going on. A main module called testmp.py and a second module test2.py that is imported by the first.

testmp.py

import os
import multiprocessing as mp

print("importing test2")
import test2

def worker():
    print('worker pid: {}, module name: {}, file name: {}'.format(os.getpid(), 
        __name__, __file__))

if __name__ == "__main__":
    print('main pid: {}, module name: {}, file name: {}'.format(os.getpid(), 
        __name__, __file__))
    print("running process")
    proc = mp.Process(target=worker)
    proc.start()
    proc.join()

test2.py

import os

print('test2 pid: {}, module name: {}, file name: {}'.format(os.getpid(),
        __name__, __file__))

When run on Linux, test2 is imported once and the worker runs in the main module.

importing test2
test2 pid: 17840, module name: test2, file name: /media/td/USB20FD/tmp/test2.py
main pid: 17840, module name: __main__, file name: testmp.py
running process
worker pid: 17841, module name: __main__, file name: testmp.py

Under windows, notice that "importing test2" is printed twice - testmp.py was run two times. But "main pid" was only printed once - its __main__ wasn't run. That's because multiprocessing changed the module name to __mp_main__ during import.

E:\tmp>py testmp.py
importing test2
test2 pid: 7536, module name: test2, file name: E:\tmp\test2.py
main pid: 7536, module name: __main__, file name: testmp.py
running process
importing test2
test2 pid: 7544, module name: test2, file name: E:\tmp\test2.py
worker pid: 7544, module name: __mp_main__, file name: E:\tmp\testmp.py

这篇关于为什么在"__main__"中导入模块不允许多进程使用模块?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆