Python concurrent.futures imports libraries multiple times (executes code in top scope multiple times)


Question

For the following script (Python 3.6, Windows, Anaconda), I noticed that the libraries are imported as many times as the number of worker processes invoked, and print('Hello') is also executed that same number of times.

I thought worker processes would only be invoked for the func1 call rather than for the whole program. The actual func1 is a heavy CPU-bound task which will be executed millions of times.

Is this the right choice of framework for such a task?

import datetime
import pandas as pd
import numpy as np
from concurrent.futures import ProcessPoolExecutor

print("Hello")

def func1(x):
    return x


if __name__ == '__main__':
    print(datetime.datetime.now())    
    print('test start')

    with ProcessPoolExecutor() as executor:
        results = executor.map(func1, np.arange(1,1000))
        for r in results:
            print(r)

    print('test end')
    print(datetime.datetime.now())


Answer

concurrent.futures.ProcessPoolExecutor uses the multiprocessing module to do its multiprocessing.
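
For comparison, here is a minimal sketch (illustrative, not from the original answer) of the same kind of job written directly against multiprocessing.Pool, the machinery that ProcessPoolExecutor builds on:

import multiprocessing as mp

def square(x):
    return x * x

if __name__ == '__main__':
    # The start-method rules described below apply here exactly as they
    # do to ProcessPoolExecutor, so the __main__ guard is just as necessary.
    with mp.Pool() as pool:
        print(pool.map(square, range(5)))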

And, as explained in the Programming guidelines, this means you have to protect any top-level code you don't want to run in every process in your __main__ block:


Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).

... one should protect the "entry point" of the program by using if __name__ == '__main__':

Notice that this is only necessary if using the spawn or forkserver start methods. But if you're on Windows, spawn is the default. And, at any rate, it never hurts to do this, and usually makes the code clearer, so it's worth doing anyway.
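
If you want to reproduce the Windows behaviour on another platform (or vice versa), you can select the start method explicitly. A minimal sketch using multiprocessing.set_start_method, which must be called at most once, before the pool is created:

import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

print('top-level code')  # printed once per worker under spawn, once in total under fork

def square(x):
    return x * x

if __name__ == '__main__':
    mp.set_start_method('spawn')  # the Windows default; 'fork' is the Unix default
    with ProcessPoolExecutor(max_workers=2) as ex:
        print(list(ex.map(square, range(5))))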

You probably don't want to protect your imports this way. After all, the cost of calling import pandas as pd once per core may seem nontrivial, but that only happens at startup, and the cost of running a heavy CPU-bound function millions of times will completely swamp it. (If not, you probably didn't want to use multiprocessing in the first place…) And usually, the same goes for your def and class statements (especially if they're not capturing any closure variables or anything). It's only setup code that's incorrect to run multiple times (like that print('hello') in your example) that needs to be protected.
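
Applied to the script from the question, that means leaving the imports and the def at top level and moving only the side-effecting setup code under the guard; a minimal sketch:

import datetime
import pandas as pd  # harmless at top level: it only costs startup time per worker
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def func1(x):  # also fine at top level: it's just a definition
    return x

if __name__ == '__main__':
    print('Hello')  # side-effecting setup code, now runs exactly once
    print(datetime.datetime.now())
    print('test start')

    with ProcessPoolExecutor() as executor:
        for r in executor.map(func1, np.arange(1, 1000)):
            print(r)

    print('test end')
    print(datetime.datetime.now())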

The examples in the concurrent.futures doc (and in PEP 3148) all handle this by using the "main function" idiom:

def main():
    # all of your top-level code goes here

if __name__ == '__main__':
    main()

This has the added benefit of turning your top-level globals into locals, to make sure you don't accidentally share them (which can especially be a problem with multiprocessing, where they actually get shared with fork, but copied with spawn, so the same code may work when testing on one platform, but then fail when deployed on the other).
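
A minimal sketch of that trap (the names are illustrative; run it once per start method to compare):

import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

setting = 'default'  # top-level global

def report(_):
    return setting  # the worker reads the global

if __name__ == '__main__':
    mp.set_start_method('fork')  # unavailable on Windows; use 'spawn' there
    setting = 'configured'  # assigned under the guard, so never re-run on import
    with ProcessPoolExecutor(max_workers=1) as ex:
        print(list(ex.map(report, [0])))
    # fork: the worker is cloned after the assignment -> ['configured']
    # spawn: the worker re-imports this module        -> ['default']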

If you want to know why this happens:

With the fork start method, multiprocessing creates each new child process by cloning the parent Python interpreter and then just starting the pool-servicing function up right where you (or concurrent.futures) created the pool. So, top-level code doesn't get re-run.

With the spawn start method, multiprocessing creates each new child process by starting a clean new Python interpreter, importing your code, and then starting the pool-servicing function. So, top-level code gets re-run as part of the import.
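
You can watch the re-import happen; a minimal sketch (in CPython the re-imported main module is named '__mp_main__' inside spawned workers, which is why the __main__ guard stays False there):

import os
from concurrent.futures import ProcessPoolExecutor

# Under spawn, this line runs once in the parent and once more in every
# worker process as it re-imports this module.
print(f'top level: pid={os.getpid()}, __name__={__name__!r}')

def noop(x):
    return x

if __name__ == '__main__':  # True only in the parent process
    with ProcessPoolExecutor(max_workers=2) as ex:
        list(ex.map(noop, range(4)))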

