Threading and Multiprocessing os modules in Python

Question

I'm trying to create multiple files which will be analyzed with a stand-alone program, as part of a high-throughput analysis written in Python.

for foo in X:
    write_input_file(foo)                   # write this item's data out to foo_file
    os.system("run_program " + foo_file)    # run the external program on that file

For 15,000 distinct individual files this will run faster if I can run them on multiple cores, but I don't want to swamp my server. How do I set up multiple threads to run through os while putting a maximum on the number of threads open at once? I'm not worried about the speed of spawning processes, since the runtime is defined by an external program that is the standard in my field.

I've looked at the documentation for threading and multiprocessing and gotten overwhelmed.

Solution

An easy way to limit the total number of processes spawned is to use a multiprocessing pool.

A simple example demonstrating a multiprocessing pool is:

test.py

from multiprocessing.pool import Pool
# @NOTE: The two imports below are for demo purposes and won't be necessary in
# your final program
import random
import time

def writeOut(index):
    """ A function which prints a start message, delays for a random interval
        and then prints a finish message
    """
    delay = random.randint(1, 5)
    print("Starting process #{0}".format(index))
    time.sleep(delay)
    print("Finished process #{0} which delayed for {1}s.".format(index, delay))

if __name__ == "__main__":
    # Create a process pool with a maximum of 10 worker processes
    pool = Pool(processes=10)
    # Map our function over a data set - the numbers 0 through 19
    pool.map(writeOut, range(20))

Which should give you an output similar to:

[mike@tester ~]$ python test.py 
Starting process #0
Starting process #2
Starting process #3
Starting process #1
Starting process #4
Starting process #5
Starting process #6
Starting process #7
Starting process #8
Starting process #9
Finished process #2 which delayed for 1s.
Starting process #10
Finished process #7 which delayed for 1s.
Finished process #6 which delayed for 1s.
Starting process #11
Starting process #12
Finished process #9 which delayed for 2s.
Finished process #12 which delayed for 1s.
Starting process #13
Starting process #14
Finished process #1 which delayed for 3s.
Finished process #5 which delayed for 3s.
Starting process #15
Starting process #16
Finished process #8 which delayed for 3s.
Starting process #17
Finished process #4 which delayed for 4s.
Starting process #18
Finished process #10 which delayed for 3s.
Finished process #13 which delayed for 2s.
Starting process #19
Finished process #0 which delayed for 5s.
Finished process #3 which delayed for 5s.
Finished process #11 which delayed for 4s.
Finished process #15 which delayed for 2s.
Finished process #16 which delayed for 2s.
Finished process #18 which delayed for 2s.
Finished process #14 which delayed for 4s.
Finished process #17 which delayed for 5s.
Finished process #19 which delayed for 5s.

As you can see, the first ten processes kick off, and each subsequent process starts only once another process pool worker finishes (becomes available). Using multiple processes, as opposed to multiple threads, bypasses the global interpreter lock (GIL).
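As an aside, the same ten-worker cap can also be expressed through the higher-level concurrent.futures interface, which has been in the standard library since Python 3.2. A minimal sketch of the equivalent, assuming Python 3.2+:

from concurrent.futures import ProcessPoolExecutor
import random
import time

def writeOut(index):
    # Same demo worker as above: random delay between start/finish messages
    delay = random.randint(1, 5)
    print("Starting process #{0}".format(index))
    time.sleep(delay)
    print("Finished process #{0} which delayed for {1}s.".format(index, delay))

if __name__ == "__main__":
    # max_workers=10 plays the same role as Pool(processes=10)
    with ProcessPoolExecutor(max_workers=10) as executor:
        # Consuming the result iterator also surfaces any worker exceptions
        list(executor.map(writeOut, range(20)))

Either interface gives you the same guarantee: never more than ten worker processes alive at once.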

To get this example code to work with your task, you'll need to write a function that outputs one file (and launches your external program on it), and pass that function together with an iterable of your file data to pool.map() in place of writeOut and range(20).
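For concreteness, here is a minimal sketch of that adaptation. The (name, data) item format, the file naming scheme, and the run_program command are placeholders for your real data set and external tool, not part of the original example:

from multiprocessing.pool import Pool
import subprocess

def analyze(item):
    # Hypothetical worker: write one input file, then run the external
    # program on it. Assumes each item is a (name, data) pair.
    name, data = item
    foo_file = "{0}.txt".format(name)
    with open(foo_file, "w") as f:
        f.write(data)
    # subprocess.call() blocks until the external run finishes, so each of
    # the 10 workers handles exactly one external run at a time.
    subprocess.call(["run_program", foo_file])

if __name__ == "__main__":
    # Placeholder data standing in for your 15,000 items
    X = [("sample_{0}".format(i), "data for sample {0}\n".format(i))
         for i in range(15000)]
    pool = Pool(processes=10)  # at most 10 external runs in flight at once
    pool.map(analyze, X)

subprocess.call() is used here rather than os.system() because it avoids shell quoting problems with file names, but either works for this pattern.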
