Python Multi-Processing Question?


Question

I have a folder with 500 input files (total size of all files is ~500 MB).

I'd like to write a Python script that does the following:

(1) loads all of the input files into memory

(2) initializes an empty Python list that will later be used ... see bullet (4)

(3) starts 15 different (independent) processes: each of these uses the same input data [from (1)], yet applies a different algorithm to it, thus generating different results

(4) has all of the independent processes [from step (3)] store their output in the same Python list [the same list that was initialized in step (2)]

Once all 15 processes have completed their runs, I will have one Python list that includes the results of all 15 independent processes.

My question is: is it possible to do the above efficiently in Python? If so, can you provide a scheme / sample code that illustrates how to do so?

Note #1: I will be running this on a strong, multi-core server; so the goal here is to use all of the processing power while sharing some memory {input data, output list} among all the independent processes.

Note #2: I'm working in a Linux environment.
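
For reference, here is a minimal sketch of what steps (1)-(4) could look like with only the standard-library multiprocessing module; the input folder name and the per-process algorithms are hypothetical placeholders. On Linux, data loaded before the pool is created is inherited by the forked worker processes without copying, and Pool.map collects every worker's return value into a single list in the parent.

"""
sketch of steps (1)-(4) with multiprocessing.Pool -- the input
folder and the per-process 'algorithms' are placeholders
"""
import os
from multiprocessing import Pool

INPUT_DIR = "input_files"  # hypothetical folder holding the 500 files
INPUT_DATA = {}            # filled before the fork, so workers inherit it


def load_inputs():
    # (1) load all of the input files into memory
    for name in os.listdir(INPUT_DIR):
        with open(os.path.join(INPUT_DIR, name), "rb") as f:
            INPUT_DATA[name] = f.read()


def run_algorithm(algorithm_id):
    # (3) placeholder: each id stands in for one of the 15 algorithms,
    # all reading the same in-memory input data
    total_bytes = sum(len(data) for data in INPUT_DATA.values())
    return algorithm_id, total_bytes


if __name__ == '__main__':
    load_inputs()
    # (2) + (4): Pool.map returns one list holding the results of
    # all 15 independent processes
    with Pool(processes=15) as pool:
        results = pool.map(run_algorithm, range(15))
    print(results)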

Answer

OK, I just whipped this up using zeromq to demonstrate a single subscriber to multiple publishers. You could probably do the same with queues, but you would need to manage them a bit more (a queue-based sketch is included at the end for comparison). zeromq sockets just work, which makes them nice for things like this, IMO.

"""
demo of multiple processes doing processing and publishing the results
to a common subscriber
"""
from multiprocessing import Process


class Worker(Process):
    def __init__(self, filename, bind):
        self._filename = filename
        self._bind = bind
        super(Worker, self).__init__()

    def run(self):
        import zmq
        import time
        ctx = zmq.Context()
        result_publisher = ctx.socket(zmq.PUB)
        result_publisher.bind(self._bind)
        time.sleep(1)
        with open(self._filename) as my_input:
            for l in my_input.readlines():
                result_publisher.send(l)

if __name__ == '__main__':
    import sys
    import os
    import zmq

    #assume every argument but the first is a file to be processed
    files = sys.argv[1:]

    # create a worker for each file to be processed if it exists pass
    # in a bind argument instructing the socket to communicate via ipc
    workers = [Worker(f, "ipc://%s_%s" % (f, i)) for i, f \
               in enumerate((x for x in files if os.path.exists(x)))]

    # create subscriber socket
    ctx = zmq.Context()

    result_subscriber = ctx.socket(zmq.SUB)
    result_subscriber.setsockopt(zmq.SUBSCRIBE, "")

    # wire up subscriber to whatever the worker is bound to 
    for w in workers:
        print w._bind
        result_subscriber.connect(w._bind)

    # start workers
    for w in workers:
        print "starting workers..."
        w.start()

    result = []

    # read from the subscriber and add it to the result list as long
    # as at least one worker is alive
    while [w for w in workers if w.is_alive()]:
        result.append(result_subscriber.recv())
    else:
        # output the result
        print result

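Assuming the script above is saved as demo.py (a hypothetical name), you would run it with the input files as arguments:

$ python demo.py file1.txt file2.txt file3.txt
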
oh and to get zmq just

$ pip install pyzmq-static
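
For comparison, here is a sketch of the queue-based alternative mentioned above: a multiprocessing.Queue replaces the PUB/SUB sockets, and each worker puts a sentinel value on the queue to signal that it is done. This is an assumed design, not code from the original answer.

"""
sketch of the same fan-in pattern with multiprocessing.Queue
instead of zeromq PUB/SUB sockets
"""
from multiprocessing import Process, Queue


def worker(filename, results):
    # publish each line to the shared queue, then a sentinel
    with open(filename) as my_input:
        for line in my_input:
            results.put(line)
    results.put(None)  # sentinel: this worker is done


if __name__ == '__main__':
    import sys
    import os

    files = [f for f in sys.argv[1:] if os.path.exists(f)]
    results = Queue()

    workers = [Process(target=worker, args=(f, results)) for f in files]
    for w in workers:
        w.start()

    result = []
    done = 0
    # collect until every worker has sent its sentinel
    while done < len(workers):
        item = results.get()
        if item is None:
            done += 1
        else:
            result.append(item)

    for w in workers:
        w.join()

    print(result)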
