Python Multi-Processing Question?
Question
I have a folder with 500 input files (total size of all files is ~ 500[MB]).
I'd like to write a Python script that does the following:
(1) load all of the input files into memory
(2) initialize an empty Python list that will later be used ... see bullet (4)
(3) start 15 different (independent) processes: each of these uses the same input data [from (1)] -- yet uses a different algorithm to process it, thus generating different results
(4) I'd like all the independent processes [from step (3)] to store their output in the same Python list [the same list that was initialized in step (2)]
Once all 15 processes have completed their run, I will have one Python list that includes the results of all 15 independent processes.
My question is: is it possible to do the above efficiently in Python? If so, can you provide a scheme / sample code that illustrates how to do so?
Note #1: I will be running this on a strong, multi-core server; so the goal here is to use all the processing power while sharing some memory {input data, output list} among all the independent processes.
Note #2: I am working in a Linux environment.
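The steps above can be sketched with the standard-library multiprocessing.Pool. This is a minimal sketch, not a definitive implementation: the three algo_* functions are made-up stand-ins for the 15 real algorithms, and it assumes Linux's fork start method, so the input data loaded in the parent is shared copy-on-write rather than copied into each process:

```python
from multiprocessing import Pool

# Made-up stand-ins for the 15 different algorithms; each one
# receives the same in-memory input data.
def algo_sum(data):
    return sum(data)

def algo_max(data):
    return max(data)

def algo_len(data):
    return len(data)

ALGORITHMS = [algo_sum, algo_max, algo_len]

def run_algorithm(index):
    # INPUT_DATA is inherited by each worker process via fork on
    # Linux, so the large input is not pickled or copied per task.
    return ALGORITHMS[index](INPUT_DATA)

if __name__ == '__main__':
    # step (1): load all input files into memory (toy data here)
    INPUT_DATA = list(range(100))
    # steps (2)-(4): run every algorithm in its own process and
    # gather every output into a single list in the parent
    with Pool(processes=len(ALGORITHMS)) as pool:
        results = pool.map(run_algorithm, range(len(ALGORITHMS)))
    print(results)
```

pool.map returns the outputs in submission order, so results plays the role of the shared list from step (2) without any explicit locking.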
Answer
ok I just whipped this up using zeromq to demonstrate a single subscriber to multiple publishers. You could probably do the same with queues, but you would need to manage them a bit more. zeromq sockets just work, which makes them nice for things like this IMO.
"""
demo of multiple processes doing processing and publishing the results
to a common subscriber
"""
from multiprocessing import Process
class Worker(Process):
def __init__(self, filename, bind):
self._filename = filename
self._bind = bind
super(Worker, self).__init__()
def run(self):
import zmq
import time
ctx = zmq.Context()
result_publisher = ctx.socket(zmq.PUB)
result_publisher.bind(self._bind)
time.sleep(1)
with open(self._filename) as my_input:
for l in my_input.readlines():
result_publisher.send(l)
if __name__ == '__main__':
import sys
import os
import zmq
#assume every argument but the first is a file to be processed
files = sys.argv[1:]
# create a worker for each file to be processed if it exists pass
# in a bind argument instructing the socket to communicate via ipc
workers = [Worker(f, "ipc://%s_%s" % (f, i)) for i, f \
in enumerate((x for x in files if os.path.exists(x)))]
# create subscriber socket
ctx = zmq.Context()
result_subscriber = ctx.socket(zmq.SUB)
result_subscriber.setsockopt(zmq.SUBSCRIBE, "")
# wire up subscriber to whatever the worker is bound to
for w in workers:
print w._bind
result_subscriber.connect(w._bind)
# start workers
for w in workers:
print "starting workers..."
w.start()
result = []
# read from the subscriber and add it to the result list as long
# as at least one worker is alive
while [w for w in workers if w.is_alive()]:
result.append(result_subscriber.recv())
else:
# output the result
print result
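For completeness, the queue-based alternative mentioned above could be sketched roughly like this; it is a sketch, not the answer's method, and the worker function is a toy stand-in for the real algorithms:

```python
from multiprocessing import Process, Queue

def worker(algorithm_id, data, out_queue):
    # toy stand-in for one of the 15 real algorithms
    out_queue.put((algorithm_id, sum(data) * algorithm_id))

if __name__ == '__main__':
    data = list(range(10))  # toy stand-in for the shared input data
    out_queue = Queue()
    procs = [Process(target=worker, args=(i, data, out_queue))
             for i in range(15)]
    for p in procs:
        p.start()
    # drain exactly one result per worker *before* joining:
    # joining first can deadlock if the queue's buffer fills up
    results = [out_queue.get() for _ in procs]
    for p in procs:
        p.join()
    print(sorted(results))
```

This is the extra management the answer alludes to: you must know how many results to expect and drain the queue before joining the workers.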
oh and to get zmq just
$ pip install pyzmq-static