python multiprocessing with multiple arguments
Problem description
I'm trying to multiprocess a function that does multiple actions on a large file, but I'm getting the well-known pickling error even though I'm using partial.
The function looks like this:
def process(r, intermediate_file, record_dict, record_id):
    res = 0
    record_str = str(record_dict[record_id]).upper()
    start = record_str[0:100]
    end = record_str[len(record_str)-100:len(record_str)]
    print sample, record_id
    if r == "1":
        if something:
            res = something...
            intermediate_file.write("...")
        if something:
            res = something
            intermediate_file.write("...")
    if r == "2":
        if something:
            res = something...
            intermediate_file.write("...")
        if something:
            res = something
            intermediate_file.write("...")
    return res
The way I'm calling it, in another function, is the following:
def call_func():
    intermediate_file = open("inter.txt", "w")
    record_dict = get_record_dict()  ### get info about each record as a dict based on the record_id
    results_dict = {}
    pool = Pool(10)
    for a in ["a", "b", "c", ...]:
        if not results_dict.has_key(a):
            results_dict[a] = {}
        for b in ["1", "2", "3", ...]:
            if not results_dict[a].has_key(b):
                results_dict[a][b] = {}
                results_dict[a][b]['res'] = []
            infile = open(a+b+".txt", "r")
            ...parse the file and return values in a list called "record_ids"...
            ### now call the function for each record_id in record_ids
            if b == "1":
                func = partial(process, "1", intermediate_file, record_dict)
                res = pool.map(func, record_ids)
                ## append the results for each pair (a,b) for EACH RECORD in the results_dict
                results_dict[a][b]['res'].append(res)
            if b == "2":
                func = partial(process, "2", intermediate_file, record_dict)
                res = pool.map(func, record_ids)
                ## append the results for each pair (a,b) for EACH RECORD in the results_dict
                results_dict[a][b]['res'].append(res)
... do something with results_dict...
The idea is that for each record in record_ids, I want to save the results for each pair (a, b).
I'm not sure what is giving me this error:
File "/code/Python/Python-2.7.9/Lib/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
File "/code/Python/Python-2.7.9/Lib/multiprocessing/pool.py", line 558, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
Recommended answer
func is not defined at the top level of the code, so it can't be pickled. You can use pathos.multiprocessing, which is not a standard module, but it will work.
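To see the difference concretely, here is a minimal sketch (Python 3 syntax; `process_record` and `nested_version` are illustrative names, not the poster's code) comparing a top-level function, which pickle stores by reference, with a function defined inside another function, which cannot be looked up by name and therefore fails to pickle:

```python
import pickle
from functools import partial

def process_record(r, record_dict, record_id):
    # Top-level function: pickled by reference, so Pool workers can find it.
    # It returns its result instead of writing to a shared file handle
    # (open file objects cannot be pickled either).
    return (record_id, str(record_dict[record_id]).upper())

def nested_version():
    def process_local(record_id):   # defined inside a function: NOT picklable
        return record_id
    return process_local

# partial of a top-level function pickles fine:
func = partial(process_record, "1", {"x": "abc"})
pickle.dumps(func)

# a locally defined function does not:
try:
    pickle.dumps(nested_version())
    picklable = True
except (pickle.PicklingError, AttributeError):
    picklable = False
print(picklable)  # False
```

A practical consequence for the code above: keep process at module top level and pass only picklable arguments. The open intermediate_file handle is itself unpicklable, so it is safer to have workers return strings and let the parent do the writing after pool.map returns.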
Or, use something different from Pool.map, maybe a queue of workers? https://docs.python.org/2/library/queue.html
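The worker-queue idea can be sketched as follows (a minimal illustration using multiprocessing.Process and multiprocessing.Queue; the uppercase transformation stands in for the real per-record work):

```python
from multiprocessing import Process, Queue

def worker(tasks, results):
    # Top-level worker: pull record ids until the None sentinel,
    # push one (record_id, result) pair per task.
    for record_id in iter(tasks.get, None):
        results.put((record_id, record_id.upper()))

def run_workers(record_ids, n_workers=2):
    tasks, results = Queue(), Queue()
    procs = [Process(target=worker, args=(tasks, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for rid in record_ids:
        tasks.put(rid)
    for _ in procs:
        tasks.put(None)          # one sentinel per worker
    out = dict(results.get() for _ in record_ids)
    for p in procs:
        p.join()
    return out

if __name__ == "__main__":
    print(run_workers(["a1", "b2", "c3"]))
```

Because only the picklable record ids travel through the queues (the worker function itself is a top-level name), this sidesteps the error in the question.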
At the end of that page there is an example you can use; it's for threading, but it is very similar to multiprocessing, which also has Queues: https://docs.python.org/2/library/multiprocessing.html#pipes-and-queues
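A condensed version of that threading pattern looks like this (Python 3 syntax; the module is named Queue in Python 2, and the uppercase step is a placeholder for real work). Note that with threads nothing has to be pickled at all, which avoids the error entirely, at the cost of the GIL for CPU-bound work:

```python
import queue
import threading

def run_threaded(record_ids, n_workers=2):
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        # Threads can use locally defined functions: no pickling involved.
        while True:
            record_id = tasks.get()
            if record_id is None:
                break
            with lock:
                results.append((record_id, record_id.upper()))
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for rid in record_ids:
        tasks.put(rid)
    tasks.join()                 # wait until every task is processed
    for _ in threads:
        tasks.put(None)          # then shut the workers down
    for t in threads:
        t.join()
    return dict(results)

if __name__ == "__main__":
    print(run_threaded(["a1", "b2"]))
```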