python multiprocessing with multiple arguments


Question

I'm trying to multiprocess a function that does multiple actions on a large file, but I'm getting the well-known pickling error even though I'm using partial.

The function looks like this:

def process(r, intermediate_file, record_dict, record_id):

    res = 0

    record_str = str(record_dict[record_id]).upper()
    start = record_str[0:100]
    end = record_str[len(record_str)-100:]

    print record_id
    if r == "1":

        if something:
            res = something...
            intermediate_file.write("...")

        if something:
            res = something
            intermediate_file.write("...")



    if r == "2":
        if something:
            res = something...
            intermediate_file.write("...")

        if something:
            res = something
            intermediate_file.write("...")

    return res

I'm calling it like this from another function:

def call_func():
    intermediate_file = open("inter.txt","w")
    record_dict = get_record_dict()                 ### get infos about each record as a dict based on the record_id
    results_dict = {}  
    pool = Pool(10)
    for a in ["a","b","c",...]:

        if not results_dict.has_key(a):
            results_dict[a] = {}

        for b in ["1","2","3",...]:

            if not results_dict[a].has_key(b):
                results_dict[a][b] = {}


            results_dict[a][b]['res'] = []

            infile = open(a+b+".txt","r")
            ...parse the file and return values in a list called "record_ids"...

            ### now call the function based on for each record_id in record_ids
            if b=="1":
                func = partial(process,"1",intermediate_file,record_dict)
                res=pool.map(func, record_ids)
                ## append the results for each pair (a,b) for EACH RECORD in the results_dict 
                results_dict[a][b]['res'].append(res)

            if b=="2":
                func = partial(process,"2",intermediate_file,record_dict)
                res = pool.map(func, record_ids)
                ## append the results for each pair (a,b) for EACH RECORD in the results_dict
                results_dict[a][b]['res'].append(res) 

    ... do something with results_dict...

The idea is that for each record in record_ids, I want to save the result for each pair (a, b).

I'm not sure what is giving me this error:

  File "/code/Python/Python-2.7.9/Lib/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/code/Python/Python-2.7.9/Lib/multiprocessing/pool.py", line 558, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
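For context, two things in code like this commonly resist pickling: the open file handle passed through partial, and any function object that is not importable at module top level. A minimal demonstration (Python 3 syntax shown here; Python 2's cPickle raises the analogous PicklingError above):

```python
import pickle
from functools import partial

def outer():
    # A function defined inside another function is not importable
    # by name, so pickle cannot serialize it.
    def inner(x):
        return x
    return partial(inner, 1)

# Pickling a partial over the nested function fails.
nested_ok = True
try:
    pickle.dumps(outer())
except Exception:
    nested_ok = False

# Pickling an open file handle fails too.
file_ok = True
try:
    with open("demo.txt", "w") as fh:
        pickle.dumps(fh)
except TypeError:
    file_ok = False

print(nested_ok, file_ok)  # both False
```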

Answer

func is not defined at the top level of the code, so it can't be pickled. You can use pathos.multiprocessing, which is not a standard module, but it will work.

Or, use something different from Pool.map, maybe a queue of workers? https://docs.python.org/2/library/queue.html

At the end of that page there is an example you can use; it's for threading, but it is very similar to multiprocessing, which also has queues: https://docs.python.org/2/library/multiprocessing.html#pipes-and-queues
