How do I avoid this pickling error, and what is the best way to parallelize this code in Python?

Question

I have the following code.

def main():
  (minI, maxI, iStep, minJ, maxJ, jStep, a, b, numProcessors) = sys.argv
  for i in range(minI, maxI, iStep):
    for j in range(minJ, maxJ, jStep): 
      p = multiprocessing.Process(target=functionA, args=(minI, minJ))
      p.start()
      def functionB((a, b)):
        subprocess.call('program1 %s %s %s %s %s %s' %(c, a, b, 'file1', 
          'file2', 'file3'), shell=True)
        for d in ['a', 'b', 'c']:
          subprocess.call('program2 %s %s %s %s %s' %(d, 'file4', 'file5', 
            'file6', 'file7'), shell=True)
      abProduct = list(itertools.product(range(0, 10), range(0, 10)))
      pool = multiprocessing.Pool(processes=numProcessors)
      pool.map(functionB, abProduct) 

It produces the following error.

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.6/threading.py", line 484, in run 
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 255, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

The contents of functionA are unimportant, and do not produce an error. The error seems to occur when I try to map functionB. How do I remove this error, and what is the best way to parallelize this code in Python 2.6?

Recommended answer

You are most likely seeing this because of where, and in what order, you define your pool, objects, and functions. multiprocessing is not quite the same as using threads: each worker process starts with its own copy of the environment, and the function you hand to pool.map is pickled by reference, so the worker has to find it by name at module level. A function defined inside another function (functionB here, nested inside main and its loops) cannot be looked up that way, which is what raises the PicklingError. Similarly, with fork-based pools the workers only see what existed when the pool was created, so anything defined afterwards is not available to them.
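
The pickling rule can be checked without a pool at all. The following is a minimal sketch, written for Python 3 rather than the Python 2.6 of the question; the names top_level, make_nested, and can_pickle are hypothetical helpers introduced only for illustration.

```python
import pickle


def top_level(x):
    """Defined at module scope, so pickle can refer to it by module and name."""
    return x


def make_nested():
    def nested(x):
        # Only reachable through make_nested's local scope, not by module lookup.
        return x
    return nested


def can_pickle(obj):
    """Return True if pickle.dumps accepts obj, False if it raises."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type for local functions varies by Python version.
        return False


if __name__ == "__main__":
    print(can_pickle(top_level))
    print(can_pickle(make_nested()))
```

Run as a script, the module-level function should pickle and the nested one should not, which is the same distinction pool.map runs into when it sends the callable to its workers.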

First, try creating one pool before your big loop:

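# sys.argv[0] is the script name, and every argument arrives as a string
(minI, maxI, iStep, minJ, maxJ, jStep, a, b, numProcessors) = sys.argv[1:]
minI, maxI, iStep = int(minI), int(maxI), int(iStep)
minJ, maxJ, jStep = int(minJ), int(maxJ), int(jStep)
pool = multiprocessing.Pool(processes=int(numProcessors))
for i in range(minI, maxI, iStep):
    ...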

Then, move your target callable outside the dynamic loop:

def functionB(args):
    a, b = args  # pool.map passes each (a, b) tuple as a single argument
    ...

def main():
    ...
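
Putting both steps together, the restructured program might look roughly like the sketch below. It is written for Python 3, and several things in it are assumptions rather than part of the original code: the value of C_PLACEHOLDER (the question never defines c), the run_grid helper and its size parameter, and the choice to return the argument lists instead of executing them, since program1 and program2 are only stand-ins.

```python
import itertools
import multiprocessing

# Assumption: `c` is never defined in the question, so this value is a placeholder.
C_PLACEHOLDER = "c"


def functionB(args):
    """Top-level worker: pool.map hands over each (a, b) pair as one tuple."""
    a, b = args
    # Build argument lists instead of shell strings. A real worker would pass
    # each list to subprocess.call(cmd); here they are only returned because
    # program1 and program2 are stand-ins from the question.
    commands = [["program1", C_PLACEHOLDER, str(a), str(b),
                 "file1", "file2", "file3"]]
    for d in ["a", "b", "c"]:
        commands.append(["program2", d, "file4", "file5", "file6", "file7"])
    return commands


def run_grid(num_processors, size=10):
    """Create one pool and map the top-level worker over the (a, b) grid."""
    ab_product = list(itertools.product(range(size), range(size)))
    pool = multiprocessing.Pool(processes=num_processors)
    try:
        return pool.map(functionB, ab_product)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    results = run_grid(num_processors=2, size=3)
    print(len(results))
```

The pool is created once and reused for the whole grid, and the `if __name__ == "__main__":` guard keeps worker processes from re-running the entry point on platforms that start workers by importing the main module.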

Consider this example:

Broken

import multiprocessing

def broken():
    vals = [1,2,3]

    def test(x):
        return x

    pool = multiprocessing.Pool()
    output = pool.map(test, vals)
    print output

broken()
# PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

Working

import multiprocessing

def test(x):
    return x

def working():
    vals = [1,2,3]

    pool = multiprocessing.Pool()
    output = pool.map(test, vals)
    print output

working()
# [1, 2, 3]
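
If the nested function only existed because it needed a value from the enclosing scope, one possible way to keep the worker at module level is to bind that value with functools.partial. This is a sketch for Python 3 (partial objects were not picklable in Python 2.6); scale, factor, and working_with_extra_arg are hypothetical names used only for illustration.

```python
import functools
import multiprocessing


def scale(factor, x):
    """Top-level worker; `factor` stands in for a value computed by the caller."""
    return factor * x


def working_with_extra_arg(factor, vals):
    # functools.partial binds the extra argument up front. Because `scale`
    # lives at module level, the partial object can be pickled and sent
    # to the worker processes.
    worker = functools.partial(scale, factor)
    pool = multiprocessing.Pool(processes=2)
    try:
        return pool.map(worker, vals)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(working_with_extra_arg(10, [1, 2, 3]))
```

Since pool.map preserves input order, running this as a script should print [10, 20, 30].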
