Python multiprocessing with pathos

Question

I am trying to use Python's pathos to distribute computations across separate processes so that a multicore processor can speed them up. My code is organized like this:

class:
    def foo(self, name):
        ...
        setattr(self, name, something)
        ...
    def boo(self):
        for name in list:
            self.foo(name)

Because I had pickling problems with multiprocessing.Pool, I decided to try pathos. As suggested in previous topics, I tried:

import pathos.multiprocessing

but it resulted in an error: No module multiprocessing. I can't find that module in the latest pathos version.

Then I tried modifying the boo method:

def boo(self):
    import pathos
    pathos.pp_map.pp_map(self.foo, list)

Now no error is thrown, but foo does not work: the instance of my class has no new attributes. Please help, because after a day spent on this I have no idea where to go next.

Solution

I'm the pathos author. I'm not sure what you want to do with your code above. However, I can maybe shed some light. Here's some similar code:

>>> from pathos.multiprocessing import ProcessingPool
>>> class Bar:
...   def foo(self, name):
...     return len(str(name))
...   def boo(self, things):
...     for thing in things:
...       self.sum += self.foo(thing)
...     return self.sum
...   sum = 0
... 
>>> b = Bar()
>>> results = ProcessingPool().map(b.boo, [[12,3,456],[8,9,10],['a','b','cde']])
>>> results
[6, 4, 5]
>>> b.sum
0

What happens above is that b.boo, the boo method of the Bar instance b, is passed to new Python processes and evaluated once for each of the nested lists. You can see that the results are correct: len("12")+len("3")+len("456") is 6, and so on.

However, you can also see that b.sum is, mysteriously, still 0. Why is b.sum still zero? What multiprocessing (and thus also pathos.multiprocessing) does is make a COPY of whatever you pass through the map to the other Python process. The copied instance is then called (in parallel) and returns whatever the invoked method returns. Note that you have to RETURN results, or print them, log them, send them to a file, or otherwise get them out. They can't go back to the original instance as you might expect, because it's not the original instance that is sent over to the other processors. The copies of the instance are created, then disposed of. Each of them had its sum attribute increased, but the original b.sum is untouched.
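The copy behaviour described above can be illustrated in a single process. This is only a sketch: copy.deepcopy is used here as a stand-in for the serialize-and-rebuild round trip that happens when an instance is sent to a worker, and the attribute is renamed to total (an illustrative choice, not from the code above) to avoid shadowing the built-in sum.

```python
import copy


class Bar:
    total = 0  # class-level default, playing the role of `sum = 0` above

    def boo(self, things):
        # Accumulate the string length of each item on this instance.
        for thing in things:
            self.total += len(str(thing))
        return self.total


b = Bar()

# Assumption: deepcopy stands in for the copy a worker process receives.
worker_copy = copy.deepcopy(b)
result = worker_copy.boo([12, 3, 456])

print(result)             # value returned by the copy
print(worker_copy.total)  # the copy was modified
print(b.total)            # the original was not
```

The only thing that makes it back to the caller is the return value; the state accumulated on the copy is lost once the copy is discarded.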

There are, however, plans within pathos to make something like the above work as you might expect, where the original object IS updated, but it doesn't work like that yet.
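Until then, one possible workaround for code shaped like the question's is to have foo return its result and let the calling process do the setattr. The sketch below is an assumption about what the asker wants, not something taken from pathos itself: the class name Worker, the map_func parameter, and the placeholder computation len(name) are all hypothetical.

```python
class Worker:
    """Hypothetical stand-in for the class in the question."""

    def foo(self, name):
        # Return the value instead of calling setattr here: inside a
        # worker process, `self` would only be a copy of the instance.
        return name, len(name)  # placeholder computation

    def boo(self, names, map_func=map):
        # map_func defaults to the built-in map; a pool's map method
        # can be passed in instead to evaluate foo in parallel.
        for name, value in map_func(self.foo, names):
            setattr(self, name, value)  # executed in the calling process


w = Worker()
w.boo(["alpha", "be", "gamma"])
print(w.alpha, w.be, w.gamma)
```

With pathos installed, passing map_func=ProcessingPool().map (using the import shown in the session above) should run the foo calls in separate processes while the attributes are still set on the original instance.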

EDIT: If you are installing with pip, note that the latest released version of pathos is several years old and may not install correctly, or may not install all of the submodules. A new pathos release is pending, but until then it's better to get the latest version of the code from github and install from there. The trunk is, for the most part, stable while under development. I think your issue may have been that not all packages were installed, due to an incompatibility between a "new" pip and an "old" pathos in the install. If pathos.multiprocessing is missing, this is the most likely culprit.

Get pathos from github here: https://github.com/uqfoundation/pathos
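To check whether an install actually provides the submodule, something like the following could be run first. It is a minimal sketch using only the standard library; has_module is a hypothetical helper name, not part of pathos.

```python
import importlib.util


def has_module(name):
    """Return True if the dotted module name can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `pathos`) is itself missing.
        return False


print("pathos installed:", has_module("pathos"))
print("pathos.multiprocessing available:", has_module("pathos.multiprocessing"))
```

If the first line reports True and the second False, the install is incomplete in the way described above.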
