Python Multiprocessing with Distributed Cluster Using Pathos
Problem description
I am trying to make use of multiprocessing across several different computers, which pathos seems geared towards: "Pathos is a framework for heterogenous computing. It primarily provides the communication mechanisms for configuring and launching parallel computations across heterogenous resources." In looking at the documentation, however, I am at a loss as to how to get a cluster up and running. I am looking to:
1. Set up a remote server or set of remote servers with secure authentication.
2. Securely connect to the remote server(s).
3. Map a task across all CPUs in both the remote servers and my local machine using a straightforward API like pool.map in the standard multiprocessing package (like the pseudocode in this related question).
I do not see an example for (1), and I do not understand the tunnel example provided for (2): it does not actually connect to an existing service on localhost. I would also like to know if and how I can require this communication to come with a password or key of some kind that would prevent someone else from connecting to the server. I understand this uses SSH authentication, but absent a preexisting key, that only ensures that the traffic is not read as it passes over the Internet; it does nothing to prevent someone else from hijacking the server.
I'm the pathos author. Basically, for (1) you can use pathos.pp to connect to another computer through a socket connection. pathos.pp has almost exactly the same API as pathos.multiprocessing, although with pathos.pp you can give the address and port of a remote host to connect to, using the keyword servers when setting up the Pool.
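As a minimal sketch of the servers keyword described above: the helper names `server_address` and `remote_map` are hypothetical, and the sketch assumes that pathos is installed locally and that a pp worker server is already listening on each address you pass in. It is an illustration under those assumptions, not the canonical pathos recipe.

```python
def server_address(host, port):
    """Format the 'host:port' string used in the servers tuple."""
    return "%s:%d" % (host, port)


def square(x):
    """Toy task to map across the workers."""
    return x * x


def remote_map(func, data, servers, nodes=2):
    """Map func over data using pp workers at the given 'host:port' servers.

    nodes is the number of local worker processes (an illustrative default).
    """
    # Imported lazily so the helpers above work even without pathos installed.
    from pathos.pp import ParallelPythonPool

    pool = ParallelPythonPool(nodes=nodes, servers=tuple(servers))
    return pool.map(func, data)
```

With a worker reachable at a (hypothetical) host `remote.example.org` on port 5653, `remote_map(square, range(10), [server_address("remote.example.org", 5653)])` would distribute the ten calls across the local and remote workers.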
However, if you want to make a secure connection with SSH, it's best to establish an SSH tunnel (as in the example you linked to), and then pass localhost and the local port number to the servers keyword in Pool. This will then connect to the remote pp-worker through the SSH tunnel. See:
https://github.com/uqfoundation/pathos/blob/master/examples/test_ppmap2.py and
http://www.cacr.caltech.edu/~mmckerns/pathos.html
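The tunnel approach can be sketched roughly as follows. Note the substitution: the linked example uses pathos's own tunnel helpers, whereas this sketch launches the standard `ssh -N -L` port forward through `subprocess`. The function names and any host or port values you pass in are hypothetical, and a pp worker is assumed to be listening on `remote_port` on the remote machine.

```python
import subprocess


def tunnel_command(remote_host, remote_port, local_port):
    """Build an `ssh -N -L` command forwarding local_port to remote_port.

    'localhost' in the forward spec is resolved on the remote side, so it
    refers to the remote machine itself.
    """
    forward = "%d:localhost:%d" % (local_port, remote_port)
    return ["ssh", "-N", "-L", forward, remote_host]


def open_tunnel(remote_host, remote_port, local_port):
    """Start the tunnel in the background and return the process handle."""
    return subprocess.Popen(tunnel_command(remote_host, remote_port, local_port))


def map_through_tunnel(func, data, local_port):
    """Map func over data on the pp worker reached through the tunnel."""
    # Lazy import: only needed once a tunnel and a worker actually exist.
    from pathos.pp import ParallelPythonPool

    # The pool only ever sees localhost; SSH carries the traffic onward.
    pool = ParallelPythonPool(servers=("localhost:%d" % local_port,))
    return pool.map(func, data)
```

Because the pool connects to localhost, the worker's port never has to be exposed beyond SSH, which leaves access control to SSH authentication. Remember to call `terminate()` on the handle returned by `open_tunnel` when you are done.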
Lastly, if you are using pathos.pp with a remote server, as above, you should already be doing (3). However, for a sufficiently embarrassingly parallel set of jobs, it can be more efficient to nest the parallel maps: first use pathos.pp.ParallelPythonPool to build a parallel map across servers, then, inside the function you are mapping with pathos.pp, run an N-way job using a parallel map from pathos.multiprocessing.ProcessingPool. This will minimize the communication across the remote connection.
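A rough sketch of the nested layout, under the assumptions that pathos is installed on every machine and that each entry in `servers` is a reachable 'host:port' pp worker. The names `chunked`, `process_chunk`, and `nested_map` are hypothetical, and splitting the data into one chunk per server is just one possible policy.

```python
def chunked(items, n_chunks):
    """Split items into at most n_chunks contiguous lists of similar size."""
    items = list(items)
    size = max(1, -(-len(items) // n_chunks))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_chunk(chunk):
    """Outer task: runs once per chunk and fans out over that machine's CPUs."""
    # Imported inside the function because it executes on the remote worker.
    from pathos.multiprocessing import ProcessingPool

    def square(x):  # defined locally so it travels with process_chunk
        return x * x

    inner = ProcessingPool()  # defaults to the number of local CPUs
    return inner.map(square, chunk)


def nested_map(data, servers):
    """Outer map across servers with pathos.pp, inner map per machine."""
    from pathos.pp import ParallelPythonPool

    outer = ParallelPythonPool(servers=tuple(servers))
    chunks = chunked(data, max(1, len(servers)))
    results = outer.map(process_chunk, chunks)
    # Flatten the per-chunk result lists, preserving the input order.
    return [value for part in results for value in part]
```

Only one chunk and one result list cross each remote connection, rather than one message per item, which is where the reduction in communication comes from.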
Also, you don't need to give an SSH password if you have ssh-agent working for you. See: http://mah.everybody.org/docs/ssh. For parallel maps across remote servers, pathos assumes that you have ssh-agent working, so you won't need to type your password every time a connection is made.
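As a small sanity check before launching remote maps, you could verify that an agent is visible to your process. This sketch relies only on the fact that ssh-agent advertises its socket through the `SSH_AUTH_SOCK` environment variable; the function name `agent_socket` is hypothetical, and the check does not confirm that any keys are actually loaded.

```python
import os


def agent_socket(env=None):
    """Return the ssh-agent socket path from SSH_AUTH_SOCK, or None if unset."""
    env = os.environ if env is None else env
    return env.get("SSH_AUTH_SOCK") or None
```

If `agent_socket()` returns None, SSH connections will most likely fall back to prompting for a password or passphrase.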
EDIT: added example code on your question here: Python Multiprocessing with Distributed Cluster