Python Multiprocessing with Distributed Cluster Using Pathos


Question

I am trying to use multiprocessing across several different computers, which pathos seems geared towards: "Pathos is a framework for heterogenous computing. It primarily provides the communication mechanisms for configuring and launching parallel computations across heterogenous resources." Looking at the documentation, however, I am at a loss as to how to get a cluster up and running. I am looking to:

  1. Set up a remote server or set of remote servers with secure authentication.
  2. Securely connect to the remote server(s).
  3. Map a task across all CPUs in both the remote servers and my local machine using a straightforward API like pool.map in the standard multiprocessing package (like the pseudocode in this related question).

I do not see an example for (1), and I do not understand the tunnel example provided for (2): it does not actually connect to an existing service on the localhost. I would also like to know whether, and how, I can require this communication to use a password or key of some kind, so that nobody else can connect to the server. I understand that this uses SSH authentication, but without a preexisting key that only ensures the traffic cannot be read as it passes over the Internet; it does nothing to prevent someone else from hijacking the server.

Solution

I'm the pathos author. For (1), you can use pathos.pp to connect to another computer through a socket connection. pathos.pp has almost exactly the same API as pathos.multiprocessing, except that with pathos.pp you can give the address and port of each remote host to connect to, using the servers keyword when setting up the Pool.
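A minimal sketch of the servers keyword described above. The host names and port number are placeholders, the helper names are hypothetical, and the sketch assumes a pp worker is already listening on each remote host. The pool is only created when run_remote_map is called, so nothing connects at import time.

```python
def server_addresses(hosts, port):
    """Format hosts as the 'host:port' strings passed to the servers keyword."""
    return tuple("{}:{}".format(host, port) for host in hosts)


def squared(x):
    return x ** 2


def run_remote_map(servers, nodes=4):
    """Map squared over range(10) using local workers plus the given servers."""
    # Imported here so the helpers above work without pathos installed.
    from pathos.pp import ParallelPythonPool

    pool = ParallelPythonPool(nodes, servers=servers)
    return pool.map(squared, range(10))


# Placeholder hosts and port; substitute the machines running your pp workers.
SERVERS = server_addresses(["node1.example.com", "node2.example.com"], 5653)
```

Calling run_remote_map(SERVERS) would then distribute the ten inputs over the local workers and the listed hosts.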

However, if you want a secure connection over SSH, it's best to establish an SSH tunnel (as in the example you linked to), and then pass localhost and the local port number to the servers keyword in Pool. The pool will then connect to the remote pp-worker through the SSH tunnel. See: https://github.com/uqfoundation/pathos/blob/master/examples/test_ppmap2.py and http://www.cacr.caltech.edu/~mmckerns/pathos.html
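The tunnel step can be sketched with the standard ssh client alone. This is not the code from the linked pathos example; the host, user, and port numbers are placeholders, and the helper names are hypothetical. It only builds the command line for a local port forward and the matching address to hand to the servers keyword.

```python
import shlex


def tunnel_command(remote_host, remote_port, local_port, user=None):
    """Build the argv for an SSH tunnel from local_port to remote_port."""
    target = "{}@{}".format(user, remote_host) if user else remote_host
    forward = "{}:localhost:{}".format(local_port, remote_port)
    # -N: run no remote command; -L: forward the local port to the remote side.
    return ["ssh", "-N", "-L", forward, target]


def tunneled_server(local_port):
    """The 'host:port' string to pass to servers once the tunnel is up."""
    return "localhost:{}".format(local_port)


# Placeholder values: a pp worker on remote port 5653, reached via local port 1234.
cmd = tunnel_command("remote.example.com", 5653, 1234, user="me")
print(" ".join(shlex.quote(part) for part in cmd))
```

Once the tunnel process is running (for example, started with subprocess.Popen(cmd)), the pool would be given servers=(tunneled_server(1234),) rather than the remote address.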

Lastly, if you are using pathos.pp with a remote server, as above, you should already be doing (3). However, for a sufficiently embarrassingly parallel set of jobs, it can be more efficient to nest the parallel maps: first use pathos.pp.ParallelPythonPool to build a parallel map across servers, then, inside the function you are mapping with pathos.pp, run an N-way job using a parallel map from pathos.multiprocessing.ProcessingPool. This minimizes the communication across the remote connection.
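A rough sketch of the nested arrangement, under several assumptions: the chunking helper and all function names are hypothetical, one chunk per server plus one for the local machine is an illustrative choice, and the inner work function is defined inside the mapped function so that everything the remote worker needs travels with it. The pools are only created when nested_map is called.

```python
def chunked(items, n_chunks):
    """Split items into n_chunks contiguous, nearly equal pieces."""
    items = list(items)
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Runs once per chunk: fan the chunk out over that machine's CPUs."""
    # Imported inside the function so the import also happens on the worker.
    from pathos.multiprocessing import ProcessingPool

    def work(x):
        return x ** 2

    return ProcessingPool().map(work, chunk)


def nested_map(items, servers):
    """Outer map across servers, inner map across each machine's CPUs."""
    from pathos.pp import ParallelPythonPool

    outer = ParallelPythonPool(servers=servers)
    parts = outer.map(process_chunk, chunked(items, len(servers) + 1))
    # Flatten the per-chunk results back into one list, preserving order.
    return [y for part in parts for y in part]
```

Only the chunks and their results cross the remote connection; the per-item dispatch happens locally on each machine.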

Also, you don't need to give an SSH password if you have ssh-agent working for you. See: http://mah.everybody.org/docs/ssh. For parallel maps across remote servers, pathos assumes that ssh-agent is working and that you won't need to type your password every time a connection is made.

EDIT: added example code on your question here: Python Multiprocessing with Distributed Cluster
