How to set up cluster slave nodes (on Windows)


Problem description

I need to run thousands* of models on 15 machines (4 cores each), all running Windows. I started learning the parallel, snow and snowfall packages and read a bunch of intros, but they mainly focus on setting up the master. There is very little information on how to set up the worker (slave) nodes on Windows, and what exists is often contradictory: some say a SOCK cluster is practically the easiest way to go, while others claim that SOCK cluster setup is complicated on Windows (it needs an sshd setup) and that the best way to go is MPI.

So, what is the easiest way to set up slave nodes on Windows: MPI, PVM, SOCK or NWS? My possibly naive wishes, listed by priority, are:


  1. To use all 4 cores on each slave node (required).
  2. Ideally, I need only R with some packages and a slave R script or R function that listens on some port and waits for tasks from the master.
  3. Ideally, nodes can be added to and removed from the cluster dynamically.
  4. Ideally, the slaves connect to the master, so I don't have to list all the slaves' IPs in the master's configuration.

Only 1 is 100% required; 2-4 would be nice to have. Is this too much to ask?

I am sorry, but I have not been able to figure this out from the available docs and tutorials. I would be grateful if you could point me to the right source.



*Note that each of the thousands of models takes at least 7 minutes to run, so there will not be much communication.

Recommended answer

It's a shame how complex all these APIs (parallel/snow/snowfall) are to work with: lots of docs, but not the ones you need. I have found an API that is very simple and maps directly onto the ideas I sketched: Redis and the doRedis R package (as recommended here). It finally comes with a very simple tutorial. I modified it a bit and got this:

The workers need only R, the doRedis package and this script:

require(doRedis)    
redisWorker('jobs', '10.0.0.7') # IP of the server
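
One redisWorker call occupies a single R process, so on its own the script above uses one core per machine. A minimal sketch for using all 4 cores, assuming the installed doRedis version provides startLocalWorkers (check ?startLocalWorkers) and reusing the queue name and server IP from above:

```r
library(doRedis)

# Assumption: 4 cores per machine, as described in the question.
n_cores <- 4

# Launch one background worker process per core; each one listens on the
# 'jobs' queue of the Redis server at 10.0.0.7 (the master's IP from above).
startLocalWorkers(n = n_cores, queue = 'jobs', host = '10.0.0.7')
```

If startLocalWorkers is not available or misbehaves on Windows, starting the two-line worker script four times (for example with Rscript) should have the same effect.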

The master needs a Redis server running (I installed the experimental Windows binaries) and this R code:

require(doRedis)
registerDoRedis('jobs')
foreach(j=1:10,.combine=sum,.multicombine=TRUE) %dopar%
    ... # whatever you need to run
removeQueue('jobs')
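
For a concrete version of the loop above, here is a minimal sketch with a placeholder body. The function name run_models and the j^2 computation are made up for illustration and stand in for the real model code. It runs on whatever foreach backend is currently registered, so it can be tried locally before pointing it at Redis.

```r
library(foreach)

# Runs n_models tasks on the currently registered foreach backend and
# sums the per-task results (.combine = sum, as in the snippet above).
run_models <- function(n_models) {
  foreach(j = seq_len(n_models), .combine = sum, .multicombine = TRUE) %dopar% {
    # Placeholder for one model fit; replace with the real model code.
    j^2
  }
}
```

On the master, call registerDoRedis('jobs') before run_models(10) and removeQueue('jobs') afterwards. For a local dry run without Redis, registerDoSEQ() from foreach is enough.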

Adding and removing workers is fully dynamic, there is no need to specify worker IPs on the master, load balancing is automatic, and it is simple enough not to need tons of docs. This solution fulfills all the requirements and more, as stated in ?registerDoRedis:


The doRedis parallel back end tolerates faults among the worker processes and automatically resubmits failed tasks.

I don't know how complex this would be with parallel/snow/snowfall over SOCK/MPI/PVM/NWS, or whether it would be possible at all, but I guess it would be very complex.

The only disadvantages of using Redis that I found:

  • It is a database server. I wonder whether a similar API exists somewhere that does not require installing a database server, which I otherwise don't need at all. I guess it must exist!
  • There is a bug in the current doRedis package ("object '.doRedisGlobals' not found") with no solution yet, and I am not able to install the old, working doRedis 1.0.5 package into R 3.0.1.
