Is using the -L flag and an addprocs script the more powerful version of -p and --machinefile?

Question

So I have a moderately complex set of requirements for my worker processes. I want to use the master-slave topology and a non-default working directory. I also want to mix local and remote workers.

As far as I can tell from reading the --machine-file section of the documentation, it will not let me do that.

So I am looking at the -L <file> argument:

> julia -h
...
-L, --load <file>    Load <file> immediately on all processors
...

So if I do not use the -p or --machine-file flags, then there is initially only one processor, so "all processors" just means that single processor.

So I tried it, with the following in start_workers.jl:

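# Remote workers over SSH; :auto starts one worker per core on each host.
addprocs([("cluster_c4_1", :auto),
          ("cluster_c4_2", :auto)],
         dir="/mnt/",
         topology=:master_slave)

# Local workers, one per core on this machine.
addprocs(dir="/mnt/",
         topology=:master_slave)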

test.jl

println("*************")
println(workers())
println("-------------")

Running it:

>julia -L start_workers.jl pl.jl 
*************
[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]
-------------
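
One way to confirm that the workers ended up where intended is to ask each one for its host name and working directory. The following is a minimal sketch, assuming Julia 1.x, where the parallel functions live in the Distributed standard library and the topology is spelled :master_worker (the :master_slave spelling above is from an older release). It starts only two local workers and uses a temporary directory as a stand-in for /mnt/; the helper name describe_workers is hypothetical.

```julia
using Distributed

# Hypothetical helper: ask every worker for its host name and working directory.
function describe_workers()
    return Dict(w => remotecall_fetch(() -> (gethostname(), pwd()), w)
                for w in workers())
end

# Stand-in for dir="/mnt/": any directory that exists on the worker's machine.
workdir = mktempdir()

# Two local workers that connect only to the master process.
pids = addprocs(2; dir=workdir, topology=:master_worker)

info = describe_workers()
for w in sort(collect(keys(info)))
    host, dir = info[w]
    println("worker ", w, " on ", host, " in ", dir)
end
```

The same helper should work unchanged for remote workers added over SSH, since it only relies on remotecall_fetch from the master process.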

So it all looks good: I got my 20 workers. Have I done anything unreasonable? Is this the best way?

Recommended answer

That's exactly how I'm deploying it on an HPC cluster under the Torque scheduler. In fact, I'm in the process of rewriting the cluster manager to support more options when adding processes, through the Torque scheduling system in particular, so I've spent quite a bit of time looking into this.

You might also want to be aware that the ClusterManagers package (Pkg.add("ClusterManagers")) provides various cluster managers that extend addprocs to a variety of environments, such as when you need to request resources from a scheduler. It looks like passwordless SSH is possible for you, so the default cluster manager is sufficient in your case.

I don't believe there is any way of defining the extra topology and directory parameters on the command line, so your approach is correct.
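
Since the extra options are only available as addprocs keyword arguments, a script loaded with -L can also take over the job of --machine-file. The sketch below assumes Julia 1.x and handles only the `[count*]host` part of the machine-file line format, with a count of 1 when none is given. The helper parse_machine_line and the example host list are hypothetical, and the actual addprocs call is left commented out because it needs passwordless SSH to the hosts.

```julia
# Hypothetical helper: turn one machine-file style line into an addprocs spec.
# "4*cluster_c4_1" -> ("cluster_c4_1", 4); a bare host name gets one worker.
function parse_machine_line(line::AbstractString)
    parts = split(strip(line), '*'; limit=2)
    if length(parts) == 2
        return (String(strip(parts[2])), parse(Int, strip(parts[1])))
    end
    return (String(parts[1]), 1)
end

# Example host list (host names taken from the question).
lines = ["4*cluster_c4_1", "cluster_c4_2"]
machines = [parse_machine_line(l) for l in lines]
println(machines)

# With SSH access to those hosts, the script could then run:
# using Distributed
# addprocs(machines; dir="/mnt/", topology=:master_worker)
```

Keeping the parsing separate from the addprocs call makes it possible to check the host list before any worker is launched.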
