Dask with HTCondor scheduler

Problem description

I have an image analysis pipeline with parallelised steps. The pipeline is written in Python and the parallelisation is handled by dask.distributed. The minimum processing setup has 1 scheduler + 3 workers with 15 processes each. In the first, short step of the analysis I use 1 process per worker but all the RAM of the node; in all the other analysis steps, all nodes and processes are used.

The admin will install HTCondor as a scheduler for the cluster.

In order to run my code on the new setup, I was planning to use the approach shown in the Dask manual for SGE, because the cluster has a shared network file system.

# Job1
# Start a dask-scheduler somewhere and write connection information to file
qsub -b y /path/to/dask-scheduler --scheduler-file /path/to/scheduler.json

# Job2
# Start 100 dask-worker processes in an array job pointing to the same file
qsub -b y -t 1-100 /path/to/dask-worker --scheduler-file /path/to/scheduler.json

# Job3 
# Start a process with the python code where the client is started this way
client = Client(scheduler_file='/path/to/scheduler.json')

Question and advice

If I understand correctly, with this approach I will start the scheduler, the workers and the analysis as independent jobs (different HTCondor submit files). How can I make sure that the order of execution is correct? Is there a way to keep the same processing approach I have been using, or would it be more efficient to adapt the code to work better with HTCondor? Thanks for the help!
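One way to make the analysis job tolerant of start-up order is to have it wait for the scheduler file before connecting, and then wait for the workers to register. The following is only a sketch under assumptions: `wait_for_scheduler_file` and `connect` are hypothetical helper names, the timeout values are arbitrary, and the check for an "address" key assumes the scheduler file is JSON containing the scheduler address.

```python
import json
import time
from pathlib import Path


def wait_for_scheduler_file(path, timeout=600.0, poll=2.0):
    """Block until the scheduler file exists and contains an address.

    Hypothetical helper: polls the shared file system until the file
    written by dask-scheduler is readable JSON with an "address" key.
    """
    path = Path(path)
    deadline = time.monotonic() + timeout
    while True:
        try:
            info = json.loads(path.read_text())
            if "address" in info:
                return info
        except (FileNotFoundError, json.JSONDecodeError):
            pass  # file missing or only partially written so far
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no usable scheduler file at {path}")
        time.sleep(poll)


def connect(path, n_workers):
    """Connect the analysis job once scheduler and workers are up."""
    # Imported lazily so the helper above works without dask installed.
    from dask.distributed import Client

    wait_for_scheduler_file(path)
    client = Client(scheduler_file=str(path))
    # Block until the requested number of workers has registered.
    client.wait_for_workers(n_workers)
    return client
```

In the analysis job this would be called as `client = connect('/path/to/scheduler.json', n_workers)`, where `n_workers` is however many worker processes the pipeline needs before it can start.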

Recommended answer

HTCondor support has been merged into Dask JobQueue (https://github.com/dask/dask-jobqueue/pull/245) and should now be available as HTCondorCluster, for example HTCondorCluster(cores=1, memory='100MB', disk='100MB').
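With HTCondorCluster, the scheduler runs inside the Python process and the workers are submitted as HTCondor jobs, so separate submit files for scheduler and workers may not be needed. A minimal sketch of how the setup from the question (3 workers with 15 processes each) might map onto it is shown here. The helper names `htcondor_cluster_kwargs` and `start_cluster` are hypothetical, and the memory and disk values are placeholders to be replaced with the real per-node figures.

```python
def htcondor_cluster_kwargs(processes=15, memory="60GB", disk="1GB"):
    """Per-job resources for HTCondorCluster.

    The memory and disk defaults are placeholder assumptions,
    not values taken from the question.
    """
    return {
        "cores": processes,      # threads available to one HTCondor job
        "processes": processes,  # split the job into this many worker processes
        "memory": memory,
        "disk": disk,
    }


def start_cluster(n_jobs=3, **overrides):
    """Start the cluster and return (cluster, client).

    Requires dask_jobqueue and a working HTCondor installation,
    so the imports are kept inside the function.
    """
    from dask.distributed import Client
    from dask_jobqueue import HTCondorCluster

    cluster = HTCondorCluster(**htcondor_cluster_kwargs(**overrides))
    cluster.scale(jobs=n_jobs)  # submit n_jobs HTCondor jobs
    client = Client(cluster)
    return cluster, client
```

Calling `cluster.job_script()` on the returned cluster prints the submit description that will be used, which is a convenient way to check the resource requests before scaling up.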
