Dask with HTCondor scheduler
Problem description
I have an image analysis pipeline with parallelised steps. The pipeline is written in Python and the parallelisation is controlled by dask.distributed. The minimum processing setup has 1 scheduler + 3 workers with 15 processes each. The first short step of the analysis uses 1 process per worker but all the RAM of the node; all other analysis steps use all nodes and processes.
The admin will install HTCondor as a scheduler for the cluster.
To get my code running on the new setup, I was planning to use the approach shown in the Dask manual for SGE, because the cluster has a shared network file system.
# Job1
# Start a dask-scheduler somewhere and write connection information to file
qsub -b y /path/to/dask-scheduler --scheduler-file /path/to/scheduler.json
# Job2
# Start 100 dask-worker processes in an array job pointing to the same file
qsub -b y -t 1-100 /path/to/dask-worker --scheduler-file /path/to/scheduler.json
# Job3
# Start a process with the Python code, where the client is created this way
from dask.distributed import Client
client = Client(scheduler_file='/path/to/scheduler.json')
Question and advice
If I understood correctly, with this approach I will start the scheduler, the workers and the analysis as independent jobs (different HTCondor submit files). How can I make sure that the order of execution will be correct? Is there a way to keep the same processing approach I have been using, or would it be more efficient to adapt the code to work better with HTCondor? Thanks for the help!
Recommended answer
HTCondor JobQueue support has been merged (https://github.com/dask/dask-jobqueue/pull/245) and should now be available in Dask JobQueue, for example HTCondorCluster(cores=1, memory='100MB', disk='100MB').
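As a rough sketch of how the setup from the question (3 workers with 15 processes each) might map onto HTCondorCluster: the scheduler runs inside the Python process that creates the cluster, and the workers are submitted as HTCondor jobs from that same process, so no separate submit files need to be ordered by hand. The helper names worker_job_spec and start_cluster, and the memory and disk figures, are hypothetical placeholders rather than values from the question; adjust them to the actual nodes.

```python
def worker_job_spec(processes_per_job, memory_gb, disk_gb):
    """Build keyword arguments describing ONE HTCondor job for HTCondorCluster.

    Each job runs `processes_per_job` single-threaded worker processes.
    memory_gb and disk_gb are per-job requests (assumed values, not from the post).
    """
    return {
        "cores": processes_per_job,      # total cores requested per job
        "processes": processes_per_job,  # split the job into this many worker processes
        "memory": f"{memory_gb}GB",
        "disk": f"{disk_gb}GB",
    }


def start_cluster(n_jobs=3, processes_per_job=15, memory_gb=60, disk_gb=10):
    """Start the scheduler locally, submit worker jobs, and wait for them."""
    # Imported here so worker_job_spec can be used without Dask installed.
    from dask.distributed import Client
    from dask_jobqueue import HTCondorCluster

    cluster = HTCondorCluster(
        **worker_job_spec(processes_per_job, memory_gb, disk_gb)
    )
    cluster.scale(jobs=n_jobs)  # submit n_jobs worker jobs to HTCondor
    client = Client(cluster)
    # Block until every worker process has connected before starting the analysis.
    client.wait_for_workers(n_jobs * processes_per_job)
    return cluster, client
```

Calling cluster, client = start_cluster() at the top of the analysis script would then take the place of Client(scheduler_file=...). Waiting for the workers keeps the analysis from starting before HTCondor has actually launched the jobs.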