Condor job using DAG with some jobs needing to run on the same host


Problem description


I have a computation task which is split into several individual program executions, with dependencies. I'm using Condor 7 as the task scheduler (with the Vanilla Universe, due to constraints on the programs beyond my reach, so no checkpointing is involved), so a DAG looks like a natural solution. However, some of the programs need to run on the same host. I could not find a reference on how to do this in the Condor manuals.

Example DAG file:

JOB  A  A.condor 
JOB  B  B.condor 
JOB  C  C.condor    
JOB  D  D.condor
PARENT A CHILD B C
PARENT B C CHILD D


I need to express that B and D need to be run on the same computer node, without breaking the parallel execution of B and C.

Thanks for the help.

Answer


Condor doesn't have any simple solutions, but there is at least one kludge that should work:


Have B leave some state behind on the execute node, probably in the form of a file, that says something like MyJobRanHere = "UniqueIdentifier". Use the STARTD_CRON support to detect this and advertise it in the machine ClassAd. Have D use Requirements = (MyJobRanHere == "UniqueIdentifier"). As part of D's final cleanup, or perhaps in a new node E, remove the state. If you're running large numbers of jobs through, you'll probably need to clean out left-over state occasionally.
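The kludge above can be sketched concretely. Everything below is an illustrative assumption rather than part of the original answer: the state-file path, the attribute name MyJobRanHere, the cron job name JOBSTATE, and the script path are all made up, and the exact STARTD_CRON knob names vary between HTCondor versions, so check them against your local configuration reference before relying on this.

```
# --- Step 1: B's job script, at the end of its run, leaves a state
#     file on the execute node (assumes /var/tmp persists between jobs):
echo 'MyJobRanHere = "UniqueIdentifier"' > /var/tmp/myjob_state

# --- Step 2: condor_config.local on each execute node registers a
#     STARTD_CRON hook whose stdout is merged into the machine ClassAd:
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) JOBSTATE
STARTD_CRON_JOBSTATE_EXECUTABLE = /usr/local/bin/report_job_state.sh
STARTD_CRON_JOBSTATE_PERIOD = 60s

# --- Step 3: /usr/local/bin/report_job_state.sh prints the ClassAd
#     attribute only when the state file exists:
#!/bin/sh
if [ -f /var/tmp/myjob_state ]; then
    cat /var/tmp/myjob_state
fi

# --- Step 4: D.condor pins D to the node that B advertised,
#     without constraining where C runs:
requirements = (MyJobRanHere == "UniqueIdentifier")
```

With this in place B and C still run in parallel on whatever nodes match; only D waits for a machine whose ClassAd carries the advertised attribute, which is the node B ran on. The cleanup step (in D's script or a trailing node E) would simply delete /var/tmp/myjob_state.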

