Using CycleCloud and Slurm - jobs stuck in pending state


Question


We have some software that uses Slurm to submit jobs to a queue and it works as expected on our on-site cluster as well as a variety of our clients' Slurm setups.

The issue we are seeing is that when we submit a multi-node job on CycleCloud's Slurm, the correct number of resources spin up, but the jobs never seem to transition into a "Running" state. They remain stuck in the "Pending(Resources)" state.

I have run a test script that does the bare minimum to submit multi-node jobs. These properly spin up the appropriate number of resources and run the job. So, clearly, something in our configuration must be off.

Can anyone share some pointers on where to look for the reasons jobs get stuck in a pending state?

Thanks,

Eric

Answer

This forum is for Azure Stack, a hybrid cloud platform that lets you use Azure services from your company's or service provider's datacenter. 

For Azure CycleCloud issues, please create a Support Request. If you do not have a support plan, please email me at AzCommunity@microsoft.com with your Subscription ID and a link to this post, and I can enable a one-time free support request for your subscription.
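
While a support request is open, Slurm's own tools can report why a job is pending. The following is a minimal sketch, not something from the original thread: it assumes `squeue` is on the PATH of the scheduler node, and the helper names `parse_squeue` and `pending_jobs` are hypothetical.

```python
import subprocess


def parse_squeue(output):
    """Parse 'jobid|state|reason|nodes' lines into a list of dicts."""
    jobs = []
    for line in output.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 4:
            continue  # skip lines that do not match the expected format
        job_id, state, reason, nodes = parts
        jobs.append(
            {"job_id": job_id, "state": state, "reason": reason, "nodes": nodes}
        )
    return jobs


def pending_jobs():
    """Ask squeue for pending jobs and the reason Slurm gives for each."""
    result = subprocess.run(
        [
            "squeue",
            "--noheader",
            "--states=PENDING",
            # %i = job ID, %T = state, %r = reason, %D = node count
            "--format=%i|%T|%r|%D",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_squeue(result.stdout)


if __name__ == "__main__":
    for job in pending_jobs():
        print(f"{job['job_id']}: {job['state']} ({job['reason']}), "
              f"{job['nodes']} node(s) requested")
```

For a single job, `scontrol show job <jobid>` prints the full request (node count, partition, features, and so on), which can be compared with what `sinfo` shows for the nodes that were started; `sinfo -R` lists nodes that are down or drained along with the recorded reason.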

