SLURM: After allocating all GPUs, no more CPU jobs can be submitted

Views: 132 Published: 2020/5/1 10:20:15 linux ubuntu gpu nvidia slurm


Problem description

We have just started using Slurm to manage our GPUs (currently just 2). We use Ubuntu 14.04 and slurm-llnl. I have configured gres.conf, and srun works. The problem is that if I run two jobs with --gres=gpu:1, the two GPUs are successfully allocated and the jobs start running. I then expect to be able to run more jobs (in addition to the 2 GPU jobs) without --gres=gpu:1 (i.e. jobs that only use CPU and RAM), but this is not possible.

The error message says that it could not allocate the required resources (even though there are 24 CPU cores).
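The scenario described above can be reproduced with something like the following sketch. The `sleep 300` workload and the `run` helper are placeholders that are not part of the original question; by default the script only prints the commands (dry run), so it is safe to execute on a machine without Slurm.

```shell
#!/bin/sh
# Reproduction sketch for the scenario in the question.
# Assumptions: "sleep 300" stands in for the real workload, and "run" is a
# hypothetical helper. With DRY_RUN=1 (the default) commands are only printed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@" &    # launch in the background so the next job can be submitted
    fi
}

# Two GPU jobs, each requesting one GPU: these occupy both GPUs.
run srun --gres=gpu:1 sleep 300
run srun --gres=gpu:1 sleep 300

# A CPU-only job without --gres: this is the one that fails to get resources.
run srun sleep 300
```

Setting `DRY_RUN=0` submits the three jobs for real; the third submission is the one expected to show the resource allocation error.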

This is my gres.conf:

Name=gpu Type=titanx File=/dev/nvidia0
Name=gpu Type=titanx File=/dev/nvidia1
NodeName=ubuntu Name=gpu Type=titanx File=/dev/nvidia[0-1]
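When a job cannot get resources, it may help to compare what Slurm believes about the node with the configuration above. The sketch below is one possible set of checks, not a confirmed fix: the node name `ubuntu` is taken from the gres.conf above, the `show` helper is hypothetical, and by default the commands are only printed.

```shell
#!/bin/sh
# Diagnostic sketch (an assumption about what is useful to inspect).
# With DRY_RUN=1 (the default) the commands are printed instead of executed.
show() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Node state as seen by the controller, including its configured Gres.
show scontrol show node ubuntu

# Running configuration; the SelectType entries control how CPUs and
# memory are shared between jobs on a node.
show scontrol show config

# Jobs currently running or pending, with the reason a job is waiting.
show squeue
```

Run it with `DRY_RUN=0` on the Slurm host to execute the commands instead of printing them.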

Thanks for your help.
