SGE submitted job state doesn't change from "qw"

Problem description

I'm using Sun Grid Engine (SGE) on Ubuntu 14.04 to queue jobs to run on a multicore CPU. I've installed and set up SGE on my system. I created a "hello_world" directory containing two shell scripts, "hello_world.sh" and "hello_world_qsub.sh". The first contains a simple command; the second contains the qsub command that submits the first script as a job. Here's what "hello_world.sh" contains:

#!/bin/bash

echo "Hello world" > /home/theodore/tmp/hello_world/hello_world_output.txt

And here's what "hello_world_qsub.sh" contains:

#!/bin/bash

qsub \
  -e /home/hello_world/hello_world_qsub.error \
  -o /home/hello_world/hello_world_qsub.log \
  ./hello_world.sh

After making the second script executable and running it with "./hello_world_qsub.sh" from that directory, the output looks reasonable:

Your job 1 ("hello_world.sh") has been submitted

But the output of the "qstat" command is frustrating:

    job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
    -----------------------------------------------------------------------------------------------------------------
          1 0.50000 hello_worl mhr          qw    05/16/2016 20:26:23                                    1

The "state" column always stays at "qw" and never changes to "r".

Here's the output of the "qstat -j 1" command:

==============================================================
job_number:                 1
exec_file:                  job_scripts/1
submission_time:            Mon May 16 20:26:23 2016
owner:                      mhr
uid:                        1000
group:                      mhr
gid:                        1000
sge_o_home:                 /home/mhr
sge_o_log_name:             mhr
sge_o_path:                 /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
sge_o_shell:                /bin/bash
sge_o_workdir:              /home/mhr/hello_world
sge_o_host:                 localhost
account:                    sge
stderr_path_list:           NONE:NONE:/home/hello_world/hello_world_qsub.error
mail_list:                  mhr@localhost
notify:                     FALSE
job_name:                   hello_world.sh
stdout_path_list:           NONE:NONE:/home/hello_world/hello_world_qsub.log
jobshare:                   0
env_list:                   
script_file:                ./hello_world.sh
scheduling info:            queue instance "mainqueue@localhost" dropped because it is temporarily not available
                        All queues dropped because of overload or full

And here's the output of the "qhost" command:

HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
localhost               -               -     -       -       -       -       -

What should I do to make my jobs run and finish their tasks?

Recommended answer

From your qhost output, it looks like your machine "localhost" is registered with SGE as an execution host. However, sge_execd on "localhost" is either not running or not configured properly. If it were, qhost would report statistics for "localhost" instead of a "-" in every column. That is consistent with the "scheduling info" line in your "qstat -j 1" output, which says the queue instance "mainqueue@localhost" is temporarily not available.
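To make that check repeatable, here is a minimal sketch of a shell helper that reads qhost-style output and lists the hosts showing "-" in the NCPU column. The function name hosts_without_load is hypothetical, not part of SGE, and the column position is an assumption based on the qhost layout shown in the question.

```shell
#!/bin/bash

# hosts_without_load: hypothetical helper, not an SGE command.
# Reads qhost-style output on stdin and prints every host whose NCPU
# column (third field, as in the output shown above) is "-", which
# suggests no load report has arrived from sge_execd on that host.
# The first two lines (header and separator) and the "global"
# pseudo-host are skipped.
hosts_without_load() {
  awk 'NR > 2 && $1 != "global" && $3 == "-" { print $1 }'
}
```

It could be used as "qhost | hosts_without_load". With the qhost output from the question it would print "localhost". If a host shows up there, a reasonable next step is to check on that machine whether the daemon process exists at all, for example with "pgrep -x sge_execd", and to start it if it does not. How it is started depends on how SGE was installed, so treat any particular service name as something to verify on your system.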
