Grid engine cluster + OpenCV: strange behaviour


Problem description

I'm using a Grid Engine cluster for running some OpenCV code. The code runs well when executed locally, but when submitted to the grid it's not working. I extracted here a minimal example.

In the directory ~/code/ I have a file test.cpp containing the following code:

#include <opencv2/core.hpp>
#include <iterator>
#include <string>
#include <sys/types.h>
#include <sys/stat.h>
using namespace cv;
using namespace std;


int main(int ac, char** av)
{    
    /// Declare an (empty) OpenCV matrix
    Mat M;

    /// Create a subfolder
    string folderName = "sub/";
    mkdir(folderName.c_str(),0777);

    return 0;
}

The code compiles without errors.

When executing locally, i.e.

username@machine:~/code$ ./test

it creates a subfolder, i.e. ~/code/sub, as expected.

For submitting to the grid, I created a job script job.sh in the home directory (i.e. ~/job.sh) containing

cd code/
./test

and then submitted it using

qsub job.sh

Nothing happened: the folder was not created, and I saw no errors.

However, when I removed the line

Mat M;

it did create the folder as expected.

What are the possible reasons for this behaviour? My guess is that the OpenCV shared libraries are not installed on the other machines of the grid, but I'm not sure, and I don't know how to verify that.

Thank you in advance for any suggestions.

Solution

The libraries need to be accessible to all execution nodes in the queue you want to submit the job to. If the execution nodes have access to a shared location, such as an NFS mount, you can install the libraries there. Otherwise, you need to install the required libraries on every execution node. An additional link regarding SET_LIB_PATH:

blogs.oracle.com/templedf/entry/inheriting_job_environment

While this would help point to the right location, the libraries still need to be accessible from the execution nodes.
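
To check whether the libraries really are visible on an execution node, one option is to submit a small diagnostic job that runs `ldd` on the binary there. The script below is only a sketch under some assumptions: the file name `ldd_check.log` and the function names are made up for illustration, the binary is assumed to be at `~/code/test` as in the question, and `ldd` is assumed to be installed on the node. The `#$` lines are Grid Engine directives (`-j y` merges stderr into stdout, `-o` names the output file).

```shell
#!/bin/sh
# Grid Engine directives: merge stderr into stdout and write it to one file.
# The log file name is illustrative.
#$ -j y
#$ -o ldd_check.log

# Keep only the ldd lines for libraries the dynamic loader cannot resolve.
filter_missing() {
    grep "not found"
}

# Report the node name, the library search path, and any unresolved libraries.
# Returns 0 if everything resolves, 1 if something is missing,
# 2 if the binary is absent, 3 if ldd itself is unavailable.
check_binary() {
    bin=$1
    echo "node: $(uname -n)"
    echo "LD_LIBRARY_PATH: ${LD_LIBRARY_PATH:-<unset>}"
    if [ ! -x "$bin" ]; then
        echo "no executable at $bin"
        return 2
    fi
    if ! command -v ldd >/dev/null 2>&1; then
        echo "ldd is not available on this node"
        return 3
    fi
    # The pipeline succeeds only when grep finds a "not found" line.
    if ldd "$bin" 2>&1 | filter_missing; then
        echo "unresolved shared libraries in $bin"
        return 1
    fi
    echo "all shared libraries resolved for $bin"
}

# Same layout as the question: the binary lives in ~/code/test.
check_binary "${1:-$HOME/code/test}"
```

Submitted with `qsub`, this should leave its report in `ldd_check.log` in the job's working directory (normally the home directory unless another one is requested). Any line containing "not found" names a library that the node running the job cannot see, which would explain why the program never starts there.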
