OpenMPI / mpirun or mpiexec with sudo permission

Question

I'm working on code that runs on the Epiphany processor (http://www.parallella.org/), and to run Epiphany code I need sudo privileges for the host-side program. There is no escaping sudo!

Now I need to run this code across several nodes. To do that I'm using MPI, but MPI won't function properly with sudo:

#sudo mpirun -n 12 --hostfile hosts -x LD_LIBRARY_PATH=${ELIBS} -x EPIPHANY_HDF=${EHDF} ./hello-mpi.elf

Even simple code that communicates between nodes does not work. Every process reports rank 0 if I use sudo. Communication between threads works, but not across nodes. This matters because I want to divide the workload properly across the cards.

Here is the simple code:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
   int numprocs, rank, namelen;
   char processor_name[MPI_MAX_PROCESSOR_NAME];

   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Get_processor_name(processor_name, &namelen);

   /* Format matches the sample output shown below */
   printf("Hello world from processor %s, rank %d out of %d processors\n",
          processor_name, rank, numprocs);

   MPI_Finalize();
   return 0;
}

This code should print a different rank number on each node, but it does not when run with sudo.

Any help on this would be great.

Here is the output from running the above code without sudo:

mpirun -n 3 --hostfile $MPI_HOSTS ./mpitest

Output:

Hello world from processor work1, rank 1 out of 3 processors
Hello world from processor command, rank 0 out of 3 processors
Hello world from processor work2, rank 2 out of 3 processors

This is expected.

Here is the output from running the above code with sudo:

sudo mpirun -n 3 --hostfile $MPI_HOSTS ./mpitest

Output:

Hello world from processor command, rank 0 out of 1 processors
Hello world from processor work1, rank 0 out of 1 processors
Hello world from processor work2, rank 0 out of 1 processors

This is not expected.

-

I think @Hristo Iliev got the right answer, but I'm not going to be able to test it.

Recommended answer

Short answer: the command should be:

mpirun -n 12 ... sudo -E ./hello-mpi.elf

For that to work properly, you have to modify the sudo configuration (via visudo) on all hosts and enable passwordless operation for your user:

username ALL = NOPASSWD:SETENV: /path/to/hello-mpi.elf

This entry allows your user to run the executable through sudo without authenticating first, which is important since only the standard input of rank 0 is redirected. Note that the path must name the program that sudo actually launches, which in the command above is hello-mpi.elf rather than mpirun. The SETENV tag also allows you to execute sudo with the -E option, so that it passes the special Open MPI variables (OMPI_...) on to the executable. Without those variables in the environment, the executables cannot connect to each other and instead run as singletons.
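To check whether the launcher's environment really survives the sudo call, a small helper like the following could be called at the top of main, before MPI_Init. It is only a sketch: it assumes Open MPI, which exports OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE to the processes it starts, and the function names are made up for this example. The presence of these two variables is used here as a hint that the environment was preserved; it is not the mechanism MPI_Init itself relies on.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if the Open MPI launcher variables are visible to this process,
 * 0 if they are missing (in which case a singleton start is likely). */
int launched_by_mpirun(void) {
    const char *rank = getenv("OMPI_COMM_WORLD_RANK");
    const char *size = getenv("OMPI_COMM_WORLD_SIZE");
    return rank != NULL && size != NULL;
}

/* Prints a short diagnostic to stderr; meant to be called before MPI_Init. */
void report_launch_environment(void) {
    if (launched_by_mpirun()) {
        fprintf(stderr, "launched as rank %s of %s\n",
                getenv("OMPI_COMM_WORLD_RANK"),
                getenv("OMPI_COMM_WORLD_SIZE"));
    } else {
        fprintf(stderr,
                "OMPI_* variables missing: expect a singleton (rank 0 of 1)\n");
    }
}
```

If the diagnostic reports missing variables under sudo but not without it, the environment is being stripped, which is what the -E option together with SETENV is meant to prevent.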

Long answer: Running mpirun with sudo results in mpirun itself being executed with effective user root. The way mpirun creates an MPI job is by first launching the requested number of executables and then waiting for them to get to know each other during the MPI_Init call. Depending on the content of the host list file, mpirun either spawns a child process (for host entries that match the host mpirun is executed on) or starts a process remotely using rsh, ssh or some other mechanism (e.g. many cluster resource management systems have their own mechanisms for that). When the rsh/ssh mechanism is used, since the program runs as root, mpirun attempts to log into the other host(s) as root. This usually fails for one or both of two reasons:

  • the root user cannot log into the specified host without providing a password, e.g. because public key authentication has not been set up;
  • the root user is not allowed to log in remotely, which has been the default SSH configuration on many Unix systems for years.

That's why you see rank 0 coming up (it's a local fork()-based spawn) and the other ranks missing. Since enabling remote root login is considered a security risk by many, I would rather go the way described in the short answer.

Another option would be to make hello-mpi.elf owned by root and set the Set UID bit via chmod u+s hello-mpi.elf. Then you won't need sudo at all. This will not work if the filesystem is mounted with the nosuid option or if some other security mechanism is active. Also, root-owned suid binaries pose a security risk since they always execute with root permissions, no matter which user runs them.
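Whether the Set UID approach actually took effect can be verified from code. The sketch below uses only the POSIX calls stat and geteuid; the helper names are invented for this example. The first function inspects a file's owner and mode bits, the second reports whether the running process has root's effective UID, which would stay non-root if, for instance, the filesystem is mounted nosuid.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if `path` is owned by root and has the set-user-ID bit set,
 * 0 if it does not, and -1 if the file cannot be inspected. */
int is_root_setuid(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0) {
        return -1;
    }
    return (st.st_uid == 0 && (st.st_mode & S_ISUID) != 0) ? 1 : 0;
}

/* Returns 1 if the process currently runs with root's effective UID. */
int running_with_root_privileges(void) {
    return geteuid() == 0 ? 1 : 0;
}
```

Calling running_with_root_privileges() early in the host program gives a clear error path when the bit is set on the file but ignored by the system.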

I wonder why you need root permissions in order to talk to the Epiphany board. Is the SDK doing some fancy privileged operations, or is it simply accessing a device file in /dev that is only writable by root? If it's the latter, perhaps the device node could be created with different permissions.
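One way to find out whether device permissions are the only obstacle is to test write access as the unprivileged user. This is a rough sketch using the POSIX access call, which checks permissions against the real user ID; the function name is hypothetical, and the actual device path used by the Epiphany SDK is not given here, so it has to be supplied by the caller.

```c
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if the calling (real) user may write to `path`, 0 otherwise.
 * `path` would be the device node the SDK opens; it is not known here. */
int device_writable_by_caller(const char *path) {
    return access(path, W_OK) == 0 ? 1 : 0;
}
```

If this returns 0 for the SDK's device node when run without sudo, changing the node's group or mode may be enough to drop the root requirement.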
