OpenMPI: Permission denied error while trying to use mpirun

Question

I would like to print "hello world" via MPI on several Google Cloud compute instances, using the following code:

from mpi4py import MPI

size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()

print("Hello, World! I am process/rank {} of {} on {}.\n".format(rank, size, name))    

The problem is that, even though I can ssh between all of these instances without any trouble, I get a permission denied error when I try to run my script. I use the following command to invoke it:

mpirun --host localhost,instance_1,instance_2 python hello_world.py

and get the following error message:

Permission denied (publickey).
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------

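Additional information:

  • I have installed Open MPI on all nodes
  • I let Google set up all ssh keys automatically by using gcloud to log in from each instance to every other instance
  • Instance type: n1-standard-1
  • Instance OS: Debian Linux (default)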

Thanks for your help :-)

New information:
(thanks @Zulan for pointing out that I should edit my previous post instead of creating a new answer for new information)

So, I tried the same thing with MPICH instead of Open MPI. However, I ran into a similar error message.

Command:

mpirun --host localhost,instance_1,instance_2 python hello_world.py

Error message:

Host key verification failed.

I can ssh between my two instances without problems, and the gcloud commands should have set up the ssh keys properly and automatically.

So, does anybody have an idea what the problem could be? I also checked the PATH, the firewall rules, and my ability to write startup files to the temp folder. Could someone please try to reproduce this problem? Also, should I raise this question with Google? (I have never done such a thing before, so I'm quite unsure :S)

Thanks for your help :)

Recommended answer

So I finally found a solution. Wow, this problem was driving me nuts.

It turned out that I needed to generate ssh keys manually for the script to work. I have no idea why, because the Google services had already set up keys through gcloud compute ssh, but well, it worked :)
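A likely explanation, which the original post does not confirm: mpirun starts its remote daemons over plain, non-interactive ssh, whereas gcloud compute ssh typically authenticates with its own key file (by default ~/.ssh/google_compute_engine), which plain ssh does not try unless told to. The sketch below is one way to reproduce what mpirun sees before launching a job. The helper names `ssh_check_command` and `check_hosts` are hypothetical, and the host names are the ones used in the question.

```python
import subprocess


def ssh_check_command(host):
    """Build a non-interactive ssh command that runs `true` on `host`."""
    return [
        "ssh",
        "-o", "BatchMode=yes",     # never prompt for passwords or unknown host keys
        "-o", "ConnectTimeout=5",  # give up quickly on unreachable hosts
        host,
        "true",
    ]


def check_hosts(hosts, runner=subprocess.run):
    """Return {host: (ok, stderr)}; `runner` is injectable so the logic can be tested."""
    results = {}
    for host in hosts:
        proc = runner(ssh_check_command(host), capture_output=True, text=True)
        results[host] = (proc.returncode == 0, proc.stderr.strip())
    return results
```

Calling `check_hosts(["instance_1", "instance_2"])` on the node where mpirun is started should report a failure for every host that plain ssh cannot reach without prompting. With `BatchMode=yes`, both a missing key ("Permission denied (publickey).") and an unknown host key ("Host key verification failed.") surface as errors instead of interactive prompts.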

The steps I performed:

instance_1 $ ssh-keygen -t rsa
instance_1 $ cd .ssh
instance_1 $ cat id_rsa.pub >> authorized_keys
instance_1 $ gcloud compute copy-files id_rsa.pub 
instance_1 $ gcloud compute ssh instance_2

instance_2 $ cd .ssh
instance_2 $ cat id_rsa.pub >> authorized_keys
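The `cat id_rsa.pub >> authorized_keys` step above appends the key every time it is run and leaves file permissions untouched. As a rough sketch (not part of the original answer), the same step could be made repeatable in Python. The function name `authorize_key` is hypothetical, and the permission bits assume the usual sshd requirement that ~/.ssh and authorized_keys are not writable by other users.

```python
import os
from pathlib import Path


def authorize_key(pub_key_path, ssh_dir):
    """Append a public key to authorized_keys unless it is already listed.

    Returns True if the key was added, False if it was already present.
    """
    ssh_dir = Path(ssh_dir)
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    key = Path(pub_key_path).read_text().strip()
    auth = ssh_dir / "authorized_keys"
    text = auth.read_text() if auth.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    with auth.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # keep the new key on its own line
        f.write(key + "\n")
    os.chmod(auth, 0o600)  # owner read/write only
    return True
```

On each instance this would be called with the copied key and the user's ssh directory, for example `authorize_key(Path.home() / ".ssh" / "id_rsa.pub", Path.home() / ".ssh")`.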

I will open another topic to ask why I cannot use ssh instance_2 even though gcloud compute ssh instance_2 works. See: Difference between the commands "gcloud compute ssh" and "ssh"
