Run multiple commands in different SSH servers in parallel using Python Paramiko


Question

I have an SSH.py script whose goal is to connect to many servers over SSH and run a Python script (worker.py) on each. I am using Paramiko, but I am very new to it and learning as I go. On each server I connect to, the Python script must keep running: it trains a model in parallel, so the script has to run on all machines to update the model parameters jointly. So either none of the SSH connections may close, or I have to find a way for the Python script on the servers to keep running even after I close the connection.

From extensive googling, it looks like this can be achieved with nohup or:

client = paramiko.SSHClient()
client.connect(ip_address, username=username, password=password)
transport = client.get_transport()
channel = transport.open_session()
channel.exec_command("python worker.py > /logs/'command output' 2>&1")

However, it is unclear to me how to close/exit all the SSH connections. I am running SSH.py from cmd.exe; would closing cmd.exe be enough to make all the remote processes stop?

In addition, is my use of client.close() correct for my purposes? My code is below.

# SSH.py

import paramiko
import argparse
import os

path = "path"
python_script = "worker.py"

# definitions for ssh connection and cluster
ip_list = ['XXX.XXX.XXX.XXX', 'XXX.XXX.XXX.XXX', 'XXX.XXX.XXX.XXX']
port_list = [':XXXX', ':XXXX', ':XXXX']
user_list = ['user', 'user', 'user']
password_list = ['pass', 'pass', 'pass']
node_list = list(map(lambda x: f'-node{x + 1} ', list(range(len(ip_list)))))
cluster = ' '.join([node + ip + port for node, ip, port in zip(node_list, ip_list, port_list)])

# run script on command line of local machine
os.system(f"cd {path} && python {python_script} {cluster} -type worker -index 0 -batch 64 > {path}/logs/'command output'/{ip_list[0]}.log 2>&1")

# loop for IP and password
for i, (ip, user, password) in enumerate(zip(ip_list[1:], user_list[1:], password_list[1:]), 1):
    try:
        print("Open session in: " + ip + "...")
        client = paramiko.SSHClient()
        client.connect(ip, username=user, password=password)
        transport = client.get_transport()
        channel = transport.open_session()
    except paramiko.SSHException:
        print("Connection Failed")
        quit()

    try:
        channel.settimeout(30)  # Channel.exec_command() itself takes no timeout argument
        channel.exec_command(f"cd {path} && python {python_script} {cluster} -type worker -index {i} -batch 64 > {path}/logs/'command output'/{ip_list[i]}.log 2>&1")
        client.close() # here I am closing connection but above command should be running, my question is can I safely close cmd.exe on which I am running SSH.py?
    except paramiko.SSHException:
        print("Cannot run file. Continue with other IPs in list...")
        client.close()
        continue

The code is based on "Running process of remote SSH server in the background using Python Paramiko".

It seems that channel.exec_command() is not executing the command:

f"cd {path} && python {python_script} {cluster} -type worker -index {i} -batch 64 > {path}/logs/'command output'/{ip_list[i]}.log 2>&1"

So I wonder whether client.close() is the cause. What would happen if I commented out all the lines with client.close()? Would this help? Is it dangerous? When I quit my local Python script, would that close all my SSH connections, making client.close() unnecessary?

Also, all my machines run Windows.

Recommended answer

Indeed, the problem is that you close the SSH connection. As the remote process is not detached from the terminal, closing the terminal terminates the process. On Linux servers, you can use nohup. I do not know what the Windows equivalent is, if there is one.
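For the Linux case mentioned above, a minimal sketch of how such a detached command line might be built is shown here. The helper name detached_command and the example log path are hypothetical, and the sketch assumes a POSIX shell on the server, so it does not apply to the Windows machines in the question.

```python
import shlex


def detached_command(command: str, log_path: str) -> str:
    """Wrap a shell command so it can outlive the SSH session (POSIX shells only)."""
    # nohup makes the process ignore SIGHUP; redirecting stdin, stdout and
    # stderr detaches it from the SSH channel; the trailing & backgrounds it.
    return f"nohup {command} > {shlex.quote(log_path)} 2>&1 < /dev/null &"


# The resulting string is what would be passed to exec_command().
print(detached_command("python worker.py -index 1", "/tmp/worker 1.log"))
```

The log path is quoted with shlex.quote so that a path containing spaces stays a single shell word.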

In any case, it seems that you do not need to close the connection. As I understand it, you are fine with waiting for all the commands to complete:

stdouts = []
clients = []

# Start the commands
for i, (ip, user, password) in enumerate(zip(ip_list[1:], user_list[1:], password_list[1:]), 1):
    print("Open session in: " + ip + "...")
    client = paramiko.SSHClient()
    client.connect(ip, username=user, password=password)
    command = \
        f"cd {path} && " + \
        f"python {python_script} {cluster} -type worker -index {i} -batch 64 " + \
        f"> {path}/logs/'command output'/{ip_list[i]}.log 2>&1"
    stdin, stdout, stderr = client.exec_command(command)
    clients.append(client)
    stdouts.append(stdout)

# Wait for commands to complete
for i in range(len(stdouts)):
    stdouts[i].read()
    clients[i].close()

Note that the simple solution above with stdout.read() works only because you redirect the command output to a remote file. If you did not, the commands might deadlock (for example, when a command fills the stderr buffer while the script is blocked reading stdout).
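When the output is redirected remotely as described, the waiting loop could also collect each command's exit code. The following is only a sketch: wait_for_all is a hypothetical helper name, and it assumes stdouts and clients are the lists built while starting the commands. It relies on Channel.recv_exit_status(), which blocks until the remote command has exited.

```python
def wait_for_all(stdouts, clients):
    """Block until every remote command exits; return the exit codes in order."""
    exit_codes = []
    for stdout, client in zip(stdouts, clients):
        try:
            # read() returns at end of output; this is safe only because the
            # command output is redirected to a file on the remote side.
            stdout.read()
            exit_codes.append(stdout.channel.recv_exit_status())
        finally:
            # Close the connection only after the command has finished.
            client.close()
    return exit_codes
```

A non-zero entry in the returned list would indicate that the worker on the corresponding server failed.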

Without the redirection (or if you want to see the command output locally), you will need code like this:

import time

while any(x is not None for x in stdouts):
    for i in range(len(stdouts)):
        stdout = stdouts[i]
        if stdout is not None:
            channel = stdout.channel
            # To prevent losing output at the end, first test for exit, then for output
            exited = channel.exit_status_ready()
            while channel.recv_ready():
                s = channel.recv(1024).decode('utf8')
                print(f"#{i} stdout: {s}")
            while channel.recv_stderr_ready():
                s = channel.recv_stderr(1024).decode('utf8')
                print(f"#{i} stderr: {s}")
            if exited:
                print(f"#{i} done")
                clients[i].close()
                stdouts[i] = None
    time.sleep(0.1)

If you do not need to separate stdout and stderr, you can greatly simplify the code by using Channel.set_combine_stderr. See "Paramiko ssh die/hang with big output".
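A rough sketch of the combined-stream approach might look as follows. The function name run_combined is hypothetical and client is assumed to be an already connected SSHClient; the sketch handles a single command and is not the code from the linked question.

```python
def run_combined(client, command):
    """Run one command with stderr merged into stdout; return (output, exit_code)."""
    channel = client.get_transport().open_session()
    # Must be set before the command starts, so stderr data arrives via recv().
    channel.set_combine_stderr(True)
    channel.exec_command(command)

    chunks = []
    while True:
        data = channel.recv(4096)
        if not data:  # an empty result means the remote side closed the stream
            break
        chunks.append(data)

    # Decode once at the end so multi-byte characters split across chunks survive.
    output = b"".join(chunks).decode("utf-8", errors="replace")
    return output, channel.recv_exit_status()
```

With a single merged stream there is only one buffer to drain, so a plain blocking read loop is enough and the separate recv_stderr_ready() polling is no longer needed.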

Regarding your question about SSHClient.close: if you do not call it, the connection is closed implicitly when the script finishes and the Python garbage collector cleans up the pending objects. That is bad practice. And even if Python did not do it, the local OS would terminate all connections of the local Python process. That is bad practice too. In either case, the remote processes are terminated along with the connection.
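One way to make the explicit close reliable, even when waiting for the commands raises an exception, is to register every client for cleanup up front. This is a sketch only: run_and_close and the work callback are hypothetical names, and clients is assumed to be a list of connected SSHClient objects.

```python
from contextlib import ExitStack


def run_and_close(clients, work):
    """Call work(clients), then close every client even if work raises."""
    with ExitStack() as stack:
        for client in clients:
            # Callbacks run when the with block exits, in reverse order.
            stack.callback(client.close)
        return work(clients)
```

Here work would be whatever function starts the commands and waits for them to finish; the connections are closed only after it returns or fails.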
