Parallel downloads with Multiprocessing and PySftp


Question

I'm trying to create a script to download N files at the same time using pysftp and the multiprocessing library. I've had some basic Python training, took pieces of code and combined them into one, but I can't get it to work. I'd appreciate it if somebody could help me with that. The error occurs after the vFtp.close() command, in the part that is supposed to start the simultaneous downloads.

from multiprocessing import Pool
import pysftp
import os

vHost='10.11.12.13'
vLogin='admin'
vPwd='pass1234'
vFtpPath='/export/home/'

os.chdir('d:/test/')
os.getcwd()

cnopts=pysftp.CnOpts()
cnopts.hostkeys = None

vFtp=pysftp.Connection(vHost,username=vLogin,password=vPwd,cnopts=cnopts)
vFtp.cwd(vFtpPath)
vObjectList=vFtp.listdir()
vFileList=[]
vFoldList=[]

for vObject in vObjectList:
    vType=str(vFtp.lstat(vObject))[:1]
    if vType!='d': 
        vFileList.append(vObject)
    else:   
        vFoldList.append(vObject)

vFtp.close()

def fDownload(vFileAux):
    vFtpAux=pysftp.Connection(vHost,username=vLogin,password=vPwd,cnopts=cnopts)
    vFtpAux.cwd(vFtpPath)
    vFtpAux.get(vFileAux,preserve_mtime=True)
    vFtpAux.close()

if __name__ == "__main__":
    vPool=Pool(3)
    vPool.map(fDownload,vFileList)  

Answer

It looks like you're trying to get the list of files then download them concurrently using multiple processes.

Instead of manually examining the files, try using the walktree method on the connection object (see pysftp walktree).
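
For example, here is a minimal sketch of collecting the remote file paths with walktree, reusing the host, credentials, and path from the question (the full example below wraps the same idea in a helper function):

import pysftp

cnopts = pysftp.CnOpts()
cnopts.hostkeys = None  # disable host-key checking, as in the question's code

files = []
with pysftp.Connection('10.11.12.13', username='admin', password='pass1234',
                       cnopts=cnopts) as sftp:
    # fcallback is called with the path of every file found under the remote
    # directory; directories and unknown entries are ignored here
    sftp.walktree('/export/home/', fcallback=files.append,
                  dcallback=lambda p: None, ucallback=lambda p: None)

print(files)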

Here is a working example I made in Python 3.5. I'm just using a local SFTP server and small files, so I simulated the download delay. Change the max_workers argument to set the number of simultaneous downloads.

"""Demo using sftp to download files simultaneously."""
import pysftp
import os
from concurrent.futures import ProcessPoolExecutor
import time


def do_nothing(s):
    """
    Using this as the callback for directories and unknown items found
    using walktree.
    """
    pass


def download(file):
    """
    Simulates a 1-second download.
    """
    with pysftp.Connection(
            host='convox', username='abc', private_key='/home/abc/test') as sftp:

        time.sleep(1)
        print('Downloading {}'.format(file))
        sftp.get(file)


def get_list_of_files(remote_dir):
    """
    Walks remote directory tree and returns list of files.
    """
    with pysftp.Connection(
            host='convox', username='abc', private_key='/home/abc/test') as sftp:

        files = []

        # if this finds a file it will send the filename to the file callback
        # which in this case just appends to the 'files' list
        sftp.walktree(remote_dir, fcallback=files.append,
                      dcallback=do_nothing, ucallback=do_nothing)

    return files

if __name__ == '__main__':
    remote_dir = '/home/abc/remoteftp/'
    download_target = '/home/abc/localftp/'

    # if you don't specify a localpath in sftp.get then it just downloads to
    # the os cwd, so set it here
    os.chdir(download_target)

    files = get_list_of_files(remote_dir)
    pool = ProcessPoolExecutor(max_workers=4)
    pool.map(download, files)

edit: ProcessPoolExecutor is for running something on multiple CPU cores and is limited by your processor. For network tasks like downloading you can use threads instead. In the code above this is only one change: import and use ThreadPoolExecutor instead of ProcessPoolExecutor. Then you can use more max_workers.
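
For reference, a minimal sketch of that swap, assuming the download function and the files list from the example above are already defined:

from concurrent.futures import ThreadPoolExecutor

# Threads work well here because each worker spends most of its time waiting
# on network I/O rather than using the CPU, so max_workers can exceed the
# number of cores.
pool = ThreadPoolExecutor(max_workers=16)
pool.map(download, files)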

