Slow upload of many small files with SFTP


Question

When uploading 100 files of 100 bytes each with SFTP, it takes 17 seconds here (measured after the connection is established, so the initial connection time is not even counted). That means 17 seconds to transfer only 10 KB, i.e. 0.59 KB/sec!

I know that sending SSH commands to open, write, close, etc. probably creates a big overhead, but still, is there a way to speed up the process when sending many small files with SFTP?

Or is there a special mode in paramiko / pysftp that keeps all pending write operations in a memory buffer (say, all operations from the last 2 seconds) and then performs them in one grouped SSH/SFTP pass? This would avoid waiting for the ping time between operations.

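Notes:

  • My connection's upload speed is about 100 KB/s (tested at 0.8 Mbit/s upload), and the ping time to the server is 40 ms.
  • Of course, if instead of sending 100 files of 100 bytes I send 1 file of 10 KB, it takes < 1 second.
  • I don't want to run a binary program on the remote side; only SFTP commands are accepted.
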
import pysftp, time, os
with pysftp.Connection('1.2.3.4', username='root', password='') as sftp:
    with sftp.cd('/tmp/'):
        t0 = time.time()
        for i in range(100):
            print(i)
            with sftp.open('test%i.txt' % i, 'wb') as f:   # even worse in a+ append mode: it takes 25 seconds
                f.write(os.urandom(100))
        print(time.time() - t0)
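
A rough back-of-the-envelope check suggests the 17 seconds is latency, not bandwidth. The sketch below uses the 40 ms ping and 100 KB/s upload speed from the notes; the figure of about 4 sequential round trips per file (open, write, close, plus some overhead) is an assumption made only for illustration, not something measured.

```python
def latency_bound_seconds(files, round_trips_per_file, rtt_seconds):
    """Time spent purely waiting for replies when requests are sent one at a time."""
    return files * round_trips_per_file * rtt_seconds


def bandwidth_bound_seconds(total_bytes, bytes_per_second):
    """Time needed just to push the payload through the link."""
    return total_bytes / bytes_per_second


rtt = 0.040                      # 40 ms ping, from the notes above
round_trips = 4                  # assumption: roughly 4 sequential round trips per file
waiting = latency_bound_seconds(100, round_trips, rtt)
payload = bandwidth_bound_seconds(100 * 100, 100 * 1000)  # 10 KB at ~100 KB/s

print(f"waiting on round trips: {waiting:.1f} s")
print(f"sending the payload:    {payload:.1f} s")
```

Under these assumptions the waiting time is close to the observed 17 seconds, while the payload itself needs a fraction of a second, which is why overlapping requests is the lever that matters.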

Recommended answer

I'd suggest parallelizing the upload using multiple connections from multiple threads. That's an easy and reliable solution.
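
A minimal sketch of that parallel approach, under stated assumptions: `connect` is a hypothetical factory supplied by the caller that returns a fresh SFTP-like object with `put(local, remote)` and `close()` methods (for example something built on `paramiko.SSHClient.open_sftp()`), and `split_round_robin`, `upload_batch` and `parallel_upload` are illustrative names, not part of any library.

```python
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(items, n):
    """Deal items into n lists so each worker gets a similar share."""
    return [items[i::n] for i in range(n)]


def upload_batch(connect, pairs):
    """Upload (local_path, remote_path) pairs over one dedicated connection."""
    sftp = connect()  # each worker thread gets its own connection
    try:
        for local_path, remote_path in pairs:
            sftp.put(local_path, remote_path)
    finally:
        sftp.close()
    return len(pairs)


def parallel_upload(connect, pairs, workers=8):
    """Upload all pairs using up to `workers` connections; return the file count."""
    batches = [b for b in split_round_robin(list(pairs), workers) if b]
    if not batches:
        return 0
    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        return sum(pool.map(lambda batch: upload_batch(connect, batch), batches))
```

Each connection still waits one round trip per request, but the waits of different connections overlap, so the total time should shrink roughly in proportion to the number of workers until the server or the link becomes the limit. When the factory is built on paramiko, it likely also needs to keep the underlying `SSHClient` alive and close it once the batch is done.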

If you want to do it the hard way by buffering the requests, you can base your solution on the following naive example.

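The example:

  • queues 100 file open requests;
  • as it reads the responses to the open requests, it queues the write requests;
  • as it reads the responses to the write requests, it queues the close requests.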

If I do a plain SFTPClient.put for 100 files, it takes about 10-12 seconds. Using the code below, I achieve the same result about 50-100 times faster.

But! The code is really naive:

  • It expects the server to respond to the requests in the same order they were sent. Indeed, the majority of SFTP servers (including the de-facto standard OpenSSH) do respond in the same order. But according to the SFTP specification, an SFTP server is free to respond in any order.
  • The code expects each file to be read in one go: upload.localhandle.read(32*1024). That holds for small files only.
  • The code expects that the SFTP server can handle 100 parallel requests and 100 open files. That's not a problem for most servers, as they process the requests in order, and 100 open files should not be a problem for a regular server.
  • You cannot do this for an unlimited number of files, though. You have to queue the files somehow to keep the number of outstanding requests in check. Actually, even these 100 requests are too many.
  • The code uses non-public methods of the SFTPClient class.
  • I do not do Python. There are definitely ways to code this more elegantly.

import paramiko
import paramiko.sftp
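try:
    # Assumed location of the 64-bit integer marker in paramiko 3.x,
    # where paramiko.py3compat no longer exists.
    from paramiko.sftp import int64 as long
except ImportError:
    from paramiko.py3compat import long  # paramiko 2.x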
 
ssh = paramiko.SSHClient()
ssh.connect(...)
 
sftp = ssh.open_sftp()
                      
class Upload:
   def __init__(self):
       pass

uploads = []

for i in range(0, 100):
    print(f"sending open request {i}")
    upload = Upload()
    upload.i = i
    upload.localhandle = open(f"{i}.dat", "rb")  # binary mode: the bytes are sent as-is
    upload.remotepath = f"/remote/path/{i}.dat"
    imode = \
        paramiko.sftp.SFTP_FLAG_CREATE | paramiko.sftp.SFTP_FLAG_TRUNC | \
        paramiko.sftp.SFTP_FLAG_WRITE
    attrblock = paramiko.SFTPAttributes()
    upload.request = \
        sftp._async_request(type(None), paramiko.sftp.CMD_OPEN, upload.remotepath, \
            imode, attrblock)
    uploads.append(upload)

# Read each open response, then immediately queue the matching write request.
for upload in uploads:
    print(f"reading open response {upload.i}")
    t, msg = sftp._read_response(upload.request)
    if t != paramiko.sftp.CMD_HANDLE:
        raise paramiko.SFTPError("Expected handle")
    upload.handle = msg.get_binary()

    print(f"sending write request {upload.i} to handle {upload.handle}")
    data = upload.localhandle.read(32*1024)  # a single read: small files only
    upload.localhandle.close()
    upload.request = \
        sftp._async_request(type(None), paramiko.sftp.CMD_WRITE, \
            upload.handle, long(0), data)

# Read each write response, then queue the close request for that handle.
for upload in uploads:
    print(f"reading write response {upload.i} {upload.request}")
    t, msg = sftp._read_response(upload.request)
    if t != paramiko.sftp.CMD_STATUS:
        raise paramiko.SFTPError("Expected status")
    print(f"closing {upload.i} {upload.handle}")
    upload.request = \
        sftp._async_request(type(None), paramiko.sftp.CMD_CLOSE, upload.handle)

# Finally, drain the close responses.
for upload in uploads:
    print(f"reading close response {upload.i} {upload.request}")
    sftp._read_response(upload.request)
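
One of the caveats above is that the number of outstanding requests has to be kept in check. A minimal, generic sketch of such a sliding window follows; `pipeline`, `send` and `receive` are hypothetical names, where `send` would queue one request and return its token and `receive` would block for the response to that token. Like the code above, it assumes responses arrive in request order.

```python
from collections import deque


def pipeline(items, send, receive, max_outstanding=16):
    """Send requests ahead of their responses, never more than max_outstanding at once."""
    pending = deque()
    results = []
    for item in items:
        if len(pending) >= max_outstanding:
            # Window is full: wait for the oldest response before sending more.
            results.append(receive(pending.popleft()))
        pending.append(send(item))
    # Drain whatever is still in flight.
    while pending:
        results.append(receive(pending.popleft()))
    return results
```

With a window of 16, only the first request of each window pays the full round trip visibly; the rest of the waiting overlaps, while the server never sees more than 16 unanswered requests at a time.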
