Writing to a file on SFTP server opened using pysftp "open" method is slow


Problem description

I have a piece of Python code that works, but is very slow to write a Dataframe directly to an SFTP location. I am using pysftp and pandas.to_csv() to achieve the task of reading an Excel file from a remote location, run a few simple transformations and write it over to an SFTP location.

The code snippet shared below takes precisely 4 minutes 30 seconds to write 100 records to the SFTP location. An average DataFrame that I process has at most 20 columns.

from pysftp import CnOpts, Connection
from tqdm import tqdm

def dataframe_sftp_transfer(df, destination_path):
    cnopts = CnOpts()
    cnopts.hostkeys = None  # disables host-key checking (see the warning in the answer)
    sftp = Connection('sftp3.server.com',
                      username='user',
                      password='pwd123',
                      cnopts=cnopts)
    with sftp.open(destination_path, 'w+') as f:
        chunksize = 100
        with tqdm(total=len(df)) as progbar:
            df.to_csv(f, sep='~', index=False, chunksize=chunksize)
            progbar.update(chunksize)

Is there a better/faster way to achieve the aforesaid? Shouldn't writing files of the stated magnitude take only a couple of minutes?

Using a tool like FileZilla to put files in the remote SFTP location is much faster, but that sadly rules out any form of automation.

Answer

You open the remote file without buffering. That way, every time df.to_csv writes to the file, Paramiko/pysftp sends a request to the SFTP server and waits for a response. I do not know the internals of df.to_csv, but it likely does one write per line (if not more). That would explain why the upload is so slow, particularly if your connection to the server has high latency.
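The write pattern can be checked locally by counting write() calls on an in-memory file object. This is an illustrative sketch (the CountingWriter class is not part of pandas or pysftp); over SFTP, each of those calls would become a request/response round trip:

```python
import io

import pandas as pd

class CountingWriter(io.StringIO):
    """In-memory file that counts write() calls, standing in for an unbuffered remote handle."""
    def __init__(self):
        super().__init__()
        self.write_calls = 0

    def write(self, s):
        self.write_calls += 1
        return super().write(s)

df = pd.DataFrame({"a": range(100), "b": range(100)})
w = CountingWriter()
df.to_csv(w, sep="~", index=False)
print(w.write_calls)  # far more than one call for 100 rows
```

With a 100-row frame this prints a call count on the order of the row count, consistent with the answer's explanation of one round trip per line.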

To enable buffered writes, use the bufsize parameter of Connection.open:

with sftp.open(destination_path, 'w+', 32768) as f:
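For intuition, the effect of a write buffer can be sketched locally with Python's io.BufferedWriter: the many small writes from df.to_csv accumulate in memory and reach the underlying stream as a few large chunks, which is roughly what bufsize arranges for the SFTP channel (the CountingRaw class below is illustrative only):

```python
import io

import pandas as pd

class CountingRaw(io.RawIOBase):
    """Raw sink that counts how many writes actually reach it."""
    def __init__(self):
        self.calls = 0

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        return len(b)

raw = CountingRaw()
f = io.TextIOWrapper(io.BufferedWriter(raw, buffer_size=32768), encoding="utf-8")
pd.DataFrame({"a": range(100)}).to_csv(f, sep="~", index=False)
f.flush()
print(raw.calls)  # only a handful of writes reach the raw stream
```

Since 100 short rows fit comfortably inside the 32768-byte buffer, the raw sink sees only one or a few writes instead of one per row.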

Similarly for reads/downloads:
Reading file opened with Python Paramiko SFTPClient.open method is slow

Obligatory warning: Do not set cnopts.hostkeys = None, unless you do not care about security. For the correct solution see Verify host key with pysftp.
