How to use Paramiko getfo to download file from SFTP server to memory to process it


Problem description

I am trying to download a CSV file (in-memory) from an SFTP server using Paramiko and import it into a pandas DataFrame.

transport = paramiko.Transport((server, 22))
transport.connect(username=username, password=password)
sftp = paramiko.SFTPClient.from_transport(transport)

with open(file_name, 'wb') as fl:
    sftp.getfo(file_name, fl, callback=printTotals)
    df = pd.read_csv(fl, sep=' ')

The code above fails, telling me:

OSError: File is not open for reading

I assume that I need some kind of buffer or file-like object for fl instead, since open needs a file. I am relatively new to all of this, so I would be happy if someone could help.

Recommended answer

A simple solution that still allows you to use a progress callback is:

  • Use a BytesIO file-like object to store the downloaded file in memory;

  • After the download finishes, seek the file pointer back to the start of the file before you start reading it.

with io.BytesIO() as fl:
    sftp.getfo(file_name, fl, callback=printTotals)
    fl.seek(0)
    df = pd.read_csv(fl, sep=' ')
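
To see why the seek matters, here is a minimal sketch that runs without an SFTP server. The fake_getfo function is a hypothetical stand-in for sftp.getfo (it only writes bytes into the file-like object), and the standard csv module stands in for pandas; both are assumptions made for illustration.

```python
import csv
import io


def fake_getfo(remote_bytes, fl):
    """Hypothetical stand-in for sftp.getfo: writes the 'remote' bytes into fl."""
    fl.write(remote_bytes)
    return len(remote_bytes)


remote_bytes = b"a b\n1 2\n3 4\n"

with io.BytesIO() as fl:
    fake_getfo(remote_bytes, fl)

    # After the write, the file pointer sits at the end of the buffer,
    # so reading at this point returns nothing.
    at_end = fl.read()

    # Rewind to the start before handing the buffer to a parser.
    fl.seek(0)
    lines = fl.read().decode("utf-8").splitlines()
    rows = list(csv.reader(lines, delimiter=" "))

print(at_end)
print(rows)
```

Without the fl.seek(0) call, the parser would see an empty stream, which is the same situation pd.read_csv would be in when given the buffer straight after the download.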

With this solution, though, you end up with the file loaded into memory twice: once as raw bytes in the BytesIO buffer and once as the parsed DataFrame.

A better solution is to implement a custom file-like object. It even allows you to download and parse the file at the same time.

class FileWithProgress:

    def __init__(self, fl):
        self.fl = fl
        self.size = fl.stat().st_size
        self.p = 0

    def read(self, blocksize):
        r = self.fl.read(blocksize)
        self.p += len(r)
        print(str(self.p) + " of " + str(self.size)) 
        return r

Then use it like this:

with sftp.open(file_name, "rb") as fl:
    fl.prefetch()
    df = pd.read_csv(FileWithProgress(fl), sep=' ') 
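
The wrapper idea can be tried without a server. The sketch below is a variant, not the class above: ProgressReader is a hypothetical name, the size is passed in explicitly rather than taken from fl.stat().st_size, and progress goes to a callback with the same (bytes so far, total bytes) shape as the getfo callback instead of being printed. An io.BytesIO stands in for the SFTP file object.

```python
import io


class ProgressReader:
    """Wraps a readable binary file object and reports bytes read so far."""

    def __init__(self, fl, size, on_progress):
        self.fl = fl
        self.size = size
        self.on_progress = on_progress  # called as on_progress(done, total)
        self.p = 0

    def read(self, blocksize=-1):
        r = self.fl.read(blocksize)
        self.p += len(r)
        self.on_progress(self.p, self.size)
        return r


data = b"a b\n1 2\n3 4\n"
events = []
reader = ProgressReader(
    io.BytesIO(data),  # stand-in for the file returned by sftp.open
    len(data),
    lambda done, total: events.append((done, total)),
)

# Read in small blocks, the way a parser would pull data from the wrapper.
chunks = []
while True:
    chunk = reader.read(5)
    if not chunk:
        break
    chunks.append(chunk)

print(events)
```

Because the callback takes the same two arguments, a function such as printTotals from the question could probably be passed as on_progress unchanged. Depending on the pandas version, read_csv may also expect the object to define __iter__; that is not covered here and is worth checking against the version in use.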

For the SFTPFile.prefetch call, refer to:
Reading file opened with Python Paramiko SFTPClient.open method is slow.

If you do not need progress monitoring, simple code like this will do:

with sftp.open(file_name, "rb") as fl:
    fl.prefetch()
    df = pd.read_csv(fl, sep=' ') 
