Read a file into a buffer from FTP in Python


Problem description

I am trying to read a file from an FTP server. The file is a .gz file. I would like to know if I can perform actions on this file while the socket is open. I tried to follow what was mentioned in two StackOverflow questions on reading files without writing to disk and reading files from FTP without downloading but was not successful.

I know how to extract data/work on the downloaded file but I'm not sure if I can do it on the fly. Is there a way to connect to the site, get data in a buffer, possibly do some data extraction and exit?

When trying StringIO I got the error:

>>> from ftplib import FTP
>>> from StringIO import StringIO
>>> ftp = FTP('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz')

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    ftp = FTP('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz')
  File "C:\Python27\lib\ftplib.py", line 117, in __init__
    self.connect(host)
  File "C:\Python27\lib\ftplib.py", line 132, in connect
    self.sock = socket.create_connection((self.host, self.port), self.timeout)
  File "C:\Python27\lib\socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno 11004] getaddrinfo failed

I just need to know how I can get the data into some variable and loop on it until the whole file from FTP has been read.

I appreciate your time and help. Thanks!

Recommended answer

Make sure to log in to the FTP server first. Note that FTP() expects a host name only, not a full ftp:// URL, which is why the call in the question fails in getaddrinfo. After logging in, use retrbinary, which pulls the file in binary mode and invokes a callback on each chunk of the file. You can use this to load the file into a string.

from ftplib import FTP
ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login() # Username: anonymous password: anonymous@

# Set up a cheap way to catch the data (could use StringIO too)
data = []
def handle_binary(more_data):
    data.append(more_data)

resp = ftp.retrbinary("RETR pub/pmc/PMC-ids.csv.gz", callback=handle_binary)
data = b"".join(data)  # Chunks are byte strings; b"" works on Python 2.7 and 3
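Since FTP() takes only a host name, the URL from the question has to be split into a host and a path before connecting. The following is a minimal sketch for Python 3; split_ftp_url is a hypothetical helper name introduced here for illustration, not part of ftplib.

```python
from urllib.parse import urlparse

def split_ftp_url(url):
    """Split an ftp:// URL into (host, path) for use with ftplib.

    Hypothetical helper, not part of the standard library.
    """
    parts = urlparse(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError("not an FTP URL: %r" % url)
    # The RETR command in the answer uses a path without a leading slash
    return parts.hostname, parts.path.lstrip("/")

host, path = split_ftp_url("ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz")
# host is what FTP(host) expects; path goes into the "RETR " + path command
```

With these two values, the connection would be opened with FTP(host) and the download started with "RETR " + path.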

Bonus points: how about we decompress the string while we're at it?

Easy mode, using data string above

import gzip
import io

# io.BytesIO holds raw bytes and works on Python 2.6+ and 3
# (the StringIO module exists only on Python 2)
zippy = gzip.GzipFile(fileobj=io.BytesIO(data))
uncompressed_data = zippy.read()

Slightly better, full solution

from ftplib import FTP
import gzip
import io

ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login()  # Username: anonymous password: anonymous@

# In-memory bytes buffer; io.BytesIO works on Python 2.6+ and 3
sio = io.BytesIO()
def handle_binary(more_data):
    sio.write(more_data)

resp = ftp.retrbinary("RETR pub/pmc/PMC-ids.csv.gz", callback=handle_binary)
ftp.quit()  # Done with the connection
sio.seek(0)  # Go back to the start
zippy = gzip.GzipFile(fileobj=sio)

uncompressed = zippy.read()
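The question also asks how to loop over the data once it is in a buffer. Because the file is a gzip-compressed CSV, one option is to wrap the buffer in a text layer and hand it to the csv module. This is a sketch for Python 3 under stated assumptions: iter_csv_rows is a hypothetical helper, the UTF-8 encoding is a guess about the file, and the sample bytes are made up so the example runs without a network connection.

```python
import csv
import gzip
import io

def iter_csv_rows(gz_bytes, encoding="utf-8"):
    """Yield CSV rows from gzip-compressed bytes held in memory.

    Hypothetical helper; the encoding default is an assumption.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as zippy:
        # Decode bytes to text lazily so rows are produced one at a time
        text = io.TextIOWrapper(zippy, encoding=encoding, newline="")
        for row in csv.reader(text):
            yield row

# Made-up stand-in for the bytes collected by the retrbinary callback
sample = gzip.compress(b"id,name\n1,alpha\n2,beta\n")
rows = list(iter_csv_rows(sample))
```

In the full solution above, the bytes held in the buffer (sio.getvalue()) would take the place of sample.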

In reality, it would be much better to decompress on the fly but I don't see a way to do that with the built in libraries (at least not easily).
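One possible way to decompress on the fly using only the standard library is zlib's incremental decompressor, which can be told to expect the gzip header. The sketch below is for Python 3 and is not part of the original answer; GzipStreamDecoder is a hypothetical class name, and the demonstration feeds made-up data in small chunks instead of talking to a server. It handles a single gzip member only, which is an assumption about the file.

```python
import gzip
import zlib

class GzipStreamDecoder:
    """Decompress gzip data chunk by chunk (hypothetical helper)."""

    def __init__(self, on_data):
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
        self._decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        self._on_data = on_data

    def feed(self, chunk):
        """Decompress one compressed chunk and pass on any output."""
        out = self._decomp.decompress(chunk)
        if out:
            self._on_data(out)

    def finish(self):
        """Flush whatever the decompressor still holds."""
        out = self._decomp.flush()
        if out:
            self._on_data(out)

# Offline demonstration with made-up data, fed in 64-byte chunks
payload = b"id,name\n1,alpha\n2,beta\n" * 100
compressed = gzip.compress(payload)
pieces = []
decoder = GzipStreamDecoder(pieces.append)
for i in range(0, len(compressed), 64):
    decoder.feed(compressed[i:i + 64])
decoder.finish()
result = b"".join(pieces)
```

With ftplib, decoder.feed could be passed as the callback argument of retrbinary, followed by a call to decoder.finish() once the transfer completes, so the compressed file never has to be held in memory as a whole.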
