Python FTP "chunk" iterator (without loading entire file into memory)


Problem description


There are several answers on Stack Overflow about retrieving an FTP file and writing it to a stream, such as a string buffer or a file, which can then be iterated over.

For example: "Read a file in buffer from FTP python".

However, these solutions involve loading the entire file into memory or downloading it to the disk before beginning to process the contents.

I do not have enough memory to buffer the whole file and I do not have access to the disk. This can be done by processing the data in the callback function, but I want to know if it's possible to wrap the ftp code in some magic that returns an iterator rather than peppering my code with callbacks.

That is, rather than:

def get_ftp_data(handle_chunk):
    ...
    ftp.login('user', 'password') # authentication required
    ftp.retrbinary('RETR etc', handle_chunk)
    ...

get_ftp_data(do_stuff_to_chunk)

I want:

for chunk in get_ftp_data():
    do_stuff_to_chunk(chunk)

And (unlike existing answers) I want to do it without writing the entire ftp file to disk or memory before iterating on it.

Solution

You'll have to put the retrbinary call in another thread and have the callback feed blocks to an iterator:

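import queue      # named Queue in Python 2
import threading

def ftp_chunk_iterator(FTP, command):
    # Set maxsize to limit the number of chunks kept in memory at once.
    # some_appropriate_size is a placeholder; pick a value that fits your memory budget.
    chunk_queue = queue.Queue(maxsize=some_appropriate_size)

    def ftp_thread_target():
        try:
            # put() blocks while the queue is full, which throttles the download.
            FTP.retrbinary(command, callback=chunk_queue.put)
        finally:
            # Always send the end-of-data marker so the consumer cannot hang.
            chunk_queue.put(None)

    ftp_thread = threading.Thread(target=ftp_thread_target)
    ftp_thread.start()

    while True:
        chunk = chunk_queue.get()
        if chunk is not None:
            yield chunk
        else:
            return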
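
To try the threaded approach without an FTP server, here is a minimal sketch. It rests on several assumptions that are not part of the answer: `FakeFTP` is a hypothetical stand-in that only mimics how `retrbinary` calls its callback once per block, `iter_chunks` and `max_chunks` are illustrative names, and forwarding exceptions through the queue is one possible design, not something the answer prescribes.

```python
import queue
import threading


class FakeFTP:
    """Hypothetical stand-in for ftplib.FTP so the sketch runs offline."""

    def __init__(self, payload, blocksize=4):
        self.payload = payload
        self.blocksize = blocksize

    def retrbinary(self, command, callback):
        # Mimic retrbinary: invoke the callback once per block of bytes.
        for start in range(0, len(self.payload), self.blocksize):
            callback(self.payload[start:start + self.blocksize])


def iter_chunks(ftp, command, max_chunks=2):
    """Yield chunks from ftp.retrbinary, holding at most max_chunks in memory."""
    chunks = queue.Queue(maxsize=max_chunks)
    done = object()  # unique end-of-stream marker

    def worker():
        try:
            ftp.retrbinary(command, callback=chunks.put)
        except Exception as exc:
            chunks.put(exc)   # hand the error over to the consuming thread
        else:
            chunks.put(done)

    # daemon=True so an abandoned iterator does not keep the interpreter alive.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        item = chunks.get()
        if item is done:
            return
        if isinstance(item, Exception):
            raise item
        yield item


received = list(iter_chunks(FakeFTP(b"0123456789"), "RETR example.bin"))
print(received)
```

Because the queue is bounded, the download thread blocks in `put` whenever the consumer falls behind, so memory use stays at roughly `max_chunks` blocks. One caveat of this sketch: if the consumer stops iterating early, the worker stays blocked on a full queue until the process exits.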

If you can't use threads, the best you can do is to write your callback as a coroutine:

from contextlib import closing


def process_chunks():
    while True:
        try:
            chunk = yield
        except GeneratorExit:
            finish_up()
            return
        else:
            do_whatever_with(chunk)

with closing(process_chunks()) as coroutine:

    # Advance the coroutine to the first yield
    # (next(coroutine) works on Python 2.6+ and Python 3; coroutine.next() is Python 2 only)
    next(coroutine)

    FTP.retrbinary(command, callback=coroutine.send)
# coroutine.close() #  called by exiting the block
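
The coroutine pattern can also be exercised without a server. The following is a minimal sketch under assumptions of my own: `FakeFTP` is a hypothetical stand-in for an FTP connection, and `byte_counter` with its `totals` dictionary is an illustrative replacement for the `do_whatever_with` and `finish_up` placeholders above.

```python
from contextlib import closing


class FakeFTP:
    """Hypothetical stand-in for ftplib.FTP so the sketch runs offline."""

    def __init__(self, payload, blocksize=4):
        self.payload = payload
        self.blocksize = blocksize

    def retrbinary(self, command, callback):
        # Mimic retrbinary: invoke the callback once per block of bytes.
        for start in range(0, len(self.payload), self.blocksize):
            callback(self.payload[start:start + self.blocksize])


def byte_counter(totals):
    """Coroutine that tallies each chunk it is sent, one at a time."""
    while True:
        try:
            chunk = yield
        except GeneratorExit:
            # Raised at the yield when close() is called: do final cleanup here.
            totals["closed"] = True
            return
        else:
            totals["bytes"] += len(chunk)
            totals["chunks"] += 1


totals = {"bytes": 0, "chunks": 0, "closed": False}

with closing(byte_counter(totals)) as coroutine:
    next(coroutine)  # advance to the first yield so send() is accepted
    FakeFTP(b"0123456789").retrbinary("RETR example.bin", callback=coroutine.send)

print(totals)
```

Only one chunk is alive at a time here, since `retrbinary` does not fetch the next block until `coroutine.send` returns. The trade-off is that the processing logic lives inside the coroutine instead of a plain `for` loop.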
