Python FTP "chunk" iterator (without loading the entire file into memory)
There are several answers on Stack Overflow about retrieving an FTP file and writing it to a stream, such as a string buffer or a file, which can then be iterated over.
For example: Read a file in buffer from FTP python
However, these solutions involve loading the entire file into memory, or downloading it to disk, before processing of the contents can begin.
I do not have enough memory to buffer the whole file, and I do not have access to the disk. This can be done by processing the data in the callback function, but I want to know whether it is possible to wrap the FTP code in something that returns an iterator, rather than peppering my code with callbacks.
I.e., rather than:

def get_ftp_data(handle_chunk):
    ...
    ftp.login('user', 'password')  # authentication required
    ftp.retrbinary('RETR etc', handle_chunk)
    ...

get_ftp_data(do_stuff_to_chunk)
I want:
for chunk in get_ftp_data():
    do_stuff_to_chunk(chunk)
And (unlike the existing answers) I want to do it without writing the entire FTP file to disk or memory before iterating over it.
You'll have to put the retrbinary call in another thread and have the callback feed chunks to an iterator:
import threading, queue

def ftp_chunk_iterator(FTP, command):
    # Set maxsize to limit the number of chunks kept in memory at once.
    chunk_queue = queue.Queue(maxsize=some_appropriate_size)

    def ftp_thread_target():
        FTP.retrbinary(command, callback=chunk_queue.put)
        chunk_queue.put(None)  # sentinel: the transfer is finished

    ftp_thread = threading.Thread(target=ftp_thread_target)
    ftp_thread.start()

    while True:
        chunk = chunk_queue.get()
        if chunk is not None:
            yield chunk
        else:
            return
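The generator above can be exercised without a live FTP server: any function that invokes a callback once per chunk can stand in for retrbinary. In this sketch, fake_retr is a purely illustrative stand-in, and the queue bound is an arbitrary choice:

```python
import threading
import queue

def chunk_iterator(retr, maxsize=16):
    """Run retr(callback) in a worker thread; yield chunks from a bounded queue."""
    chunk_queue = queue.Queue(maxsize=maxsize)

    def worker():
        retr(chunk_queue.put)   # retr calls the callback once per chunk
        chunk_queue.put(None)   # sentinel: no more data

    threading.Thread(target=worker, daemon=True).start()
    while True:
        chunk = chunk_queue.get()
        if chunk is None:
            return
        yield chunk

# Stand-in for ftp.retrbinary('RETR file', callback): emits three chunks.
def fake_retr(callback):
    for part in (b"abc", b"def", b"ghi"):
        callback(part)

print(list(chunk_iterator(fake_retr)))  # [b'abc', b'def', b'ghi']
```

Because the queue is bounded, the FTP thread blocks on put() once the consumer falls behind, so memory use stays capped at maxsize chunks.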
If you can't use threads, the best you can do is to write your callback as a coroutine:
from contextlib import closing

def process_chunks():
    while True:
        try:
            chunk = yield
        except GeneratorExit:
            finish_up()
            return
        else:
            do_whatever_with(chunk)

with closing(process_chunks()) as coroutine:
    # Advance the coroutine to the first yield
    next(coroutine)
    FTP.retrbinary(command, callback=coroutine.send)
# coroutine.close() is called by exiting the with block
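The coroutine pattern can likewise be checked without FTP by driving it by hand; here do_whatever_with and finish_up are replaced by stand-ins that simply record what happened, and the sends mimic what retrbinary would do once per chunk:

```python
from contextlib import closing

received = []

def process_chunks():
    while True:
        try:
            chunk = yield
        except GeneratorExit:
            received.append("done")   # stand-in for finish_up()
            return
        else:
            received.append(chunk)    # stand-in for do_whatever_with(chunk)

with closing(process_chunks()) as coroutine:
    next(coroutine)                   # advance to the first yield
    for part in (b"abc", b"def"):
        coroutine.send(part)          # what retrbinary would do per chunk
# closing() calls coroutine.close() on exit, raising GeneratorExit at the yield

print(received)  # [b'abc', b'def', 'done']
```

Note that everything still happens inside the callback's stack frame: this inverts the code's shape into coroutine style, but it does not give you a plain for-loop iterator the way the threaded version does.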