Python stream from FTP server to Flask server for downloading
Problem description
I have a Python Flask app that receives requests to download a file from a remote FTP server. I have used BytesIO to hold the contents of the file downloaded from the FTP server with retrbinary:
    import os
    from flask import Flask, request, send_file
    from ftplib import FTP
    from io import BytesIO

    app = Flask(__name__)

    @app.route('/')
    def hello_world():
        return 'Hello, World!'

    @app.route('/download_content', methods=['GET'])
    def download_content():
        filepath = request.args.get("filepath").strip()
        f = FTP(my_server)
        f.login(my_username, my_password)
        b = BytesIO()
        f.retrbinary("RETR " + filepath, b.write)
        b.seek(0)
        return send_file(b, attachment_filename=os.path.basename(filepath))

    app.run("localhost", port=8080)
The issue here is that when the download_content route is hit, the contents of the file are first read completely into the BytesIO object, and only then sent to the frontend for downloading.
How can I stream the file to the frontend while it is being downloaded from the FTP server? I can't wait for the file to be downloaded entirely into the BytesIO object and then call send_file, as that could be both memory inefficient and more time consuming.
I have read that Flask's send_file accepts a generator object, but how can I make the BytesIO object yield to send_file in chunks?
Recommended answer
It looks like you will need to set up a worker thread to manage the download from retrbinary.
I have made a quick Gist for this, as we have come across the same problem. This method seems to work.
https://gist.github.com/Richard-Mathie/ffecf414553f8ca4c56eb5b06e791b6f
    from ftplib import FTP
    from queue import Queue, Empty
    from threading import Thread, Event


    class FTPDownloader(object):
        def __init__(self, host, user, password, timeout=0.01):
            self.ftp = FTP(host)
            self.ftp.login(user, password)
            self.timeout = timeout  # queue polling interval in seconds, not the FTP socket timeout

        def getBytes(self, filename):
            # Runs in the worker thread: retrbinary calls self.bytes.put once per block.
            print("getBytes")
            self.ftp.retrbinary("RETR {}".format(filename), self.bytes.put)
            self.bytes.join()  # wait for all blocks in the queue to be processed
            self.finished.set()  # mark streaming as finished

        def sendBytes(self):
            # Generator consumed by the web framework: yields blocks as they arrive.
            while not self.finished.is_set():
                try:
                    yield self.bytes.get(timeout=self.timeout)
                    self.bytes.task_done()
                except Empty:
                    self.finished.wait(self.timeout)
            self.worker.join()

        def download(self, filename):
            self.bytes = Queue()
            self.finished = Event()
            self.worker = Thread(target=self.getBytes, args=(filename,))
            self.worker.start()
            return self.sendBytes()
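The answer does not show how the generator reaches the client. One way that should work is to pass it to Flask's Response class, which accepts a generator as its body and sends each yielded chunk as it is produced. The sketch below is an assumption about how the route could look, not the answerer's code: open_chunk_stream is a hypothetical stand-in that yields fake chunks so the example runs without an FTP server, and in a real app it would return FTPDownloader(my_server, my_username, my_password).download(filepath).

```python
import os

from flask import Flask, Response, request

app = Flask(__name__)


def open_chunk_stream(filepath):
    """Hypothetical stand-in for FTPDownloader(...).download(filepath).

    Yields a few fake byte chunks so the route can be exercised offline.
    """
    for i in range(3):
        yield ("chunk %d of %s\n" % (i, filepath)).encode()


@app.route("/download_content", methods=["GET"])
def download_content():
    filepath = request.args.get("filepath", "").strip()
    if not filepath:
        return "missing filepath", 400
    # Ask the browser to save the body under the file's base name.
    filename = os.path.basename(filepath)
    headers = {"Content-Disposition": 'attachment; filename="%s"' % filename}
    # Response iterates the generator lazily, so chunks are sent as they arrive.
    return Response(
        open_chunk_stream(filepath),
        mimetype="application/octet-stream",
        headers=headers,
    )
```

Because the total size is not known up front, no Content-Length header is set here, so the browser cannot show a progress percentage.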
This should probably have some timeouts and logic added to handle connections timing out, etc., but this is the basic form.
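As a rough sketch of what that error handling might look like: if retrbinary raises inside the worker thread, the finished event in the class above is never set and the generator keeps polling forever. The variant below sets the event in a finally block and re-raises the error in the consumer. The function name guarded_download and the retr parameter are hypothetical; retr is assumed to be a callable such as lambda cb: ftp.retrbinary("RETR " + name, cb), with ftp created as FTP(host, timeout=30) so that a stalled socket raises instead of hanging.

```python
from queue import Empty, Queue
from threading import Event, Thread


def guarded_download(retr, poll=0.01):
    """Yield chunks from retr(callback) and always terminate, even on errors."""
    chunks = Queue()
    finished = Event()
    errors = []

    def worker():
        try:
            retr(chunks.put)  # retr calls chunks.put once per block
        except Exception as exc:
            errors.append(exc)  # remember the failure for the consumer
        finally:
            finished.set()  # set after the last put, on success or failure

    thread = Thread(target=worker)
    thread.start()
    # Keep draining until the worker is done AND nothing is left in the queue.
    while not (finished.is_set() and chunks.empty()):
        try:
            yield chunks.get(timeout=poll)
        except Empty:
            pass  # nothing yet; loop and re-check the finished flag
    thread.join()
    if errors:
        raise errors[0]  # surface the worker's error to the caller
```

Since the event is only set after the last put, checking it before checking the queue is enough to know no further chunks can arrive, so this variant does not need Queue.join().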
An empty queue does not guarantee that the worker thread running getBytes has finished, so you need a semaphore or Event to tell the generator sendBytes when the worker is done. However, all the blocks in the queue have to be processed first, hence the self.bytes.join() before setting finished.
I would be interested if anyone can think of a more elegant way of doing this.
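One possible simplification, offered as a sketch and not as the answerer's method, is to replace the Event and Queue.join() pair with a sentinel object that the worker puts on the queue after the last block. The generator then blocks on get() and stops when it sees the sentinel. The names stream_from_callback, fetch and _DONE are hypothetical; fetch is assumed to be a callable such as lambda cb: ftp.retrbinary("RETR " + name, cb).

```python
from queue import Queue
from threading import Thread

_DONE = object()  # unique end-of-stream marker; never equal to a real chunk


def stream_from_callback(fetch, maxsize=16):
    """Yield chunks produced by fetch(callback), which runs in a worker thread."""
    # Bounded queue: the worker blocks when the consumer falls behind,
    # so at most maxsize chunks are held in memory at once.
    chunks = Queue(maxsize=maxsize)

    def worker():
        try:
            fetch(chunks.put)
        finally:
            chunks.put(_DONE)  # always signal the end, even if fetch raised

    Thread(target=worker, daemon=True).start()
    while True:
        item = chunks.get()  # blocks until a chunk or the sentinel arrives
        if item is _DONE:
            return
        yield item
```

In this minimal form an error in fetch simply ends the stream early, and a consumer that stops reading leaves the daemon worker blocked on a full queue, so a production version would need to handle both cases.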