Python stream from FTP server to Flask server for downloading

Question

I have a Python Flask app that receives requests to download a file from a remote FTP server. I use a BytesIO object to hold the contents of the file, which is downloaded from the FTP server with retrbinary:

import os

from flask import Flask, request, send_file
from ftplib import FTP
from io import BytesIO

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

@app.route('/download_content', methods=['GET'])
def download_content():
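    filepath = request.args.get("filepath").strip()
    # my_server, my_username and my_password are placeholders for the real connection details
    f = FTP(my_server)
    f.login(my_username, my_password)
    b = BytesIO()
    # retrbinary blocks until the whole file has been written into b
    f.retrbinary("RETR " + filepath, b.write)
    b.seek(0)
    # note: Flask 2.0 renamed attachment_filename to download_name
    return send_file(b, attachment_filename=os.path.basename(filepath))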

app.run("localhost", port=8080)

The issue here is that when the download_content route is hit, the contents of the file are first read completely into the BytesIO object, and only then sent to the frontend for downloading.

How can I stream the file to the frontend while it is being downloaded from the FTP server? I can't wait for the file to be downloaded entirely into the BytesIO object and then call send_file, as that could be both memory inefficient and more time consuming.

I have read that Flask's send_file accepts a generator object, but how can I make the BytesIO object yield to send_file in chunks?

Recommended answer

It looks like you will need to set up a worker thread to manage the download from retrbinary.

I have made a quick Gist for this, as we have come across the same problem. This method seems to work.

https://gist.github.com/Richard-Mathie/ffecf414553f8ca4c56eb5b06e791b6f

from ftplib import FTP
from queue import Queue, Empty
from threading import Thread, Event

class FTPDownloader(object):
  def __init__(self, host, user, password, timeout=0.01):
    self.ftp = FTP(host)
    self.ftp.login(user, password)
    self.timeout = timeout  # polling interval (seconds) used by sendBytes

  def getBytes(self, filename):
    # runs in the worker thread: retrbinary calls self.bytes.put for every block
    print("getBytes")
    self.ftp.retrbinary("RETR {}".format(filename), self.bytes.put)
    self.bytes.join()   # wait for all blocks in the queue to be processed
    self.finished.set() # mark streaming as finished

  def sendBytes(self):
    # generator: yields blocks to the caller as they arrive from the worker
    while not self.finished.is_set():
      try:
        yield self.bytes.get(timeout=self.timeout)
        self.bytes.task_done()
      except Empty:
        self.finished.wait(self.timeout)
    self.worker.join()

  def download(self, filename):
    self.bytes = Queue()
    self.finished = Event()
    self.worker = Thread(target=self.getBytes, args=(filename,))
    self.worker.start()
    return self.sendBytes()

It should probably have some timeouts and logic added to handle connections timing out, etc., but this is the basic form.
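As a rough illustration of the timeout handling mentioned above, here is a minimal sketch. The function name `retrieve` and its `ftp_class` parameter are hypothetical (the latter exists only so the function can be exercised without a real server); the `timeout` argument of `ftplib.FTP` and the `quit()`/`close()` methods are part of the standard library. It is not part of the Gist.

```python
from ftplib import FTP, all_errors


def retrieve(host, user, password, filename, callback,
             timeout=30.0, ftp_class=FTP):
    """Fetch `filename` and pass each block to `callback`.

    `timeout` (seconds) is handed to ftplib, so a connection attempt or a
    stalled socket operation raises instead of blocking forever.
    `ftp_class` is a hypothetical hook for swapping in a fake in tests.
    """
    ftp = ftp_class(host, timeout=timeout)
    try:
        ftp.login(user, password)
        ftp.retrbinary("RETR {}".format(filename), callback)
    finally:
        # Always release the connection, even if the transfer failed.
        try:
            ftp.quit()
        except all_errors:
            ftp.close()
```

Any exception raised by the transfer still propagates to the caller, so a worker thread that calls this function needs its own way of reporting the failure to the generator; otherwise the generator would keep waiting for a `finished` event that is never set.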

An empty queue does not guarantee that the worker thread running getBytes has finished, so you need a semaphore/Event to tell the generator sendBytes when the worker is done. However, all the blocks in the queue have to be processed first, hence the self.bytes.join() before setting finished.

I would be interested if anyone can think of a more elegant way of doing this.
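One possible simplification, offered as a sketch rather than as the Gist's method: have the worker put a sentinel on the queue as its last item, which removes the need for the Event and for Queue.join(). The names `stream_chunks`, `fetch` and `max_blocks` are hypothetical; `fetch` stands for any callable that takes a callback, for example `lambda cb: ftp.retrbinary("RETR " + path, cb)`.

```python
import queue
import threading

_DONE = object()  # sentinel: the worker's last item, marks end of stream


def stream_chunks(fetch, max_blocks=16):
    """Yield blocks produced by fetch(callback) while it runs in a thread."""
    # Bounded queue: the worker pauses when the consumer reads slowly,
    # so at most max_blocks blocks are held in memory at once.
    blocks = queue.Queue(maxsize=max_blocks)

    def worker():
        try:
            fetch(blocks.put)
        except Exception as exc:
            blocks.put(exc)    # forward the failure to the consumer
        else:
            blocks.put(_DONE)  # normal end of stream

    # daemon=True so an abandoned stream does not keep the process alive;
    # a blocked worker is otherwise not cleaned up in this sketch.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        item = blocks.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

In a Flask view, the resulting generator could then be passed as the body of a `Response` object (for instance with `mimetype="application/octet-stream"`), since a Flask response body may be an iterable of bytes. Because an error in the worker is re-raised in the generator, the stream ends instead of hanging when the FTP transfer fails part-way through.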
