Twisted, FTP, and "streaming" large files

Problem description

I'm attempting to implement what can best be described as "an FTP interface to an HTTP API". Essentially, there is an existing REST API that can be used to manage a user's files for a site, and I'm building a mediator server that re-exposes this API as an FTP server. So you can log in with, say, FileZilla and list your files, upload new ones, delete old ones, etc.

I'm attempting this with twisted.protocols.ftp for the (FTP) server, and twisted.web.client for the (HTTP) client.

The thing I'm running up against is, when a user tries to download a file, "streaming" that file from an HTTP response to my FTP response. Similar for uploading.

The most straightforward approach would be to download the entire file from the HTTP server, then turn around and send the contents to the user. The problem is that any given file could be many gigabytes in size (think drive images, ISO files, etc.), and with this approach the entire contents of the file would be held in memory between the time I download it from the API and the time I send it to the user, which is not good.

So my solution is to try to "stream" it - as I get chunks of data from the API's HTTP response, I just want to turn around and send those chunks along to the FTP user. Seems straightforward.
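To make the memory difference concrete, here is a minimal sketch in plain Python with no Twisted involved. The names relay_chunks, CountingSink, and fake_http_body are hypothetical, and the generator only stands in for an HTTP response body. Each chunk is handed on as soon as it arrives, so at most one chunk is held at a time instead of the whole file.

```python
def relay_chunks(chunks, write):
    """Forward each chunk to `write` as it arrives; return total bytes relayed."""
    total = 0
    for chunk in chunks:
        write(chunk)          # hand the chunk on immediately
        total += len(chunk)   # only the running byte count is retained
    return total


class CountingSink(object):
    """Hypothetical stand-in for the FTP side; records chunk sizes, not data."""

    def __init__(self):
        self.sizes = []

    def write(self, data):
        self.sizes.append(len(data))


def fake_http_body(n_chunks, chunk_size):
    """Hypothetical stand-in for an HTTP response body, generated lazily."""
    for _ in range(n_chunks):
        yield b"x" * chunk_size


sink = CountingSink()
total = relay_chunks(fake_http_body(4, 1024), sink.write)
```

The same shape applies in an event-driven setting: the loop is replaced by a callback that fires once per chunk, but the rule of writing each chunk out immediately stays the same.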

For my "custom FTP functionality", I'm using a subclass of ftp.FTPShell. The reading method of this, openForReading, returns a Deferred that fires with an implementation of IReadFile.

Below is my (initial, simple) implementation for "streaming HTTP". I use the fetch function to set up an HTTP request, and the callback I pass in gets called with each chunk I get from the response.

I thought I could use some sort of two-ended buffer object to transport the chunks between the HTTP and FTP, by using the buffer object as the file-like object required by ftp._FileReader, but that's quickly proving not to work, as the consumer from the send call almost immediately closes the buffer (because it's returning an empty string, because there's no data to read yet, etc). Thus, I'm "sending" empty files before I even start receiving the HTTP response chunks.
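The failure described here can be reproduced without Twisted. A pull-style sender typically reads from its file-like object until read() returns an empty string and treats that as end-of-file. The loop below is a simplified stand-in for that behaviour (pull_send is a hypothetical name and this is an assumption about the general pattern, not Twisted's actual code). With an empty buffer, the sender finishes before any data has been added.

```python
import io


def pull_send(fobj, write, chunk_size=4096):
    """Simplified pull loop: read until read() returns b'' (treated as EOF)."""
    sent = 0
    while True:
        chunk = fobj.read(chunk_size)
        if not chunk:
            break             # an empty read is indistinguishable from EOF
        write(chunk)
        sent += len(chunk)
    return sent


received = []
buf = io.BytesIO()            # buffer is empty: no HTTP chunks have arrived yet
sent_early = pull_send(buf, received.append)

# The HTTP chunk arrives only after the sender has already given up.
buf.write(b"late chunk")
```

Nothing is ever delivered to the receiver, which matches the "empty file" symptom: a file-like object cannot express "no data yet, but more is coming".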

Am I close, but missing something? Am I on the wrong path altogether? Is what I want to do really impossible (I highly doubt that)?

from twisted.web import client
import urlparse

class HTTPStreamer(client.HTTPPageGetter):
    def __init__(self):
        self.callbacks = []

    def addHandleResponsePartCallback(self, callback):
        self.callbacks.append(callback)

    def handleResponsePart(self, data):
        for cb in self.callbacks:
            cb(data)
        client.HTTPPageGetter.handleResponsePart(self, data)

class HTTPStreamerFactory(client.HTTPClientFactory):
    protocol = HTTPStreamer

    def __init__(self, *args, **kwargs):
        client.HTTPClientFactory.__init__(self, *args, **kwargs)
        self.callbacks = []

    def addChunkCallback(self, callback):
        self.callbacks.append(callback)

    def buildProtocol(self, addr):
        p = client.HTTPClientFactory.buildProtocol(self, addr)
        for cb in self.callbacks:
            p.addHandleResponsePartCallback(cb)
        return p

def fetch(url, callback):

    parsed = urlparse.urlsplit(url)

    f = HTTPStreamerFactory(parsed.path)
    f.addChunkCallback(callback)

    from twisted.internet import reactor
    reactor.connectTCP(parsed.hostname, parsed.port or 80, f)

As a side note, this is only my second day with Twisted. I spent most of yesterday reading through Dave Peticolas' Twisted Introduction, which has been a great starting point, even if it is based on an older version of Twisted.

That said, I may be doing things wrong.

Solution

> I thought I could use some sort of two-ended buffer object to transport the chunks between the HTTP and FTP, by using the buffer object as the file-like object required by ftp._FileReader, but that's quickly proving not to work, as the consumer from the send call almost immediately closes the buffer (because it's returning an empty string, because there's no data to read yet, etc). Thus, I'm "sending" empty files before I even start receiving the HTTP response chunks.

Instead of using ftp._FileReader, you want something that does a write whenever a chunk arrives from your HTTPStreamer at a callback it supplies. You never need or want to read from a buffer on the HTTP side, because there's no reason to have such a buffer at all. As soon as HTTP bytes arrive, write them to the consumer. Something like...

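from zope.interface import implements
from twisted.protocols.ftp import IReadFile

class FTPStreamer(object):
    implements(IReadFile)

    def __init__(self, url):
        self.url = url

    def send(self, consumer):
        # Write each HTTP chunk to the FTP consumer as soon as it arrives.
        fetch(self.url, consumer.write)
        # You also need a Deferred to return here, so the
        # FTP implementation knows when you're done.
        # someDeferred is a placeholder, not defined in this snippet.
        return someDeferred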

You may also want to use Twisted's producer/consumer interface to allow the transfer to be throttled, as may be necessary if your connection to the HTTP server is faster than your user's FTP connection to you.
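As a rough illustration of that throttling idea, the sketch below shows the shape of a push producer that forwards pause and resume requests from the FTP consumer to the HTTP connection. It is written without importing Twisted so it can be read on its own. The method names pauseProducing, resumeProducing, stopProducing, registerProducer, unregisterProducer, and write follow Twisted's IPushProducer and IConsumer interfaces, while HTTPToFTPBridge, chunkReceived, finished, FakeTransport, and FakeConsumer are hypothetical names introduced only for this example.

```python
class HTTPToFTPBridge(object):
    """Push-producer sketch: relays HTTP chunks and forwards pause/resume."""

    def __init__(self, http_transport, consumer):
        self.http_transport = http_transport
        self.consumer = consumer
        # True marks this as a streaming (push) producer.
        consumer.registerProducer(self, True)

    def chunkReceived(self, data):
        # Hypothetical hook: call this for every chunk of the HTTP body.
        self.consumer.write(data)

    def finished(self):
        # Hypothetical hook: call this once the HTTP response is complete.
        self.consumer.unregisterProducer()

    # Methods the consumer calls when its own buffer fills or drains.
    def pauseProducing(self):
        self.http_transport.pauseProducing()

    def resumeProducing(self):
        self.http_transport.resumeProducing()

    def stopProducing(self):
        self.http_transport.stopProducing()


class FakeTransport(object):
    """Test double standing in for the HTTP connection's transport."""

    def __init__(self):
        self.calls = []

    def pauseProducing(self):
        self.calls.append("pause")

    def resumeProducing(self):
        self.calls.append("resume")

    def stopProducing(self):
        self.calls.append("stop")


class FakeConsumer(object):
    """Test double standing in for the FTP data connection."""

    def __init__(self):
        self.data = []
        self.producer = None

    def registerProducer(self, producer, streaming):
        self.producer = producer

    def unregisterProducer(self):
        self.producer = None

    def write(self, data):
        self.data.append(data)


transport = FakeTransport()
consumer = FakeConsumer()
bridge = HTTPToFTPBridge(transport, consumer)
bridge.chunkReceived(b"abc")
consumer.producer.pauseProducing()    # slow FTP client: consumer asks for a pause
consumer.producer.resumeProducing()   # buffer drained: consumer asks to resume
bridge.finished()
```

In a real server, finished() would also be the natural place to fire the Deferred returned from send, so the FTP implementation knows the transfer is complete.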
