HTTP Download very Big File


Question


I'm working on a web application in Python/Twisted.

I want the user to be able to download a very big file (> 100 MB). Of course, I don't want to load the whole file into the server's memory.

On the server side, I have this idea:

...
request.setHeader('Content-Type', 'text/plain')
fp = open(fileName, 'rb')
try:
    r = None
    while r != '':
        r = fp.read(1024)
        request.write(r)
finally:
    fp.close()
    request.finish()

I expected this to work, but I have a problem: I'm testing with Firefox, and it seems the browser makes me wait until the file has completely downloaded; only then do I get the open/save dialog box.

I expected the dialog box to appear immediately, followed by the progress bar in action...

Maybe I have to add something to the HTTP headers... Something like the size of the file?

Solution

Two big problems with the sample code you posted are that it is non-cooperative and it loads the entire file into memory before sending it.

while r != '':
    r = fp.read(1024)
    request.write(r)

Remember that Twisted uses cooperative multitasking to achieve any sort of concurrency. So the first problem with this snippet is that it is a while loop over the contents of an entire file (which you say is large). This means the entire file will be read into memory and written to the response before anything else can happen in the process. In this case, it happens that "anything" also includes pushing the bytes from the in-memory buffer onto the network, so your code will also hold the entire file in memory at once and only start to get rid of it when this loop completes.
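The buffering effect described above can be reproduced with a toy model that needs no Twisted at all. `ToyTransport` and `blocking_send` are hypothetical names invented for this illustration; the only assumption is that `write()` merely queues bytes and that nothing leaves the process until the event loop gets a turn, modelled here by `flush()`.

```python
import io


class ToyTransport:
    """Hypothetical stand-in for a transport: write() only queues bytes,
    and they are sent when the event loop later calls flush()."""

    def __init__(self):
        self.buffered = 0   # bytes currently queued in memory
        self.peak = 0       # largest queue size seen so far
        self.sent = 0       # bytes actually pushed to the "network"

    def write(self, data):
        self.buffered += len(data)
        self.peak = max(self.peak, self.buffered)

    def flush(self):
        self.sent += self.buffered
        self.buffered = 0


def blocking_send(fp, transport, chunk_size=1024):
    """Same shape as the loop in the question: it does not return
    to the event loop until the whole file has been written."""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        transport.write(chunk)


payload = b"x" * (256 * 1024)          # stands in for the big file
transport = ToyTransport()
blocking_send(io.BytesIO(payload), transport)

# The event loop only gets control back here, after the loop has finished.
transport.flush()

print(transport.peak == len(payload))  # prints True: the whole file was queued
```

Reading in 1024-byte chunks does not help here: the chunks simply pile up in the queue until the loop ends.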

So, as a general rule, you shouldn't write code for use in a Twisted-based application that uses a loop like this to do a big job. Instead, you need to do each small piece of the big job in a way that will cooperate with the event loop. For sending a file over the network, the best way to approach this is with producers and consumers. These are two related APIs for moving large amounts of data around using buffer-empty events to do it efficiently and without wasting unreasonable amounts of memory.
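As a rough sketch of the producer/consumer idea, the toy classes below send one chunk per buffer-empty event. The method names `registerProducer`, `unregisterProducer`, `write` and `resumeProducing` are modelled on Twisted's consumer and pull-producer interfaces, but `ToyConsumer`, `ToyFileProducer`, `on_buffer_empty` and `run_toy_loop` are hypothetical stand-ins written for this illustration, not Twisted code.

```python
import io


class ToyConsumer:
    """Hypothetical consumer: queues writes and asks its producer
    for more data only after the queue has been drained."""

    def __init__(self):
        self.buffered = 0
        self.peak = 0
        self.sent = 0
        self.producer = None

    def registerProducer(self, producer, streaming):
        self.producer = producer

    def unregisterProducer(self):
        self.producer = None

    def write(self, data):
        self.buffered += len(data)
        self.peak = max(self.peak, self.buffered)

    def flush(self):
        self.sent += self.buffered
        self.buffered = 0

    def on_buffer_empty(self):
        # The buffer-empty event: pull exactly one more chunk.
        if self.producer is not None:
            self.producer.resumeProducing()


class ToyFileProducer:
    """Hypothetical pull producer: one chunk per resumeProducing() call."""

    def __init__(self, fp, consumer, chunk_size=1024):
        self.fp = fp
        self.consumer = consumer
        self.chunk_size = chunk_size
        consumer.registerProducer(self, False)

    def resumeProducing(self):
        chunk = self.fp.read(self.chunk_size)
        if chunk:
            self.consumer.write(chunk)
        else:
            self.consumer.unregisterProducer()


def run_toy_loop(consumer):
    """Toy event loop: drain the buffer, then signal that it is empty."""
    while consumer.producer is not None:
        consumer.flush()
        consumer.on_buffer_empty()
    consumer.flush()


payload = b"x" * (256 * 1024)
consumer = ToyConsumer()
ToyFileProducer(io.BytesIO(payload), consumer)
run_toy_loop(consumer)

print(consumer.peak)                  # prints 1024
print(consumer.sent == len(payload))  # prints True
```

The whole payload still gets sent, but at most one chunk sits in memory at a time, and a real event loop would be free to service other connections between chunks.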

You can find some documentation of these APIs here:

http://twistedmatrix.com/projects/core/documentation/howto/producers.html

Fortunately, for this very common case, there is also a producer written already that you can use, rather than implementing your own:

http://twistedmatrix.com/documents/current/api/twisted.protocols.basic.FileSender.html

You probably want to use it sort of like this:

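from twisted.protocols.basic import FileSender
from twisted.python.log import err
from twisted.web.resource import Resource
from twisted.web.server import NOT_DONE_YET

class Something(Resource):
    ...

    def render_GET(self, request):
        request.setHeader('Content-Type', 'text/plain')
        fp = open(fileName, 'rb')
        # FileSender feeds the file to the request one chunk at a time,
        # whenever the request is ready for more data.
        d = FileSender().beginFileTransfer(fp, request)
        def cbFinished(ignored):
            fp.close()
            request.finish()
        # Log any failure, then close the file and finish the response.
        d.addErrback(err).addCallback(cbFinished)
        # Tell twisted.web that the response will be completed later.
        return NOT_DONE_YET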

You can read more about NOT_DONE_YET and other related ideas in the "Twisted Web in 60 Seconds" series on my blog, http://jcalderone.livejournal.com/50562.html (see the "asynchronous responses" entries in particular).
