Generate a large file and send it

Problem description

I have a rather large .csv file (up to 1 million lines) that I want to generate and send when a browser requests it.

The code I currently have is shown below (except that my real code does not generate random data):

import random
import tornado.ioloop
import tornado.web

class CSVHandler(tornado.web.RequestHandler):
  def get(self):
    self.set_header('Content-Type', 'text/csv')
    self.set_header('Content-Disposition', 'attachment; filename=dump.csv')
    self.write('lineNumber,measure\r\n')  # File header
    for line in range(0, 1000000):
      # Synchronous loop: nothing else runs on the IOLoop until it ends
      self.write(','.join([str(line), str(random.random())]) + '\r\n')  # mock data

app = tornado.web.Application([(r"/csv", CSVHandler)])
app.listen(8080)
tornado.ioloop.IOLoop.current().start()

The problems I have with the method above are:

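  • The web browser does not start downloading the chunks as they are sent. It hangs while the web server appears to prepare the whole content.
  • The web server is blocked while it processes this request, which makes other clients hang.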

Recommended answer

By default, all data is buffered in memory until the end of the request so that it can be replaced with an error page if an exception occurs. To send a response incrementally, your handler must be asynchronous (so that it can be interleaved with both the writing of the response and other requests on the IOLoop) and must use the RequestHandler.flush() method.
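To make the interleaving concrete, here is a minimal sketch that uses only the standard library's asyncio rather than Tornado itself. The names producer and main are made up for illustration, and awaiting asyncio.sleep(0) merely stands in for awaiting a flush: each await hands control back to the event loop so the other task can make progress.

```python
import asyncio


async def producer(name, chunks, log):
    """Record one entry per chunk, giving up control after each one."""
    for i in range(chunks):
        log.append(f"{name}:{i}")
        # Stand-in for awaiting a flush: lets other tasks run.
        await asyncio.sleep(0)


async def main():
    log = []
    # Run two "responses" concurrently on the same event loop.
    await asyncio.gather(producer("a", 3, log), producer("b", 3, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If the await were removed, the first producer would finish all of its chunks before the second one started, which is the blocking behaviour described in the question.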

Note that "being asynchronous" is not the same as "using the @tornado.web.asynchronous decorator"; in this case I recommend using @tornado.gen.coroutine instead of @asynchronous. This lets you simply use the yield operator with every flush:

import random
import tornado.gen
import tornado.web

class CSVHandler(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        self.set_header('Content-Type', 'text/csv')
        self.set_header('Content-Disposition', 'attachment; filename=dump.csv')
        self.write('lineNumber,measure\r\n')  # File header
        for line in range(0, 1000000):
            self.write(','.join([str(line), str(random.random())]) + '\r\n')  # mock data
            yield self.flush()  # wait until this chunk has reached the kernel

self.flush() starts the process of writing the data to the network, and yield waits until that data has reached the kernel. This lets other handlers run and also helps manage memory consumption (by limiting how far ahead of the client's download speed you can get). Flushing after every line of a CSV file is a little expensive, so you may want to flush only after every 100 or 1000 lines.
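The batching idea could be sketched roughly as follows. The helper stream_csv and its flush_every parameter are hypothetical names, not part of Tornado; the helper only assumes an object with a write() method and an awaitable flush(), which is what tornado.web.RequestHandler provides. FakeHandler is a stand-in so that the sketch runs without a server.

```python
import asyncio
import random


async def stream_csv(handler, n_rows, flush_every=1000):
    """Write n_rows of mock CSV data, flushing once per batch of rows."""
    handler.write("lineNumber,measure\r\n")  # file header
    for line in range(n_rows):
        handler.write(f"{line},{random.random()}\r\n")  # mock data
        if (line + 1) % flush_every == 0:
            # Waiting here gives other requests a chance to run.
            await handler.flush()
    # Send whatever is left over from the last, partial batch.
    await handler.flush()


class FakeHandler:
    """Hypothetical stand-in for a RequestHandler, for demonstration only."""

    def __init__(self):
        self.buffer = []
        self.sent = []
        self.flush_count = 0

    def write(self, chunk):
        self.buffer.append(chunk)

    async def flush(self):
        self.sent.extend(self.buffer)
        self.buffer.clear()
        self.flush_count += 1


if __name__ == "__main__":
    fake = FakeHandler()
    asyncio.run(stream_csv(fake, n_rows=2500, flush_every=1000))
    print(fake.flush_count, len(fake.sent))
```

In a real handler, assuming a Tornado version that supports native coroutines, get() would be declared with async def, set the headers, and then await stream_csv(self, 1000000). The right batch size depends on row length and client speed, so 1000 is only a starting point.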

Note that if an exception occurs once the download has started, there is no way to show an error page to the client; you can only cut the download off partway through. Try to validate the request and do everything that is likely to fail before the first call to flush().
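As an illustration of validating up front, the sketch below checks a hypothetical rows query parameter before any data is written. The parameter name, the MAX_ROWS limit, and the parse_row_count helper are all assumptions made for this example; the question does not mention request parameters.

```python
MAX_ROWS = 1_000_000  # assumed limit, based on the question's "up to 1 million lines"


def parse_row_count(raw, default=MAX_ROWS):
    """Return a validated row count, or raise ValueError before any output is sent."""
    if raw is None or raw == "":
        return default
    try:
        rows = int(raw)
    except ValueError:
        raise ValueError(f"rows must be an integer, got {raw!r}") from None
    if not 0 < rows <= MAX_ROWS:
        raise ValueError(f"rows must be between 1 and {MAX_ROWS}, got {rows}")
    return rows


if __name__ == "__main__":
    print(parse_row_count("500"))
```

A handler would call something like this at the top of get(), before the first write() or flush(), and turn a ValueError into an error response (for example by raising tornado.web.HTTPError(400)) while a proper error page can still be sent.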
