Stream multiple files at once from a Django server

Question

I'm running a Django server to serve files from another server in a protected network. When a user makes a request to access multiple files at once, I'd like my Django server to stream these files to that user all at once.

As downloading multiple files at once is not easily possible in a browser, the files need to be bundled somehow. I don't want my server to download all the files first and then serve a ready-made bundle, because that adds a lot of delay for larger files. With zips, my understanding is that they cannot be streamed while still being assembled.

Is there any way to start streaming a container as soon as the first bytes from the remote server are available?

Answer

Tar files are made to collect multiple files into one archive. They were developed for tape recorders and therefore support sequential writes and reads.

With Django it is possible to stream files to a browser with FileResponse(), which can take a generator as an argument.
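
For illustration, a minimal sketch of that idea (the generator and view below are made-up examples, not part of the final code):

from django.http import FileResponse

def demo_chunks():
    # any iterator of bytes works; Django streams each yielded chunk to the client
    for i in range(3):
        yield f"chunk {i}\n".encode()

def demo_view(request):
    return FileResponse(demo_chunks(), content_type="text/plain")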

If we feed it a generator that assembles the tar file from the data the user requested, the tar file will be generated just in time. However, Python's built-in tarfile module doesn't offer such a capability out of the box.

We can, however, make use of tarfile's ability to accept a file-like object and handle the assembly of the archive ourselves. We can therefore create a BytesIO() object that the tar file is incrementally written to, and flush its contents to Django's FileResponse(). For this to work we need to implement a few methods that FileResponse() and tarfile expect access to. Let's create a class FileStream:

import tarfile
from io import BytesIO
from math import ceil   # used in tarsize() below

import requests          # used in yield_tar() to stream the remote files


class FileStream:
    # A full record is 20 blocks of 512 bytes (10240 bytes); also used as the
    # chunk size when reading the remote files below.
    RECORDSIZE = tarfile.RECORDSIZE

    def __init__(self):
        self.buffer = BytesIO()
        self.offset = 0

    def write(self, s):
        # tarfile writes the archive into this in-memory buffer
        self.buffer.write(s)
        self.offset += len(s)

    def tell(self):
        # tarfile uses tell() to keep track of its position in the archive
        return self.offset

    def close(self):
        self.buffer.close()

    def pop(self):
        # return everything buffered so far and start over with an empty buffer
        s = self.buffer.getvalue()
        self.buffer.close()
        self.buffer = BytesIO()
        return s

Now, when we write() data to FileStream's buffer and yield FileStream.pop(), Django will send that data to the user immediately.
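
To make this concrete, a tiny hypothetical generator (not part of the answer's code) that uses nothing but these two calls:

def toy_stream():
    stream = FileStream()
    stream.write(b"hello ")   # bytes accumulate in the in-memory buffer
    stream.write(b"world\n")
    yield stream.pop()        # returns b"hello world\n" and resets the buffer

# FileResponse(toy_stream()) would send this single chunk to the user.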

The data we want to send is the tar file itself, assembled on the fly. In the FileStream class we add another method:

    @classmethod
    def yield_tar(cls, file_data_iterable):
        stream = FileStream()
        # mode='w|' opens the archive for sequential, non-seekable writing,
        # with the in-memory FileStream standing in for a file on disk
        tar = tarfile.TarFile.open(mode='w|', fileobj=stream, bufsize=tarfile.BLOCKSIZE)

This creates a FileStream instance and a tar file handle in memory. The handle reads and writes its data through the FileStream instance instead of a file on disk.

In the tar file we first have to add a tarfile.TarInfo() object for each file; it represents the header for the sequentially written data and carries information such as the file name, size and modification time.

        for file_name, file_size, file_date, file_data in file_data_iterable:
            tar_info = tarfile.TarInfo(file_name)
            tar_info.size = int(file_size)
            tar_info.mtime = file_date
            # write the header for this file; its data is streamed in below
            tar.addfile(tar_info)
            yield stream.pop()

You can also see the structure used to pass arbitrary data into that method: file_data_iterable is a list of tuples of the form
((str) file_name, (int/str) file_size, (str) unix_timestamp, (bytes) file_data).
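
For example, a hypothetical list for two small in-memory files could look like this (names, sizes and timestamps are made up; with the requests-based loop shown below, the last tuple element would be a URL instead of bytes):

import time

payload_a = b"first file contents"
payload_b = b"second file contents"

file_data_iterable = [
    ("a.txt", len(payload_a), int(time.time()), payload_a),
    ("b.txt", len(payload_b), int(time.time()), payload_b),
]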

Once the TarInfo has been sent, iterate over the file_data. This data needs to be iterable; for example, you could use a requests.Response object retrieved with requests.get(url, stream=True).

            # the chunk size can be chosen freely, but cls.RECORDSIZE gives me good performance
            for chunk in requests.get(url, stream=True).iter_content(chunk_size=cls.RECORDSIZE):
                tar.fileobj.write(chunk)
                yield stream.pop()

Note: here I used the variable url to request a file, so you would pass the URL instead of file_data as the fourth tuple element. If you choose to pass in an iterable file instead, you will need to update this line.
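
If file_data is instead already an iterable of byte chunks (as in the example tuples above), the inner loop could look roughly like this; this is a sketch under that assumption, not the answer's exact code, and a single bytes object would first have to be wrapped, e.g. as [file_data]:

            for chunk in file_data:   # any iterable of bytes objects
                tar.fileobj.write(chunk)
                yield stream.pop()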

Finally, the tar file requires special padding to indicate that a file has finished. Tar files consist of blocks and records: a block usually contains 512 bytes, and a record contains 20 blocks (20 * 512 bytes = 10240 bytes). First the last block containing the final chunk of file data is filled up with NULs (usually plain zero bytes), then the next file's TarInfo header begins.

To end the archive, the current record is filled up with NULs, and there have to be at least two blocks completely filled with NULs. This is taken care of by tar.close(). See also the Wikipedia article on the tar format.

            # pad the last, partially filled block of this file with NULs
            blocks, remainder = divmod(tar_info.size, tarfile.BLOCKSIZE)
            if remainder > 0:
                tar.fileobj.write(tarfile.NUL * (tarfile.BLOCKSIZE - remainder))
                yield stream.pop()
                blocks += 1
            # keep tarfile's own offset in sync with what we wrote to fileobj directly
            tar.offset += blocks * tarfile.BLOCKSIZE
        # close() writes the end-of-archive marker: at least two NUL blocks,
        # padded up to a full record
        tar.close()
        yield stream.pop()


You can now use the FileStream class in your Django view:

from django.http import FileResponse

# adjust this import to wherever the FileStream class is defined in your project
from .filestream import FileStream


def stream_files(request, files):
    file_data_iterable = [(
        file.name,
        file.size,
        file.date.timestamp(),
        file.data
    ) for file in files]

    response = FileResponse(
        FileStream.yield_tar(file_data_iterable),
        content_type="application/x-tar"
    )
    response["Content-Disposition"] = 'attachment; filename="streamed.tar"'
    return response
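
How the files argument is resolved depends entirely on your project. As a purely hypothetical sketch, the files could come from a model looked up via a query parameter:

from django.shortcuts import get_list_or_404

from myapp.models import StoredFile   # hypothetical model with name, size, date and data fields


def stream_selected_files(request):
    # e.g. /download/?ids=3,5,8
    ids = request.GET.get("ids", "").split(",")
    files = get_list_or_404(StoredFile, pk__in=ids)
    return stream_files(request, files)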


If you want to send the size of the tar file so the user can see a progress bar, you can determine the size of the uncompressed tar file ahead of time. In the FileStream class add another method:

    @classmethod
    def tarsize(cls, sizes):
        # Each file is preceded by a 512-byte header
        header_size = 512
        # Each file is padded to fill up a whole block
        tar_sizes = [ceil((header_size + size) / tarfile.BLOCKSIZE)
                     * tarfile.BLOCKSIZE for size in sizes]
        # The end of the archive is marked by at least two consecutive
        # zero-filled blocks, and the final record is filled up with zeros.
        sum_size = sum(tar_sizes)
        remainder = cls.RECORDSIZE - (sum_size % cls.RECORDSIZE)
        if remainder < 2 * tarfile.BLOCKSIZE:
            sum_size += cls.RECORDSIZE
        total_size = sum_size + remainder
        assert total_size % cls.RECORDSIZE == 0
        return total_size
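
As a quick sanity check with made-up sizes, two files of 100 and 1000 bytes fit into a single 10240-byte record:

# 512 + 100 bytes  -> rounded up to 2 blocks = 1024 bytes
# 512 + 1000 bytes -> rounded up to 3 blocks = 1536 bytes
# 1024 + 1536 = 2560 bytes; the remaining 7680 bytes of the record are
# zero padding (at least two full NUL blocks), so no extra record is needed
assert FileStream.tarsize([100, 1000]) == 10240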

and use that to set the response header:

tar_size = FileStream.tarsize([file.size for file in files])
...
response["Content-Length"] = tar_size


Huge thanks to chipx86 and allista whose gists have helped me massively with this task.
