Can I stream a file upload to S3 without a content-length header?


Question

I'm working on a machine with limited memory, and I'd like to upload a dynamically generated (not-from-disk) file in a streaming manner to S3. In other words, I don't know the file size when I start the upload, but I'll know it by the end. Normally a PUT request has a Content-Length header, but perhaps there is a way around this, such as using multipart or chunked content-type.

S3 can support streaming uploads. For example, see here:

http://blog.odonnell.nu/posts/streaming-uploads-s3-python-and-poster/

My question is, can I accomplish the same thing without having to specify the file length at the start of the upload?

Answer

You have to upload your file in 5MiB+ chunks via S3's multipart API. Each of those chunks requires a Content-Length, but you can avoid loading huge amounts of data (100MiB+) into memory (a rough sketch in code follows the list below):

  • Initiate S3 Multipart Upload.
  • Gather data into a buffer until that buffer reaches S3's lower chunk-size limit (5MiB). Generate MD5 checksum while building up the buffer.
  • Upload that buffer as a Part, store the ETag (read the docs on that one).
  • Once you reach EOF of your data, upload the last chunk (which can be smaller than 5MiB).
  • Finalize the Multipart Upload.
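
A minimal sketch of the flow above, assuming Python with boto3; the bucket name, object key, and the generate_chunks() data source are placeholders and not part of the original answer, and per-part Content-Length and checksum handling are left to the SDK:

import boto3

s3 = boto3.client('s3')
BUCKET, KEY = 'my-bucket', 'generated-file'    # placeholder names
MIN_PART_SIZE = 5 * 1024 * 1024                # S3's lower limit for every part but the last

def stream_to_s3(chunks):
    """Upload an iterable of byte strings whose total length is unknown up front."""
    upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)['UploadId']
    parts, part_number, buffer = [], 1, b''
    try:
        for chunk in chunks:
            buffer += chunk
            # Flush the buffer as a Part once it reaches the 5MiB minimum.
            if len(buffer) >= MIN_PART_SIZE:
                resp = s3.upload_part(Bucket=BUCKET, Key=KEY, UploadId=upload_id,
                                      PartNumber=part_number, Body=buffer)
                parts.append({'ETag': resp['ETag'], 'PartNumber': part_number})
                part_number += 1
                buffer = b''
        # EOF: the final Part may be smaller than 5MiB.
        if buffer or not parts:
            resp = s3.upload_part(Bucket=BUCKET, Key=KEY, UploadId=upload_id,
                                  PartNumber=part_number, Body=buffer)
            parts.append({'ETag': resp['ETag'], 'PartNumber': part_number})
        s3.complete_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=upload_id,
                                     MultipartUpload={'Parts': parts})
    except Exception:
        # Abort so S3 does not keep storing (and billing for) the orphaned parts.
        s3.abort_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=upload_id)
        raise

Calling stream_to_s3(generate_chunks()) with any generator of bytes keeps at most one part's worth of data in memory at a time.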

S3 allows up to 10,000 parts. So by choosing a part-size of 5MiB you will be able to upload dynamic files of up to 50GiB. Should be enough for most use-cases.

However: if you need more than that, you have to increase your part-size, either by using a higher fixed part-size (10MiB for example) or by increasing it during the upload, for example:

First 25 parts:   5MiB (total:  125MiB)
Next 25 parts:   10MiB (total:  375MiB)
Next 25 parts:   25MiB (total:    1GiB)
Next 25 parts:   50MiB (total: 2.25GiB)
After that:     100MiB
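
A minimal sketch of such an escalating schedule; the helper name and the exact thresholds simply mirror the table above and are not prescribed by S3:

# Hypothetical helper: size in MiB to use for a given 1-based part number.
def part_size_mib(part_number):
    if part_number <= 25:
        return 5
    if part_number <= 50:
        return 10
    if part_number <= 75:
        return 25
    if part_number <= 100:
        return 50
    return 100   # parts 101..10,000 at 100MiB each bring the total to roughly 1TB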

This will allow you to upload files of up to 1TB (S3's limit for a single file is 5TB right now) without wasting memory unnecessarily.

A note on the blog post you linked: the author's problem is different from yours - he knows and uses the Content-Length before the upload. What he wants to improve on is this situation: many libraries handle uploads by loading all of the data from a file into memory. In pseudo-code that would be something like this:

data = File.read(file_name)                     # the whole file is read into memory at once
request = new S3::PutFileRequest()
request.setHeader('Content-Length', data.size)
request.setBody(data)
request.send()

His solution does it by getting the Content-Length via the filesystem-API. He then streams the data from disk into the request-stream. In pseudo-code:

upload = new S3::PutFileRequestStream()
upload.writeHeader('Content-Length', File.getSize(file_name))
upload.flushHeader()

input = File.open(file_name, File::READONLY_FLAG)

while (data = input.read())
  upload.write(data)    # stream each chunk from disk straight into the request
end

upload.flush()
upload.close()
