Uploading Large File to S3 with Ruby Fails with Out of Memory Error, How to Read and Upload in Chunks?


Problem Description

We are uploading various files to S3 via the Ruby AWS SDK (v2) from a Windows machine. We have tested with Ruby 1.9. Our code works fine except when large files are encountered, when an out of memory error is thrown.

At first we were reading the whole file into memory with this code:

:body => IO.binread(filepath),

Then after Googling we found that there were ways to read the file in chunks with Ruby:

:body =>  File.open(filepath, 'rb') { |io| io.read },

This code did not resolve the issue though, and we can't find a specific S3 (or related) example which shows how the file can be read and passed to S3 in chunks. The whole file is still loaded into memory and throws an out of memory error with large files.

We know we can split the file into chunks and upload them to S3 using the AWS multipart upload; however, the preference would be to avoid this if possible (although it's fine if it's the only way).
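For reference, a manual multipart upload with the v2 Client API would look roughly like the sketch below. This is only an illustration, not code we have run; it assumes the same s3 client, bucket, s3key and filepath variables defined in the sample further down, and S3 requires every part except the last to be at least 5 MB.

# Sketch only: manual multipart upload with the v2 Client API
part_size = 5 * 1024 * 1024  # minimum part size for all parts except the last
mpu = s3.create_multipart_upload(:bucket => bucket, :key => s3key)

parts = []
File.open(filepath, 'rb') do |io|
  part_number = 1
  # read one part at a time so only part_size bytes are in memory
  while chunk = io.read(part_size)
    part = s3.upload_part(
      :bucket => bucket,
      :key => s3key,
      :upload_id => mpu.upload_id,
      :part_number => part_number,
      :body => chunk
    )
    parts << { :etag => part.etag, :part_number => part_number }
    part_number += 1
  end
end

s3.complete_multipart_upload(
  :bucket => bucket,
  :key => s3key,
  :upload_id => mpu.upload_id,
  :multipart_upload => { :parts => parts }
)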

Our code sample is below. What is the best way to read the file in chunks, avoiding the out of memory errors, and upload to S3?

require 'aws-sdk'

filepath = 'c:\path\to\some\large\file.big'
bucket = 's3-bucket-name'
s3key = 'some/s3/key/file.big'
accesskeyid = 'ACCESSKEYID'
accesskey = 'ACCESSKEYHERE'
region = 'aws-region-here'

s3 = Aws::S3::Client.new(
  :access_key_id => accesskeyid,
  :secret_access_key => accesskey,
  :region => region
  )

resp = s3.put_object(
  :bucket => bucket,
  :key => s3key,
  :body =>  File.open(filepath, 'rb') { |io| io.read },
  )

Note that we are not hitting the S3 5GB limit; this happens for files of, for example, 1.5GB.

Recommended Answer

The v2 AWS SDK for Ruby, the aws-sdk gem, supports streaming objects directly over the network without loading them into memory. Your example requires only a small correction to do this:

File.open(filepath, 'rb') do |file|
  resp = s3.put_object(
    :bucket => bucket,
    :key => s3key,
    :body => file
  )
end

This works because it allows the SDK to call #read on the file object, requesting a small number of bytes at a time. Calling #read on a Ruby IO object, such as a file, without a length argument reads the entire object into memory and returns it as a string. This is what caused your out-of-memory errors.
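To illustrate the difference, here is a minimal sketch using plain Ruby IO (the 1 MB chunk size is arbitrary):

File.open(filepath, 'rb') do |io|
  # With a length argument, #read returns at most that many bytes per call
  # (and nil at end of file), so only one chunk is in memory at a time.
  while chunk = io.read(1024 * 1024)
    # process or upload the chunk -- roughly what the SDK does internally
    # when handed an open IO object as :body
  end
end

# Without a length argument, #read slurps the rest of the file into a
# single String -- the source of the out-of-memory error.
data = File.open(filepath, 'rb') { |io| io.read }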

That said, the aws-sdk gem provides another, more useful interface for uploading files to Amazon S3. This alternative interface automatically:

  • Uses the multipart API for large objects
  • Can upload parts in parallel using multiple threads, improving upload speed
  • Computes MD5s of the data client-side for service-side data integrity checks.

A simple example:

# notice this uses Resource, not Client
s3 = Aws::S3::Resource.new(
  :access_key_id => accesskeyid,
  :secret_access_key => accesskey,
  :region => region
)

s3.bucket(bucket).object(s3key).upload_file(filepath)

This is part of the aws-sdk resource interfaces. There are quite a few helpful utilities in here. The Client class only provides basic API functionality.
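If you want control over when #upload_file switches to the multipart API, it accepts options; a rough sketch, where the threshold value is illustrative rather than the gem's default:

s3.bucket(bucket).object(s3key).upload_file(
  filepath,
  :multipart_threshold => 100 * 1024 * 1024  # use multipart uploads above ~100 MB
)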
