Upload CSV stream from Ruby to S3

Question

I am dealing with potentially huge CSV files that I want to export from my Rails app. Since the app runs on Heroku, my idea is to stream these CSV files directly to S3 as they are generated.

The problem is that Aws::S3 expects a file in order to perform an upload, while in my Rails app I would like to do something like:

S3.bucket('my-bucket').object('my-csv') << %w(this is one line)

How can I achieve this?

Recommended answer

You can use S3 multipart upload, which uploads a large object by splitting it into multiple parts. https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
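To make the idea of splitting a stream into parts concrete, here is a minimal stdlib-only sketch. The PartBuffer class and its part size of 16 bytes are hypothetical and chosen for illustration; this is not how aws-sdk-ruby is implemented, and real S3 multipart uploads require every part except the last to be at least 5 MiB.

```ruby
require "csv"

# Hypothetical illustration: accumulates writes and cuts them into parts of
# exactly `part_size` bytes (the final part may be smaller).
class PartBuffer
  attr_reader :parts

  def initialize(part_size)
    @part_size = part_size
    @buffer = String.new # empty binary string
    @parts = []
  end

  # Append data and emit a part whenever enough bytes have accumulated.
  def <<(data)
    @buffer << data.b
    flush while @buffer.bytesize >= @part_size
    self
  end

  # Emit whatever is left as the final part and return all parts.
  def close
    flush unless @buffer.empty?
    parts
  end

  private

  # Move the first `part_size` bytes of the buffer into the parts list.
  def flush
    @parts << @buffer.byteslice(0, @part_size)
    @buffer = @buffer.byteslice(@part_size..) || String.new
  end
end

buffer = PartBuffer.new(16)
[%w(this is first line), %w(this is second line)].each do |row|
  buffer << row.to_csv
end
buffer.close.each_with_index do |part, i|
  puts "part #{i + 1}: #{part.bytesize} bytes"
end
```

In a real multipart upload each emitted part would be sent to S3 as it is produced, so the whole CSV never has to exist in memory or on disk at once.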

Multipart upload requires more complex code, but aws-sdk-ruby V3 provides the upload_stream method, which seems to perform a multipart upload internally and is very easy to use. It may be the exact solution for this use case. https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Object.html#upload_stream-instance_method

require "aws-sdk-s3"
require "csv"

client = Aws::S3::Client.new(
  region: 'ap-northeast-1',
  credentials: your_credential
)

obj = Aws::S3::Object.new('your-bucket-here', 'path-to-output', client: client)

# Write each row to the upload stream as one line of CSV.
obj.upload_stream do |write_stream|
  [
    %w(this is first line),
    %w(this is second line),
    %w(this is third line),
  ].each do |line|
    write_stream << line.to_csv
  end
end

The uploaded object contains:

this,is,first,line
this,is,second,line
this,is,third,line

The argument to the upload_stream block can usually be used as an IO object, which allows you to chain and wrap CSV generation as you would for a file or other IO object:

obj.upload_stream do |write_stream|
  CSV(write_stream) do |csv|
    [
      %w(this is first line),
      %w(this is second line),
      %w(this is third line),
    ].each do |line|
      csv << line
    end
  end
end
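Because the CSV wrapper only needs an IO-like object, the same generation code can be exercised locally without S3. The following is a small sketch under that assumption; write_rows is a hypothetical helper name, and StringIO stands in for the write_stream that upload_stream would yield.

```ruby
require "csv"
require "stringio"

# Hypothetical helper: writes each row as a line of CSV to any IO-like object.
def write_rows(io, rows)
  CSV(io) do |csv|
    rows.each { |row| csv << row }
  end
end

rows = [
  %w(this is first line),
  %w(this is second line),
]

# StringIO plays the role of the upload stream here.
out = StringIO.new
write_rows(out, rows)
puts out.string
```

With a helper like this, the block passed to upload_stream could simply call write_rows(write_stream, rows), and the row generation stays testable on its own.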

Or for example, you could compress the CSV while you generate and upload it, using a tempfile to reduce memory footprint:

require "zlib"

obj.upload_stream(tempfile: true) do |write_stream|
  Zlib::GzipWriter.wrap(write_stream) do |gzw|
    CSV(gzw) do |csv|
      [
        %w(this is first line),
        %w(this is second line),
        %w(this is third line),
      ].each do |line|
        csv << line
      end
    end
  end
end
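The gzip wrapping can be checked the same way before pointing it at S3. This sketch assumes a hypothetical helper named gzip_csv and again uses StringIO in place of the upload stream; it compresses the rows and then decompresses them to confirm the round trip.

```ruby
require "csv"
require "zlib"
require "stringio"

# Hypothetical helper: writes rows as gzip-compressed CSV to an IO-like object.
# GzipWriter.wrap closes the writer (and the underlying io) when the block ends.
def gzip_csv(io, rows)
  Zlib::GzipWriter.wrap(io) do |gzw|
    CSV(gzw) do |csv|
      rows.each { |row| csv << row }
    end
  end
end

rows = [
  %w(this is first line),
  %w(this is second line),
]

compressed = StringIO.new
gzip_csv(compressed, rows)

# Decompress the collected bytes to inspect what would have been uploaded.
puts Zlib.gunzip(compressed.string)
```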
