Reading large csv files in a rails app takes up a lot of memory - Strategy to reduce memory consumption?


Problem description

I have a rails app which allows users to upload csv files and schedule the reading of multiple csv files with the help of the delayed_job gem. The problem is that the app reads each file in its entirety into memory and then writes to the database. If it's just one file being read it's fine, but when multiple files are read the RAM on the server fills up and causes the app to hang.

I am trying to find a solution for this problem.

One solution I researched is to break the csv file into smaller parts, save them on the server, and read the smaller files. See this link:

 example: split -b 40k myfile segment

Not my preferred solution. Are there any other approaches to solve this where I don't have to break the file? Solutions must be Ruby code.

Thanks,

Recommended answer

You can make use of CSV.foreach to read your CSV file one row at a time instead of loading it all at once:

 require 'csv'

 path = Rails.root.join('data/uploads/.../upload.csv') # or, whatever
 CSV.foreach(path) do |row|
   # process row here (row[i] gives the i-th field)
 end
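
For example, if the CSV has a header row and each row maps to a database record, the row can be written out as soon as it is read, so only a single row is held in memory at any time. A minimal sketch, assuming a hypothetical Product model and column names:

 require 'csv'

 path = Rails.root.join('data/uploads/products.csv') # hypothetical path
 CSV.foreach(path, headers: true) do |row|
   # row is a CSV::Row; only this one row is kept in memory at a time.
   # Product and the name/price columns are assumptions for illustration only.
   Product.create!(name: row['name'], price: row['price'])
 end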

If it's run in a background job, you could additionally call GC.start every n rows.
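
One possible way to do that (the counter and the 1,000-row interval are just illustrative choices, not something mandated by the CSV library):

 rows_processed = 0
 CSV.foreach(path) do |row|
   # ... per-row processing goes here ...
   rows_processed += 1
   GC.start if (rows_processed % 1_000).zero? # trigger a GC pass every 1,000 rows
 end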

CSV.foreach operates on an IO stream, as you can see here:

def CSV.foreach(path, options = Hash.new, &block)
  # ...
  open(path, options) do |csv|
    csv.each(&block)
  end
end

The csv.each part is a call to IO#each, which reads the file line by line (an rb_io_getline_1 invocation) and leaves each line it has read to be garbage collected:

static VALUE
rb_io_each_line(int argc, VALUE *argv, VALUE io)
{
    // ...
    while (!NIL_P(str = rb_io_getline_1(rs, limit, io))) {
        rb_yield(str);
    }
    // ...
}

