Reading large CSV files in a Rails app takes up a lot of memory - strategy to reduce memory consumption?
Question
I have a Rails app which allows users to upload CSV files and schedule the reading of multiple CSV files with the help of the delayed_job gem. The problem is that the app reads each file in its entirety into memory and then writes to the database. If just one file is being read, it's fine, but when multiple files are read, the RAM on the server fills up and the app hangs.
I am trying to find a solution for this problem.
One solution I researched is to break the CSV file into smaller parts, save them on the server, and read the smaller files. See this link.
example: split -b 40k myfile segment
This is not my preferred solution. Are there any other approaches where I don't have to break up the file? Solutions must be Ruby code.
Thanks,
Recommended answer
You can use CSV.foreach to read your CSV file one row at a time instead of loading the whole file into memory:
require 'csv'

path = Rails.root.join('data/uploads/.../upload.csv') # or, whatever
CSV.foreach(path) do |row|
  # process row[i] here; only the current row is held in memory
end
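Since the question schedules the reading with delayed_job, the loop above could be wrapped in a job object. The following is only a minimal sketch: `ImportCsvJob` and `process_row` are hypothetical names that do not appear in the original answer, and the per-row database write is left as a placeholder.

```ruby
require 'csv'

# Hypothetical job object; delayed_job only needs it to respond to #perform.
ImportCsvJob = Struct.new(:path) do
  # Streams the file row by row and returns the number of rows handled.
  def perform
    rows = 0
    CSV.foreach(path) do |row|
      process_row(row)
      rows += 1
    end
    rows
  end

  # Placeholder (assumption): replace with the real per-row database write.
  def process_row(row)
    row
  end
end

# In the Rails app (not executed here), the job could be queued with:
#   Delayed::Job.enqueue ImportCsvJob.new(path.to_s)
```

Because each job streams its own file, several jobs running at once should each hold roughly one row at a time rather than one whole file.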
If it's run in a background job, you could additionally call GC.start every n rows.
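One way this might look is sketched below. The helper name `each_csv_batch` and the parameters `batch_size` and `gc_every` are assumptions made for illustration, as is the choice to hand rows over in batches so they can be written to the database in groups. The sketch also assumes the file has a header row.

```ruby
require 'csv'

# Hypothetical helper: streams rows from `path`, yields them in batches,
# and triggers a garbage collection run every `gc_every` rows.
# Returns the total number of data rows read.
def each_csv_batch(path, batch_size: 500, gc_every: 5_000)
  batch = []
  count = 0
  CSV.foreach(path, headers: true) do |row|
    batch << row.to_h
    count += 1
    if batch.size >= batch_size
      yield batch
      batch = [] # drop the reference so the old batch can be collected
    end
    GC.start if (count % gc_every).zero?
  end
  yield batch unless batch.empty? # flush the final partial batch
  count
end

# Possible usage inside the app (not executed here; `Record` is a
# hypothetical model name):
#   each_csv_batch(path) { |rows| Record.insert_all(rows) }
```

Whether the explicit GC.start helps in practice depends on the Ruby version and workload, so it is worth measuring before keeping it.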
CSV.foreach operates on an IO stream, as you can see here:
# inside class CSV
def self.foreach(path, options = Hash.new, &block)
  # ...
  open(path, options) do |csv|
    csv.each(&block)
  end
end
The csv.each part is a call to IO#each, which reads the file line by line (through rb_io_getline_1) and leaves each line it has read to be garbage collected:
static VALUE
rb_io_each_line(int argc, VALUE *argv, VALUE io)
{
    // ...
    while (!NIL_P(str = rb_io_getline_1(rs, limit, io))) {
        rb_yield(str);
    }
    // ...
}
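The same line-by-line behaviour can be seen from plain Ruby. The sketch below uses a hypothetical helper, `line_stats`, which is not part of the original answer. It walks a file with IO#each_line and keeps only two counters, so no more than the current line is referenced at any time.

```ruby
# Hypothetical helper: counts lines and tracks the longest line in bytes
# while holding a reference to only the current line.
def line_stats(path)
  count = 0
  longest = 0
  File.open(path) do |io|
    io.each_line do |line|
      count += 1
      longest = line.bytesize if line.bytesize > longest
      # `line` goes out of scope here and becomes eligible for GC
    end
  end
  { lines: count, longest_bytes: longest }
end
```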