Errno::ENOMEM: Cannot allocate memory - cat


Problem description

I have a job running in production that processes XML files. There are around 4,000 XML files, totalling 8 to 9 GB.

After processing, we get CSV files as output. A cat command then merges all the CSV files into a single file, and that is where I get:

Errno::ENOMEM: Cannot allocate memory

on the cat (backtick) command.

Here are some details:

  • System memory: 4 GB
  • Swap: 2 GB
  • Ruby: 1.9.3p286

The files are processed using nokogirisaxbuilder-0.0.8.

There is a block of code that processes the 4,000 XML files and saves the output as CSV, one file per XML (sorry, I'm not supposed to share it because of company policy).

Below is the code that merges the output files into a single file:

Dir["#{processing_directory}/*.csv"].sort_by {|file| [file.count("/"), file]}.each {|file|
            `cat #{file} >> #{final_output_file}`
}

I took memory consumption snapshots during processing. The processing step consumes almost all of the memory, but it does not fail. The failure always happens on the cat command.

My guess is that the backtick tries to fork a new process, which cannot get enough memory, so it fails.

Please let me know your opinion and any alternatives.

Recommended answer

So it seems that your system is running very low on memory, and spawning a shell plus calling cat is too much for the little memory that is left.
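To make the "spawning a shell" part concrete, here is a minimal sketch showing that a backtick command runs in a separate process from the Ruby interpreter. It assumes a POSIX system with /bin/sh, and the helper name backtick_shell_pid is made up for illustration. It only demonstrates that a child process is created; whether creating that child fails with ENOMEM depends on how much memory the parent holds and on the kernel's memory settings.

```ruby
# Hypothetical helper (not from the original post): returns the pid of the
# shell that Ruby starts to run a backtick command. Because the command
# contains a shell metacharacter ($), Ruby hands it to /bin/sh, and $$
# expands to that shell's own pid.
def backtick_shell_pid
  `echo $$`.strip.to_i
end

parent_pid = Process.pid
child_pid = backtick_shell_pid

# The two pids differ: every backtick call creates a new process.
puts "ruby pid: #{parent_pid}, shell pid: #{child_pid}"
```

In the question's loop this happens once per CSV file, so roughly 4,000 child processes are created from a parent that is already using most of the machine's memory.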

If you don't mind losing some speed, you can merge the files in Ruby using a small buffer. This avoids spawning a shell, and you control the buffer size.

This is untested, but you get the idea:

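buffer_size = 4096

# The block form closes the output file even if an error is raised.
File.open(final_output_file, 'wb') do |output_file|
  Dir["#{processing_directory}/*.csv"].sort_by {|file| [file.count("/"), file]}.each do |file|
    # Copy each CSV in fixed-size chunks so at most buffer_size bytes are held at once.
    File.open(file, 'rb') do |f|
      while buffer = f.read(buffer_size)
        output_file.write(buffer)
      end
    end
  end
end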
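As a variation on the same idea, and not part of the original answer, the copy loop can be handed to IO.copy_stream, which is available in Ruby 1.9 and streams data between files without loading a whole file into memory. The following is a sketch under a few assumptions: the method name merge_csv_files is hypothetical, and the reject step is an added safeguard for the case where the output file lives in the processing directory and matches *.csv.

```ruby
# Hypothetical helper (name chosen for illustration): concatenates every CSV
# in processing_directory into final_output_file without spawning a shell.
# Returns the number of input files that were merged.
def merge_csv_files(processing_directory, final_output_file)
  output_path = File.expand_path(final_output_file)

  # Same ordering as the original snippet; skip the output file itself in
  # case it sits in the same directory (assumed safeguard).
  inputs = Dir["#{processing_directory}/*.csv"]
             .sort_by { |file| [file.count("/"), file] }
             .reject { |file| File.expand_path(file) == output_path }

  # 'wb' truncates any existing output; the block form closes it on exit.
  File.open(final_output_file, "wb") do |out|
    inputs.each do |file|
      # Streams the file into the open output handle in chunks.
      IO.copy_stream(file, out)
    end
  end

  inputs.size
end
```

A call would look like merge_csv_files(processing_directory, final_output_file). Note that this version overwrites the output file, like the answer's 'w' mode, whereas the question's `>>` appended to it.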
