Failed to allocate memory (NoMemoryError) in Ruby?


Question

I wrote a simple script that is supposed to read every file in a directory, strip the HTML tags so that only plain text remains, and write the result into a single file.

I have 8 GB of RAM and plenty of virtual memory available. While the script runs, more than 5 GB of RAM is free. The largest file in the directory is 3.8 GB.

The script is:

file_count = 1
File.open("allscraped.txt", 'w') do |out1|
    for file_name in Dir["allParts/*.dat"] do
        puts "#{file_name}#:#{file_count}"
        file_count +=1
        File.open(file_name, "r") do |file|
            source = ""
            tmp_src = ""
            counter = 0
            file.each_line do |line|
                scraped_content = line.gsub(/<.*?\/?>/, '')
                tmp_src << scraped_content
                if (counter % 10000) == 0
                    tmp_src = tmp_src.gsub( /\s{2,}/, "\n" )
                    source << tmp_src
                    tmp_src = ""
                    counter = 0
                end
                counter += 1
            end
            source << tmp_src.gsub( /\s{2,}/, "\n" )
            out1.write(source)
            break
        end
    end
end

The full error message is:

realscraper.rb:33:in `block (4 levels) in <main>': failed to allocate memory (NoMemoryError)
        from realscraper.rb:27:in `each_line'
        from realscraper.rb:27:in `block (3 levels) in <main>'
        from realscraper.rb:23:in `open'
        from realscraper.rb:23:in `block (2 levels) in <main>'
        from realscraper.rb:13:in `each'
        from realscraper.rb:13:in `block in <main>'
        from realscraper.rb:12:in `open'
        from realscraper.rb:12:in `<main>'

Here, line 27 is file.each_line do |line| and line 33 is source << tmp_src. The failing file is the largest one (3.8 GB). What is the problem here? Why do I get this error even though I have enough memory, and how can I fix it?

Recommended answer

The problem is in these two lines:

source << tmp_src
source << tmp_src.gsub( /\s{2,}/, "\n" )

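When you read a large file, these lines slowly grow one very large string in memory: source ends up holding the entire stripped contents of the file before anything is written out. For the 3.8 GB file that is a single multi-gigabyte string, and allocating it can fail even when free RAM looks sufficient, for example while the string's buffer is being reallocated to grow.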

The simplest solution is to drop the temporary source string altogether and write the results directly to the file. Replace those two lines with the following (the final out1.write(source) call then becomes unnecessary):

# source << tmp_src
out1.write(tmp_src) 

# source << tmp_src.gsub( /\s{2,}/, "\n" )
out1.write(tmp_src.gsub( /\s{2,}/, "\n" ))                     

This way no large temporary string is built up in memory. Only the current batch of up to 10,000 lines is held at a time, so the script should also run faster.
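Putting the fix together, the whole script might look something like the sketch below. This is only one possible arrangement, not the original poster's final code: the method names strip_html_stream and scrape_directory, the constant names, and the batch_size parameter are made up for illustration, while the two regular expressions, the 10,000-line batch, and the file paths are taken from the question.

```ruby
# Hypothetical names throughout; only the regexes, the batch size of 10,000
# and the paths in the example call come from the question.
TAG_PATTERN = /<.*?\/?>/
BLANK_RUN   = /\s{2,}/

# Strips tags from each line of +input+ and writes the result to +output+
# in batches, so at most +batch_size+ lines are held in memory at once.
def strip_html_stream(input, output, batch_size: 10_000)
  buffer = +""
  lines_in_buffer = 0
  input.each_line do |line|
    buffer << line.gsub(TAG_PATTERN, "")
    lines_in_buffer += 1
    next if lines_in_buffer < batch_size

    # Batch is full: collapse whitespace runs, write it out, start over.
    output.write(buffer.gsub(BLANK_RUN, "\n"))
    buffer.clear
    lines_in_buffer = 0
  end
  # Flush whatever is left after the last full batch.
  output.write(buffer.gsub(BLANK_RUN, "\n")) unless buffer.empty?
end

# Runs the helper over every file matching +pattern+, writing to one output file.
def scrape_directory(pattern, out_path)
  File.open(out_path, "w") do |out|
    Dir[pattern].each_with_index do |file_name, index|
      puts "#{file_name}: #{index + 1}"
      File.open(file_name, "r") { |file| strip_html_stream(file, out) }
    end
  end
end

# Example call, using the paths from the question:
# scrape_directory("allParts/*.dat", "allscraped.txt")
```

Because the helper accepts any object that responds to each_line and any object that responds to write, it can be tried on a StringIO before pointing it at multi-gigabyte files. As in the original script, whitespace is collapsed per batch, so a run of blank lines that happens to straddle a batch boundary may not be merged into a single newline.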
