Ruby/Mechanize "failed to allocate memory". Erasing instantiation of 'agent.get' method?


Question

I have a small problem with a memory leak in a Ruby script that uses Mechanize.

I fetch several web pages in an endless while loop, and memory usage grows considerably on every iteration. After a few minutes this produces a "failed to allocate memory" error and the script exits.

In fact, agent.get seems to instantiate and hold on to its result even when I assign that result to the same local variable, or even to a global variable. So I tried assigning nil to the variable after its last use and before reusing the name. But the previous agent.get results still seem to stay in memory, and I really don't know how to free that RAM so that the script uses a roughly stable amount of memory over several hours.

Here are two versions of the code (hold down the Enter key and watch the RAM allocated to Ruby keep growing):

#!/usr/bin/env ruby

require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Windows Mozilla'
GC.enable
#puts GC.malloc_allocations
while gets.chomp!="stop"
    page = agent.get 'http://www.nypost.com/'
    puts "agent.object_id  : "+agent.object_id.to_s
    puts "page.object_id  : "+page.object_id.to_s
    page=nil
    puts "page.object_id  : "+page.object_id.to_s
    page = agent.get 'http://www.nypost.com/'
    puts "page.object_id  : "+page.object_id.to_s
    page=nil
    puts "page.object_id  : "+page.object_id.to_s
    puts local_variables
    GC.start
    puts local_variables
    #puts GC.malloc_allocations
end

And with a global variable instead:

#!/usr/bin/env ruby

require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Windows Mozilla'
while gets.chomp!="stop"
    $page = agent.get 'http://www.nypost.com/'
    puts "agent.object_id  : "+agent.object_id.to_s
    puts "$page.object_id  : "+$page.object_id.to_s
    $page = agent.get 'http://www.nypost.com/'
    puts "$page.object_id  : "+$page.object_id.to_s
    #puts local_variables
    #puts global_variables
end

In other languages the variable is simply reassigned and allocated memory stays stable. Why doesn't Ruby do the same? How can I force these instances to be garbage collected?

Edit: Here is another example using an object, since Ruby is an object-oriented language, but the result is exactly the same: memory grows again and again...

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
class GetContent
    def initialize url
        while true
            @page = $agent.get url
            remove_instance_variable(:@page)
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')

My Answer (not enough reputation to do it properly)

Well!

It seems that clearing the agent's history with Mechanize::History#clear (called as $agent.history.clear) largely fixes this memory leak.

Here is the final modified Ruby code, in case you want to compare before and after:

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
class GetContent
    def initialize url
        while true
            @page = $agent.get url
            $agent.history.clear
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')
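
Why assigning nil did not help while clearing the history did can be illustrated with a toy model. This is only a sketch under assumptions: FakeAgent and FakePage are hypothetical stand-ins written for this example, not Mechanize classes, and no network access takes place. It shows that an object stays reachable, and so cannot be garbage collected, for as long as some container still references it.

```ruby
# Toy stand-in for an agent that remembers every page it fetched.
# FakeAgent and FakePage are hypothetical names, not part of Mechanize.
FakePage = Struct.new(:url, :body)

class FakeAgent
  attr_reader :history

  def initialize
    @history = []
  end

  # Pretend to fetch a page; nothing is downloaded.
  def get(url)
    page = FakePage.new(url, "x" * 1024)
    @history << page   # the agent keeps its own reference to the page
    page
  end
end

agent = FakeAgent.new

3.times do
  page = agent.get("http://www.nypost.com/")
  page = nil           # drops only the caller's reference
end
size_before_clear = agent.history.size   # the pages are still reachable

agent.history.clear    # the history no longer references the old pages
size_after_clear = agent.history.size

puts "before clear: #{size_before_clear}, after clear: #{size_after_clear}"
```

In this model, setting page to nil changes nothing for the garbage collector, because the history array still points at every page. Only once the history is emptied do the old pages become unreachable.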

Recommended answer

My suggestion is to set agent.max_history = 0, as mentioned in the list of linked issues.

This keeps a history entry from even being added, instead of relying on #clear.

Here is a modified version of the other answer:

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
$agent.max_history = 0
class GetContent
    def initialize url
        while true
            @page = $agent.get url
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')
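
The effect of a history cap can be sketched with a small model. BoundedAgent below is a hypothetical class made up for this illustration; it is not Mechanize's implementation, and it only assumes that a cap of 0 means no page stays referenced once get returns. No network access is involved.

```ruby
# Hypothetical stand-in for an agent with a capped history; not Mechanize code.
class BoundedAgent
  attr_reader :history
  attr_accessor :max_history

  def initialize(max_history: nil)
    @history = []
    @max_history = max_history   # nil means "no limit" in this sketch
  end

  # Pretend to fetch a page; nothing is downloaded.
  def get(url)
    page = "page body for #{url}"
    remember(page)
    page
  end

  private

  # Store the page, then drop the oldest entries beyond the cap.
  def remember(page)
    @history << page
    @history.shift while max_history && @history.size > max_history
  end
end

unlimited = BoundedAgent.new
capped    = BoundedAgent.new(max_history: 0)

5.times do
  unlimited.get("http://www.nypost.com/")
  capped.get("http://www.nypost.com/")
end

puts "unlimited history: #{unlimited.history.size}"
puts "capped history:    #{capped.history.size}"
```

With the cap at 0 the history is empty after every call, so the loop never accumulates pages and no explicit clear is needed. A small positive cap would be the choice if the script still needs to go back a page or two.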
