"Out of Memory" error with mechanize

Question

I was trying to scrape some information from a website page by page. Basically, here's what I did:

import mechanize
MechBrowser = mechanize.Browser()

Counter = 0

while Counter < 5000:
    Response = MechBrowser.open("http://example.com/page" + str(Counter))
    Html = Response.read()
    Response.close()

    OutputFile = open("Output.txt", "a")
    OutputFile.write(Html)
    OutputFile.close()

    Counter = Counter + 1

Well, the above code ended up throwing an "Out of Memory" error, and Task Manager showed that the script had used almost 1 GB of memory after running for several hours. How come?!

Can anybody tell me what went wrong?

Recommended answer

This is not exactly a memory leak, but rather an undocumented feature. Basically, mechanize.Browser() keeps its entire browser history in memory as it goes, so memory use grows with every page the loop opens.

If you add a call to MechBrowser.clear_history() after Response.close(), it should resolve the problem.
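As a rough sketch of that fix, the loop from the question could be restructured as below. The function name scrape_pages and its parameters are made up for illustration and are not part of mechanize. It assumes only that the browser object behaves like mechanize.Browser: open() returns a response with read() and close(), and clear_history() discards the stored pages.

```python
def scrape_pages(browser, base_url, page_count, output_path):
    """Fetch base_url + "0" .. base_url + str(page_count - 1) and append each body to output_path.

    `browser` is assumed to behave like mechanize.Browser (hypothetical helper,
    not a mechanize API): open(url) returns a response offering read() and
    close(), and clear_history() drops the pages kept in the browser history.
    """
    # Open the output file once, in binary append mode, so the raw response
    # body can be written without decoding it.
    with open(output_path, "ab") as output_file:
        for counter in range(page_count):
            response = browser.open(base_url + str(counter))
            try:
                output_file.write(response.read())
            finally:
                response.close()
            # Discard the history accumulated so far, so earlier pages are
            # no longer kept alive by the browser object.
            browser.clear_history()


# Possible usage with a real browser (needs the mechanize package and network access):
#   import mechanize
#   scrape_pages(mechanize.Browser(), "http://example.com/page", 5000, "Output.txt")
```

Passing the browser in as an argument is only a convenience for trying the loop out with a stand-in object. Calling MechBrowser.clear_history() directly inside the original while loop should have the same effect.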
