How to debug Python memory fault?
Problem description
Edit: I'd really appreciate help in finding the bug, but since it might prove hard to find or reproduce, any general debugging help would be greatly appreciated too! Help me help myself! =)
Edit 2: Narrowing it down, commenting out code.
Edit 3: Seems lxml might not be the culprit, thanks! The full script is here. I need to go over it looking for references. What do they look like?
Edit 4: Actually, the script stops (goes to 100% CPU) in this, the parse_og part of it. So Edit 3 is false - it must be lxml somehow.
Edit 5 MAJOR EDIT: As suggested by David Robinson and TankorSmash below, I've found a type of data content that will send lxml.etree.HTML( data ) into a wild loop. (I carelessly disregarded it, but find my sins redeemed as I've paid a price to the tune of an extra two days of debugging! ;) A working crashing script is here. (Also opened a new question.)
Edit 6: Turns out this is a bug in libxml2 (the C library that lxml wraps) version 2.7.8 and below (at least). Updated to libxml2 2.9.0, and the bug is gone. Thanks also to the fine folks over at this follow-up question.
I don't know how to debug this weird problem I'm having. The code below runs fine for about five minutes, then the RAM suddenly fills up completely (from 200MB to 1700MB while the CPU sits at 100%; once memory is full, the process goes into a blue wait state).
It's due to the code below, specifically the first two lines. That's for sure. But what is going on? What could possibly explain this behaviour?
def parse_og(self, data):
    """ lxml parsing to the bone! """
    try:
        tree = etree.HTML( data ) # << break occurs on this line >>
        m = tree.xpath("//meta[@property]")
        #for i in m:
        #    y = i.attrib['property']
        #    x = i.attrib['content']
        #    # self.rj[y] = x # commented out in this example because code fails anyway
        tree = ''
        m = ''
        x = ''
        y = ''
        i = ''
        del tree
        del m
        del x
        del y
        del i
    except Exception:
        print 'lxml error: ', sys.exc_info()[1:3]
        print len(data)
        pass
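Since Edit 5 traces the hang to particular input, one way to find the offending payload is to parse each document in a short-lived child process with a time limit. The following is only a sketch under assumptions, not the approach used in the original script: it assumes Python 3 (the code above is Python 2), and the names `try_parse` and `LXML_CHILD` are hypothetical helpers introduced here for illustration.

```python
import subprocess
import sys

# Hypothetical child program (an assumption for this sketch): it reads one
# payload from stdin and hands it to lxml, so a runaway parse stays isolated.
LXML_CHILD = (
    "import sys\n"
    "from lxml import etree\n"
    "etree.HTML(sys.stdin.buffer.read())\n"
)


def try_parse(data, child_code=LXML_CHILD, timeout_s=10.0):
    """Run child_code in a fresh interpreter, feeding it `data` (bytes) on stdin.

    Returns "ok" (exit code 0), "error" (non-zero exit code),
    or "timeout" (the child exceeded timeout_s and was killed).
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", child_code],
            input=data,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising the timeout.
        return "timeout"
    return "ok" if result.returncode == 0 else "error"
```

Any payload for which `try_parse(data)` returns "timeout" can be written to a file and used as a small, reproducible test case, while the main script keeps running with its memory intact.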
You can try low-level Python debugging with GDB. There is probably a bug in the Python interpreter or in the lxml library, and it is hard to find without extra tools.
You can interrupt your script while it runs under gdb, once CPU usage goes to 100%, and look at the stack trace. That will probably help you understand what is going on inside the script.
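If attaching gdb is not convenient, a related technique (not gdb itself) is the standard-library `faulthandler` module, which can dump the Python-level stack of every thread when a call runs too long. The sketch below assumes Python 3; `run_with_watchdog` is a hypothetical helper name. It shows only the Python frames, so it tells you which line of the script is stuck, not what the C library underneath is doing.

```python
import faulthandler
import sys


def run_with_watchdog(func, timeout_s, log_file=None):
    """Call func(); if it has not returned after timeout_s seconds,
    write the traceback of every thread to log_file and keep running.

    log_file must be a real file with a file descriptor; it defaults
    to sys.stderr.
    """
    if log_file is None:
        log_file = sys.stderr
    # The watchdog fires from a separate thread after timeout_s seconds.
    faulthandler.dump_traceback_later(timeout_s, file=log_file, exit=False)
    try:
        return func()
    finally:
        # Disarm the watchdog once func() has returned or raised.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the suspect call, for example `run_with_watchdog(lambda: etree.HTML(data), 30)`, should print the stuck line to stderr after 30 seconds. For the C-level frames inside the library, gdb as described above is still the tool to use.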