How do I track down a Heisenbug in some Python code?


Question

Quick background: we have a large source base written in Python. It is a compiler for a domain specific language, and internally everything is represented as directed graphs. These digraphs are built up from sets, and so we use the builtin set type in Python.

The problem is that we didn't originally realise that Python actively uses the lack of an ordering guarantee in a set object to use a faster non-deterministic implementation. So when you iterate over the objects in a set (as we do frequently), the order they are returned in is weakly random. It doesn't change on every execution, but it does change frequently. From what I've seen while debugging our code, it seems as if a hash of the source code acts as a seed for the random number generator. So changing the code, even in a path that is not executed, causes the set iterator to change the order in which elements are generated.

Our basic debugging strategy is to dump prints into where we think the error is, refining them based on the output until we find the bug. Not very elegant, but for most things it largely works. The basic problem is that adding or changing any print statement triggers different behaviour and a wildly different execution path. We can't afford to print/log everything, as it slows down the compiler's execution time from about 20s (manageable) to about an hour (not so manageable).

How would you diagnose a problem that only occurs infrequently and disappears when you start to look for it?

Edit for clarification: Several answers suggest ways to fix the ordering of the sets. As torkildr says below, "The problem here isn't that sets behave strangely, the problem is that your program behaves as if it doesn't". This is exactly the problem, but the solution is not to use deterministic sets. This would simply mask the behaviour. The problem is to find why our program behaves this way and fix that behaviour. The algorithms that we use should work on graphs represented as unordered sets. They don't. I need to find out where these bugs are and why they occur.
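One way to hunt for such order dependence without touching the set implementation is to replay an algorithm over several explicit orderings and see whether the result changes. The following is only a sketch under assumptions: find_order_dependence, first_item and total_weight are hypothetical names, the algorithm is assumed to accept any iterable of nodes, and repr() is assumed to give a stable sort key for them.

```python
import random


def find_order_dependence(algorithm, items, trials=20):
    """Run `algorithm` on differently ordered copies of `items`.

    Returns (seed, order, result, baseline) for the first ordering whose
    result differs from the baseline, or None if every trial agreed.
    """
    # Assumption: repr() gives a stable, sortable key for the items.
    baseline_order = sorted(items, key=repr)
    baseline = algorithm(baseline_order)
    for seed in range(trials):
        order = list(baseline_order)
        # A seeded shuffle makes a failing ordering reproducible later.
        random.Random(seed).shuffle(order)
        result = algorithm(order)
        if result != baseline:
            return seed, order, result, baseline
    return None


def first_item(nodes):
    # Deliberately order-dependent: returns whichever node comes first.
    return next(iter(nodes))


def total_weight(nodes):
    # Order-independent: summing does not care about iteration order.
    return sum(nodes)


print(find_order_dependence(total_weight, {3, 1, 2}))
print(find_order_dependence(first_item, {1, 2, 3, 4, 5}, trials=50))
```

Because each ordering is tied to a seed, a failing case can be replayed exactly, and adding prints no longer changes which ordering is exercised.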

Problem solved: It turns out that, on the implementation of Python that I'm using (2.6 on OS-X), if the relationship between the __eq__ and __hash__ methods of the objects being stored in the set is not quite a valid ordering, then the system exhibits the weakly random behaviour described. There must be some code in the C implementation of set.add() that uses something from the random module to build the representation. This causes a dependency on the system entropy pool, which changes the ordering on disk writes.
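To illustrate the kind of __eq__/__hash__ mismatch described above, here is a minimal sketch. The post does not show the offending classes, so BrokenNode, FixedNode and find_hash_eq_violations are hypothetical names invented for illustration. The check relies only on the documented rule that objects which compare equal must have the same hash.

```python
from itertools import combinations


def find_hash_eq_violations(objects):
    """Return pairs (a, b) where a == b but hash(a) != hash(b)."""
    return [(a, b) for a, b in combinations(objects, 2)
            if a == b and hash(a) != hash(b)]


class BrokenNode(object):
    """Hypothetical node: equality by name, but an identity-based hash."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, BrokenNode) and self.name == other.name

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # Inconsistent with __eq__: equal nodes get different hashes.
        return id(self)


class FixedNode(object):
    """Hypothetical repaired node: hash derived from the same key as __eq__."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, FixedNode) and self.name == other.name

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return hash(self.name)


broken = [BrokenNode("x"), BrokenNode("x")]
fixed = [FixedNode("x"), FixedNode("x")]
print(len(find_hash_eq_violations(broken)))  # one violating pair
print(len(find_hash_eq_violations(fixed)))   # no violations
```

In CPython an identity-based hash depends on where the object lives in memory, so with a class like BrokenNode two "equal" nodes can both end up in the same set, and their iteration order can differ from run to run. That may be one source of the behaviour described, though the post does not confirm the exact mechanism.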

No direct answers, but reading kriss' follow-up question gave me the insight to solve this problem, so he gets the vote.

Answer

Why not just change nothing in the code that affects set output ordering and use pdb instead of adding prints? Does setting a breakpoint also change set ordering? If not, pdb will allow you to inspect internal variables.

Your description of your problem also raises some questions. How do you detect that there is a bug? If this detection can be done at run time, a possible strategy would be to run pdb from your code (as simple as import pdb; pdb.set_trace()) as soon as you see it (using run-time ifs), not in a subsequent execution after changing the code. This way you wouldn't have to change the code at all to debug (except perhaps to remove those checks later, once the code is debugged and rock solid).
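A minimal sketch of that "run-time if" idea follows. The invariant has_self_loop, the function process and its on_bug parameter are all hypothetical stand-ins; a real compiler pass would use whatever check actually detects the bug. The on_bug hook defaults to pdb.set_trace and exists only so the example can be exercised without an interactive session.

```python
import pdb


def has_self_loop(edges):
    """Hypothetical invariant check: True if any edge points at its own source."""
    return any(src == dst for src, dst in edges)


def process(edges, on_bug=pdb.set_trace):
    """Stand-in for one compiler pass over a set of (src, dst) edges."""
    for edge in edges:
        pass  # the real work of the pass would go here

    if has_self_loop(edges):
        # Enter the debugger in this very run, while the set ordering
        # that triggered the bug is still in effect.
        on_bug()
    return len(edges)


# Non-interactive demonstration: record the call instead of opening pdb.
calls = []
process({(1, 2), (2, 3)}, on_bug=lambda: calls.append("clean graph"))
process({(1, 1)}, on_bug=lambda: calls.append("self loop"))
print(calls)
```

With the default hook, pdb stops inside process, so the local variables of the failing pass can be inspected directly. The check stays in place between runs, so the source does not need to change each time you want to look closer.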

By the way, you should also unit test all your code as you write it; it is then much less likely to hide subtle bugs.
