What's the toughest bug you ever found and fixed?


Question


What made it hard to find? How did you track it down?

Not close enough to close as a duplicate, but see also:
http://stackoverflow.com/questions/175854/what-is-the-funniest-bug-youve-ever-experienced

Solution

This requires knowing a bit of Z-8000 assembler, which I'll explain as we go.

I was working on an embedded system (in Z-8000 assembler). A different division of the company was building a different system on the same platform, and had written a library of functions, which I was also using on my project. The bug was that every time I called one function, the program crashed. I checked all my inputs; they were fine. It had to be a bug in the library -- except that the library had been used (and was working fine) in thousands of POS sites across the country.

Now, Z-8000 CPUs have sixteen 16-bit registers, R0, R1, R2 ... R15, which can also be addressed as eight 32-bit registers, named RR0, RR2, RR4 ... RR14. The library was written from scratch, refactoring a bunch of older libraries. It was very clean and followed strict programming standards. At the start of each function, every register that would be used in the function was pushed onto the stack to preserve its value. Everything was neat and tidy; the functions were perfect.

Nevertheless, I studied the assembler listing for the library, and I noticed something odd about that function: at the start of the function, it had PUSH RR0 / PUSH RR2, and at the end it had POP RR2 / POP R0. Now, if you didn't follow that, it pushed 4 values on the stack at the start, but only removed 3 of them at the end. That's a recipe for disaster. There was an unknown value on the top of the stack where the return address needed to be. The function couldn't possibly work.
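To make the imbalance concrete, here is a toy model of the stack as a list of 16-bit words. It is only a sketch, not a Z-8000 emulator: the register values, the return address `0x1234`, and the helper names are made up for illustration, and the word order within a pushed pair is an assumption chosen to match the story (the word left behind is the one that came from R1).

```python
# Toy model of a word-oriented stack; NOT a real Z-8000 emulator.
# Assumption: a register pair is pushed so that its high word ends up on top.

def push_pair(stack, regs, hi, lo):
    """Push a 32-bit register pair as two 16-bit words."""
    stack.append(regs[lo])
    stack.append(regs[hi])

def pop_pair(stack, regs, hi, lo):
    """Pop two 16-bit words back into a register pair."""
    regs[hi] = stack.pop()
    regs[lo] = stack.pop()

def pop_word(stack, regs, name):
    """Pop a single 16-bit word into one register."""
    regs[name] = stack.pop()

def buggy_function(stack, regs):
    push_pair(stack, regs, "R0", "R1")   # PUSH RR0 (2 words)
    push_pair(stack, regs, "R2", "R3")   # PUSH RR2 (2 words)
    # ... function body would run here ...
    pop_pair(stack, regs, "R2", "R3")    # POP RR2 (2 words)
    pop_word(stack, regs, "R0")          # POP R0  (1 word; should be POP RR0)

RETURN_ADDRESS = 0x1234                  # hypothetical caller address
regs = {"R0": 0x00AA, "R1": 0x00BB, "R2": 0x00CC, "R3": 0x00DD}
stack = [RETURN_ADDRESS]                 # the call left the return address on top

buggy_function(stack, regs)
leftover = stack[-1]                     # what RET will treat as the return address
print([hex(word) for word in stack])
```

In this model the stack ends with one extra word sitting on top of the real return address, and that word is whatever the caller had in R1.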

Except, may I remind you, that it WAS working. It was being called thousands of times a day on thousands of machines. It couldn't possibly NOT work.

After some time debugging (which wasn't easy in assembler on an embedded system with the tools of the mid-1980s), I confirmed that it always crashed on the return, because the bad value was sending it to a random address. Evidently I had to debug the working app, to figure out why it didn't fail.

Well, remember that the library was very good about preserving the values in the registers, so once you put a value into the register, it stayed there. R1 had 0000 in it. It would always have 0000 in it when that function was called. The bug therefore left 0000 on the stack. So when the function returned it would jump to address 0000, which just so happened to be a RET, which would pop the next value (the correct return address) off the stack, and jump to that. The data perfectly masked the bug.
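The masking effect can be sketched the same way. This is a simplified model under stated assumptions: memory is a dictionary, address `0x0000` holds a RET as in the story, `0x1234` stands in for the real return address, and any other address counts as garbage. The names `follow_returns` and `call_buggy_function` are hypothetical.

```python
# Toy model of what happens when the buggy function executes RET.
# Assumptions: address 0x0000 contains a RET instruction (as in the story);
# CALLER is a made-up return address; everything else is "garbage".

RET = "RET"
CALLER = 0x1234
memory = {0x0000: RET, CALLER: "caller code"}

def follow_returns(stack):
    """Pop an address and jump to it; keep going while the target is a RET."""
    hops = []
    while True:
        target = stack.pop()
        hops.append(target)
        if memory.get(target) != RET:
            return hops

def call_buggy_function(r1_value):
    """Return the addresses visited after the function's final RET."""
    # The bug leaves the caller's R1 word on top of the real return address.
    stack = [CALLER, r1_value]
    return follow_returns(stack)

masked = call_buggy_function(0x0000)    # the other division's apps: R1 == 0000
crashed = call_buggy_function(0x00BB)   # my app: some other value in R1

print([hex(a) for a in masked])         # detour through 0x0000, then the caller
print([hex(a) for a in crashed])        # a single jump into garbage
```

With 0000 in R1 the return takes a harmless detour through the RET at address 0000 and lands back in the caller; with any other value it jumps straight to an arbitrary address.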

Of course, in my app, I had a different value in R1, so it just crashed....
