What's the toughest bug you ever found and fixed?


Question

What made it hard to find? How did you track it down?

Not a close enough duplicate to close, but see also
https://stackoverflow.com/questions/175854/

Recommended answer

This requires knowing a bit of Z-8000 assembler, which I'll explain as we go.

I was working on an embedded system (in Z-8000 assembler). A different division of the company was building a different system on the same platform and had written a library of functions, which I was also using in my project. The bug was that every time I called one particular function, the program crashed. I checked all my inputs; they were fine. It had to be a bug in the library, except that the library had been used (and was working fine) in thousands of POS sites across the country.

Now, Z-8000 CPUs have sixteen 16-bit registers, R0, R1, R2 ... R15, which can also be addressed as eight 32-bit registers, named RR0, RR2, RR4 ... RR14. The library was written from scratch, refactoring a bunch of older libraries. It was very clean and followed strict programming standards. At the start of each function, every register that would be used in the function was pushed onto the stack to preserve its value. Everything was neat and tidy; they were perfect.
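
To make the register naming concrete, here is a minimal Python sketch of a register file in which each 32-bit pair is a view onto two 16-bit registers. The class and method names are hypothetical, and the sketch assumes the even-numbered register holds the high word of the pair, which is what the rest of the story relies on.

```python
class RegisterFile:
    """Toy model: sixteen 16-bit registers, also readable as eight 32-bit pairs."""

    def __init__(self):
        self.r = [0] * 16  # R0 .. R15

    def _check_pair(self, n):
        # Pairs are named after their even-numbered register: RR0, RR2, ... RR14.
        if n % 2 != 0 or not 0 <= n <= 14:
            raise ValueError(f"RR{n} is not a valid register pair")

    def read_pair(self, n):
        """Return RRn as one 32-bit value (assumed: Rn is the high word)."""
        self._check_pair(n)
        return (self.r[n] << 16) | self.r[n + 1]

    def write_pair(self, n, value):
        """Store a 32-bit value into RRn, splitting it across Rn and Rn+1."""
        self._check_pair(n)
        self.r[n] = (value >> 16) & 0xFFFF
        self.r[n + 1] = value & 0xFFFF


regs = RegisterFile()
regs.write_pair(0, 0x12340000)   # writing RR0 changes both R0 and R1
r0_value, r1_value = regs.r[0], regs.r[1]
```

The point to carry forward is that RR0 and R0 are not interchangeable: RR0 covers two words (R0 and R1), while R0 covers only one.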

Nevertheless, I studied the assembler listing for the library, and I noticed something odd about that function: at the start of the function it had PUSH RR0 / PUSH RR2, and at the end it had POP RR2 / POP R0. Now, if you didn't follow that, it pushed 4 values onto the stack at the start but only removed 3 of them at the end. That's a recipe for disaster. There was an unknown value on the top of the stack where the return address needed to be. The function couldn't possibly work.
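
The imbalance is easy to see by replaying the four instructions on a toy stack. The following Python sketch is only an illustration, not the library's code: the function name, the register values, and the return address 0x4000 are made up, and it assumes a 16-bit return address and that a pair is pushed so that its even register ends up on top.

```python
def replay_prologue_and_epilogue(regs, return_address):
    """Replay CALL, PUSH RR0, PUSH RR2, POP RR2, POP R0 on a list of 16-bit words.

    The end of the list is the top of the stack. Returns what is left on the
    stack at the moment the function executes its RET.
    """
    stack = [return_address]             # CALL pushes the return address
    stack += [regs["R1"], regs["R0"]]    # PUSH RR0: two words, R0 on top
    stack += [regs["R3"], regs["R2"]]    # PUSH RR2: two words, R2 on top
    stack.pop()                          # POP RR2 removes two words...
    stack.pop()
    stack.pop()                          # ...but POP R0 removes only one
    return stack


# Hypothetical register contents, chosen only to make the leftover word visible.
caller_regs = {"R0": 0x1111, "R1": 0xBEEF, "R2": 0x2222, "R3": 0x3333}
leftover = replay_prologue_and_epilogue(caller_regs, return_address=0x4000)
```

Under these assumptions the stack still holds two words when RET runs: the real return address underneath, and the caller's R1 value sitting on top of it where RET will look.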

Except, may I remind you, that it WAS working. It was being called thousands of times a day on thousands of machines. It couldn't possibly NOT work.

After some time debugging (which wasn't easy in assembler on an embedded system with the tools of the mid-1980s), I found that it would always crash on the return, because the bad value was sending it to a random address. Evidently I had to debug the working app to figure out why it didn't fail.

Well, remember that the library was very good about preserving the values in the registers, so once you put a value into a register, it stayed there. In the working app, R1 had 0000 in it. It would always have 0000 in it when that function was called. The bug therefore left 0000 on the stack. So when the function returned, it would jump to address 0000, which just so happened to hold a RET, which would pop the next value (the correct return address) off the stack and jump to that. The data perfectly masked the bug.
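
The masking effect can be sketched in a few lines of Python. This is a simplified model rather than a Z-8000 emulator: the function name, the code map, and the addresses 0x4000 and 0x1234 are hypothetical, and the only facts taken from the story are that the stale R1 word sits on top of the real return address and that address 0000 holds a RET.

```python
# Assumption from the story: the instruction at address 0000 is a RET.
CODE_AT = {0x0000: "RET"}


def where_does_ret_land(r1_value, return_address):
    """Follow the buggy function's RET and report where execution ends up."""
    stack = [return_address, r1_value]   # left behind by the unbalanced POPs
    pc = stack.pop()                     # the function's RET pops the stale R1 word
    while CODE_AT.get(pc) == "RET" and stack:
        pc = stack.pop()                 # a RET at that address pops the next word
    return pc


working_app = where_does_ret_land(r1_value=0x0000, return_address=0x4000)
my_app = where_does_ret_land(r1_value=0x1234, return_address=0x4000)
```

With R1 equal to 0000, the second RET quietly repairs the stack and execution resumes at the caller. With any other R1 value, execution jumps to whatever address R1 happened to contain.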

Of course, in my app, I had a different value in R1, so it just crashed....
