Unravelling Assembly Language Spaghetti Code


Problem description

I've inherited a 10K-line program written in 8051 assembly language that requires some changes. Unfortunately it's written in the finest traditions of spaghetti code. The program--written as a single file--is a maze of CALL and LJMP statements (about 1200 total), with subroutines having multiple entry and/or exit points, if they can be identified as subroutines at all. All variables are global. There are comments; some are correct. There are no existing tests, and no budget for refactoring.

A little background on the application: The code controls a communications hub in a vending application that is currently deployed internationally. It handles two serial streams simultaneously (with the help of a separate communications processor) and can be talking to up to four different physical devices, each from a different vendor. The manufacturer of one of the devices recently made a change ("Yeah, we made a change, but the software's absolutely the same!") which causes some system configurations to no longer work, and is not interested in unchanging it (whatever it was they didn't change).

The program was originally written by another company, transferred to my client, then modified nine years ago by another consultant. Neither the original company nor the consultant is available as a resource.

Based on analysis of the traffic on one of the serial buses, I've come up with a hack, which appears to work, but it's ugly and doesn't address the root cause. If I had a better understanding of the program, I believe I could address the actual problem. I have about one more week before the code's frozen to support an end-of-the-month ship date.

Original question: I need to understand the program well enough to make the changes without breakage. Has anyone developed techniques for working with this sort of mess?

I see some great suggestions here, but am limited by time. However, I may have another opportunity in the future to pursue some of the more involved courses of action.

Solution

First, I would try to get in touch with the people who originally developed the code, or who at least maintained it before me, hoping to get enough information for a basic understanding of the code in general, so that I could start adding useful comments to it.

Maybe you can even get someone to describe the most important APIs (including their signature, return values and purpose) for the code. If global state is modified by a function, this should also be made explicit. Similarly, start to differentiate between functions and procedures, as well as input/output registers.

You should make it very clear to your employer that this information is required. If they don't believe you, have them actually sit down with you in front of the code while you describe what you are supposed to do and how you have to do it (reverse engineering). Having an employer with a background in computing and programming will actually be helpful in that case!

If your employer doesn't have such a technical background, ask him to bring in another programmer or colleague who can explain your steps to him. Doing so will show him that you are serious and honest about it, because it's a real issue and not just from your point of view (make sure to have colleagues who know about this 'project').

If it is available and feasible, I would also make it very clear that contracting (or at the very least contacting) former developers/maintainers, if they are no longer working for your company, to help document this code is a prerequisite for realistically improving the code within a short time span and for ensuring that it can be maintained more easily in the future.

Emphasize that this whole situation is due to shortcomings in the previous software development process and that these steps will help improve the code base. So, the code base in its current form is a growing problem and whatever is done now to handle this problem is an investment for the future.

This in itself is also important to help them assess and understand your situation: To do what you are supposed to do now is far from trivial, and they should know about it - if only to set their expectations straight (e.g. regarding deadlines and complexity of the task).

Also, personally I would start adding unit tests for those parts that I understand well enough, so that I can slowly start refactoring/rewriting some code.

In other words, good documentation and source code comments are one thing, but a comprehensive test suite is just as important. No one can realistically be expected to modify an unfamiliar code base without an established way of testing key functionality.

Given that the code is 10K lines in a single file, I would also look into factoring subroutines out into separate files with intuitive names to make components more identifiable, preferably using access wrappers instead of global variables.

Besides, I would look into steps to further improve the readability of the source code by decreasing its complexity. Subroutines with multiple entry points (and possibly even different parameter signatures?) look like a sure way to obfuscate the code unnecessarily.

Similarly, huge subroutines could be refactored into smaller ones to help improve readability.

So, one of the very first things I'd look into would be to determine what makes the code base really complicated to grok and then rework those parts, for example by splitting huge subroutines with multiple entry points into distinct subroutines that call each other instead. If this cannot be done for performance reasons or because of call overhead, use macros instead.

In addition, if it is a viable option, I would consider incrementally rewriting portions of the code in a higher-level language, either by using a subset of C or at least by making extensive use of assembly macros to help standardize the code base and to help localize potential bugs.

If an incremental rewrite in C is a feasible option, one possible way to get started would be to turn all obvious functions into C functions whose bodies are initially copied and pasted from the assembly file, so that you end up with C functions containing lots of inline assembly.

Personally, I would also try running the code in a simulator/emulator to easily step through it and hopefully start understanding the most important building blocks (while examining register and stack usage). A good 8051 simulator with a built-in debugger should be made available to you if you really have to do this largely on your own.

This would also help you come up with the initialization sequence and main loop structure as well as a callgraph.

Maybe you can even find a good open source 8051 simulator that can easily be modified to also provide a full callgraph automatically. Just doing a quick search, I found gsim51, but there are obviously several other options, including various proprietary ones.

If I were in your situation, I would even consider outsourcing the effort of modifying my tools to simplify working with this source code. For example, many SourceForge projects accept donations, and maybe you can talk your employer into sponsoring such a modification.

If not financially, perhaps you could contribute the corresponding patches yourself?

If you are already using a proprietary product, you might even be able to talk with the manufacturer of this software and detail your requirements and ask them if they are willing to improve this product that way or if they can at least expose an interface to allow customers to make such customizations (some form of internal API or maybe even simple glue scripts).

If they are not responsive, indicate that your employer has been thinking of using a different product for some time now and that you were the only one insisting on that particular product to be used ... ;-)

If the software expects certain I/O hardware and peripherals, you may even want to look into writing a corresponding hardware simulation loop to run the software in an emulator.

Ultimately, I know for a fact that I would personally enjoy customizing other software to help me understand such a spaghetti code monster much more than manually stepping through the code and playing emulator myself, no matter how many gallons of coffee I can get.

Getting a usable callgraph out of an open source 8051 emulator should not take much longer than, say, a weekend (at most), because it mostly means looking for CALL opcodes and recording their addresses (position and target), so that everything is dumped to a file for later inspection.
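As a rough sketch of what such a modification might record, assuming the emulator can be made to call a hook with the program counter before each instruction executes. The `CallRecorder` class and its `on_execute` hook are hypothetical names, not part of any particular emulator; only the LCALL/ACALL encodings come from the 8051 instruction set.

```python
from collections import Counter

LCALL = 0x12  # LCALL addr16: opcode, target high byte, target low byte


def decode_call(pc, code):
    """Return the call target if the instruction at pc is LCALL/ACALL, else None.

    pc must point at the first byte of an instruction.
    """
    opcode = code[pc]
    if opcode == LCALL:
        return (code[pc + 1] << 8) | code[pc + 2]
    if opcode & 0x1F == 0x11:
        # ACALL addr11: the top three opcode bits are target bits 10..8,
        # the remaining high bits come from the address of the next instruction.
        next_pc = (pc + 2) & 0xFFFF
        return (next_pc & 0xF800) | ((opcode >> 5) << 8) | code[pc + 1]
    return None


class CallRecorder:
    """Hypothetical hook object: the emulator calls on_execute(pc) per instruction."""

    def __init__(self, code):
        self.code = code        # program memory image as bytes
        self.edges = Counter()  # (call site, target) -> times executed

    def on_execute(self, pc):
        target = decode_call(pc, self.code)
        if target is not None:
            self.edges[(pc, target)] += 1

    def dump(self, path):
        """Write one 'site -> target xcount' line per recorded edge."""
        with open(path, "w") as out:
            for (site, target), count in sorted(self.edges.items()):
                out.write(f"{site:04X} -> {target:04X} x{count}\n")
```

Because the hook only fires at real instruction boundaries, data bytes that happen to look like call opcodes are not miscounted, which is the main advantage over scanning the binary statically. Jumps into the middle of a routine (LJMP, AJMP) would need the same treatment to expose the extra entry points.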

Having access to an emulator's internals would also be a great way to further inspect the code, for example to find recurring patterns of opcodes (say 20-50+) that may be factored out into standalone functions/procedures. This might actually help decrease the size and complexity of the code base even further.
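A minimal sketch of that pattern search, assuming the program has already been disassembled into a list with one entry per instruction (strings here, but opcode byte tuples would work the same way). The function name `repeated_runs` is made up for illustration.

```python
from collections import defaultdict


def repeated_runs(instructions, length, min_count=2):
    """Map each run of `length` consecutive instructions to its start indexes,
    keeping only the runs that occur at least `min_count` times."""
    seen = defaultdict(list)
    for start in range(len(instructions) - length + 1):
        run = tuple(instructions[start:start + length])
        seen[run].append(start)
    return {run: starts for run, starts in seen.items() if len(starts) >= min_count}


program = [
    "MOV A,R0", "ADD A,#1", "MOV R0,A",
    "NOP",
    "MOV A,R0", "ADD A,#1", "MOV R0,A",
]
for run, starts in repeated_runs(program, 3).items():
    print(starts, run)
```

For real use the window would be much longer than three instructions, and overlapping occurrences of the same run would have to be filtered out before treating them as candidates for a shared subroutine.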

The next step would probably be to examine stack and register usage, and to determine the type/size of the function parameters used as well as their value ranges, so that you can devise corresponding unit tests.
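For the stack part, something along these lines could be a starting point. It is only a sketch: it assumes you can get a trace of the first opcode byte of every executed instruction out of the emulator, and it ignores interrupts and any code that writes to SP directly, both of which a real analysis would have to account for.

```python
def peak_stack_growth(opcodes):
    """Peak number of bytes pushed beyond the starting SP over an executed trace.

    `opcodes` holds the first byte of each executed instruction, in order.
    Assumption: no interrupts and no direct writes to SP.
    """
    depth = peak = 0
    for op in opcodes:
        if op == 0x12 or op & 0x1F == 0x11:  # LCALL / ACALL push a 2-byte return address
            depth += 2
        elif op in (0x22, 0x32):             # RET / RETI pop the return address
            depth -= 2
        elif op == 0xC0:                     # PUSH direct
            depth += 1
        elif op == 0xD0:                     # POP direct
            depth -= 1
        peak = max(peak, depth)
    return peak
```

A depth that ends up non-zero at a point where a routine has supposedly returned is a hint that it was left through a jump rather than a RET, which is exactly the kind of multiple-exit behaviour described in the question.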

Using tools like dot/Graphviz to visualize the structure of the initialization sequence and the main loop itself will be a pure joy compared to doing all this stuff manually.
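Turning recorded call data into Graphviz input takes very little code. The sketch below assumes the edges have already been reduced to (caller entry address, callee entry address) pairs; the `sub_XXXX` naming and the optional `names` table are just illustrative conventions.

```python
def to_dot(edges, names=None):
    """Render (caller, callee) address pairs as a Graphviz digraph.

    `names` optionally maps an address to a label taken from the source;
    unnamed addresses get a placeholder such as sub_0123.
    """
    names = names or {}

    def label(addr):
        return names.get(addr, f"sub_{addr:04X}")

    lines = ["digraph calls {", "    rankdir=LR;"]
    for caller, callee in sorted(set(edges)):
        lines.append(f'    "{label(caller)}" -> "{label(callee)}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot([(0x0000, 0x0123), (0x0123, 0x0400)], {0x0000: "reset"}))
```

Saving the output to a file such as `calls.dot` and running `dot -Tsvg calls.dot -o calls.svg` produces a picture you can pin to the wall and annotate as you learn what each routine does.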

Also, you'll actually end up with useful data and documents that can serve as the foundation for better documentation in the long run.
