制作汇编的设计样式 [英] Design Pattern For Making An Assembler

查看:152
本文介绍了制作汇编的设计样式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在制作一个8051汇编器。

I'm making an 8051 assembler.

在一切都是读取下一个标记的标记器之前,设置错误标志,识别EOF等。

然后是编译器的主循环,它读取下一个标记并检查有效的助记符:

Before everything is a tokenizer which reads next tokens, sets error flags, recognizes EOF, etc.
Then there is the main loop of the compiler, which reads next tokens and check for valid mnemonics:

mnemonic= NextToken();
if (mnemonic.Error)
{
    //throw some error
}
else if (mnemonic.Text == "ADD")
{
    ...
}
else if (mnemonic.Text == "ADDC")
{
    ...
}

还有几种情况。更糟的是每种情况下的代码,它检查有效的参数,然后将其转换为编译的代码。现在它看起来像这样:

And it continues to several cases. Worse than that is the code inside each case, which checks for valid parameters then converts it to compiled code. Right now it looks like this:

if (mnemonic.Text == "MOV")
{
    arg1 = NextToken();
    if (arg1.Error) { /* throw error */ break; }
    arg2 = NextToken();
    if (arg2.Error) { /* throw error */ break; }

    if (arg1.Text == "A")
    {
        if (arg2.Text == "B")
            output << 0x1234; //Example compiled code
        else if (arg2.Text == "@B")
            output << 0x5678; //Example compiled code
        else
            /* throw "Invalid parameters" */
    }
    else if (arg1.Text == "B")
    {
        if (arg2.Text == "A")
            output << 0x9ABC; //Example compiled code
        else if (arg2.Text == "@A")
            output << 0x0DEF; //Example compiled code
        else
            /* throw "Invalid parameters" */
    }
}

对于每个助记符,我必须检查有效参数,然后创建正确的编译代码。

For each of the mnemonics I have to check for valid parameters then create the correct compiled code. Very similar codes for checking the valid parameters for each mnemonic repeat in each case.

因此,是否有改进此代码的设计模式?

或只是一个更简单的方法来实现这个?

So is there a design pattern for improving this code?
Or simply a simpler way to implement this?

编辑:我接受了plinth的答案,感谢他。如果你有这方面的想法,我会很乐意学习它们。感谢所有。

I accepted plinth's answer, thanks to him. Still if you have ideas on this, i will be happy to learn them. Thanks all.

推荐答案

在过去几年里,我编写了大量的汇编程序来做手工解析,坦白说,你可能更好使用语法语言和解析器生成器。

I've written a number of assemblers over the years doing hand parsing and frankly, you're probably better off using a grammar language and a parser generator.

这是为什么 - 典型的装配线可能看起来像这样:

Here's why - a typical assembly line will probably look something like this:

[label:] [instruction|directive][newline]


$ b b

并且指令将是:

and an instruction will be:

plain-mnemonic|mnemonic-withargs

,指令将是:

plain-directive|directive-withargs

等。

使用像 Gold 这样的像样的解析器生成器,你应该能够在几个小时内敲掉8051的语法。这种过度解析的优点是,您将能够在汇编代码中有足够复杂的表达式,如:

With a decent parser generator like Gold, you should be able to knock out a grammar for 8051 in a few hours. The advantage to this over hand parsing is that you will be able to have complicated enough expressions in your assembly code like:

.define kMagicNumber 0xdeadbeef
CMPA #(2 * kMagicNumber + 1)

如果你想手工做,请制作一张包含所有助记符的表格,其中还包括他们支持的各种允许的寻址模式,以及每个寻址模式,每个变体将占用的字节数以及它​​的操作码。像这样:

If you want to do it by hand, make a table of all your mnemonics which will also include the various allowable addressing modes that they support and for each addressing mode, the number of bytes that each variant will take and the opcode for it. Something like this:

enum {
    Implied = 1, Direct = 2, Extended = 4, Indexed = 8 // etc
} AddressingMode; 

/* for a 4 char mnemonic, this struct will be 5 bytes.  A typical small processor
 * has on the order of 100 instructions, making this table come in at ~500 bytes when all
 * is said and done.
 * The time to binary search that will be, worst case 8 compares on the mnemonic.
 * I claim that I/O will take way more time than look up.
 * You will also need a table and/or a routine that given a mnemonic and addressing mode
 * will give you the actual opcode.
 */

struct InstructionInfo {
    char Mnemonic[4];
    char AddessingMode;
}

/* order them by mnemonic */
static InstructionInfo instrs[] = {
    { {'A', 'D', 'D', '\0'}, Direct|Extended|Indexed },
    { {'A', 'D', 'D', 'A'}, Direct|Extended|Indexed },
    { {'S', 'U', 'B', '\0'}, Direct|Extended|Indexed },
    { {'S', 'U', 'B', 'A'}, Direct|Extended|Indexed }
}; /* etc */

static int nInstrs = sizeof(instrs)/sizeof(InstrcutionInfo);

InstructionInfo *GetInstruction(char *mnemonic) {
   /* binary search for mnemonic */
}

int InstructionSize(AddressingMode mode)
{
    switch (mode) {
    case Inplied: return 1;
    / * etc */
    }
 }



Then you will have a list of every instruction which in turn contains a list of all the addressing modes.

所以你的解析器会变成这样:

So your parser becomes something like this:

char *line = ReadLine();
int nextStart = 0;
int labelLen;
char *label = GetLabel(line, &labelLen, nextStart, &nextStart); // may be empty
int mnemonicLen;
char *mnemonic = GetMnemonic(line, &mnemonicLen, nextStart, &nextStart); // may be empty
if (IsOpcode(mnemonic, mnemonicLen)) {
    AddressingModeInfo info = GetAddressingModeInfo(line, nextStart, &nextStart);
    if (IsValidInstruction(mnemonic, info)) {
        GenerateCode(mnemonic, info);
    }
    else throw new BadInstructionException(mnemonic, info);
}
else if (IsDirective()) { /* etc. */ }

这篇关于制作汇编的设计样式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆