Design Pattern For Making An Assembler


Question

I'm making an 8051 assembler.

Before everything is a tokenizer, which reads the next token, sets error flags, recognizes EOF, etc.
Then there is the main loop of the compiler, which reads the next token and checks for a valid mnemonic:

mnemonic= NextToken();
if (mnemonic.Error)
{
    //throw some error
}
else if (mnemonic.Text == "ADD")
{
    ...
}
else if (mnemonic.Text == "ADDC")
{
    ...
}

And it continues like this for several more cases. Worse than that is the code inside each case, which checks for valid parameters and then converts them to compiled code. Right now it looks like this:

if (mnemonic.Text == "MOV")
{
    arg1 = NextToken();
    if (arg1.Error) { /* throw error */ break; }
    arg2 = NextToken();
    if (arg2.Error) { /* throw error */ break; }

    if (arg1.Text == "A")
    {
        if (arg2.Text == "B")
            output << 0x1234; //Example compiled code
        else if (arg2.Text == "@B")
            output << 0x5678; //Example compiled code
        else
            /* throw "Invalid parameters" */
    }
    else if (arg1.Text == "B")
    {
        if (arg2.Text == "A")
            output << 0x9ABC; //Example compiled code
        else if (arg2.Text == "@A")
            output << 0x0DEF; //Example compiled code
        else
            /* throw "Invalid parameters" */
    }
}

For each mnemonic I have to check for valid parameters and then emit the correct compiled code. Very similar code for checking the valid parameters is repeated in every case.

So is there a design pattern for improving this code?
Or is there simply a simpler way to implement this?

Edit: I accepted plinth's answer, thanks to him. If you still have ideas on this, I will be happy to learn about them. Thanks all.

Recommended answer

I've written a number of assemblers over the years using hand parsing, and frankly, you're probably better off using a grammar language and a parser generator.

Here's why: a typical line of assembly will probably look something like this:

[label:] [instruction|directive][newline]

and an instruction will be:

plain-mnemonic|mnemonic-withargs

and a directive will be:

plain-directive|directive-withargs

etc.

With a decent parser generator like Gold, you should be able to knock out a grammar for the 8051 in a few hours. The advantage of this over hand parsing is that you will be able to have reasonably complicated expressions in your assembly code, like:

.define kMagicNumber 0xdeadbeef
CMPA #(2 * kMagicNumber + 1)

which can be a real bear to do by hand.
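To give a feel for what "by hand" involves, here is a minimal sketch of a recursive-descent evaluator for operand expressions such as the one above. It is not from the original answer: the class name `ExprEvaluator`, the symbol map, and the supported operators (`+ - * /` and parentheses) are all assumptions made for illustration, and the caller is assumed to have already stripped any prefix such as `#`.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical hand-written evaluator (illustration only).
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := number | symbol | '(' expr ')'
class ExprEvaluator {
public:
    ExprEvaluator(const std::string& text,
                  const std::map<std::string, std::int64_t>& symbols)
        : text_(text), symbols_(symbols), pos_(0) {}

    std::int64_t Evaluate() {
        std::int64_t value = ParseExpr();
        SkipSpaces();
        if (pos_ != text_.size()) throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    void SkipSpaces() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    }

    // Consumes c if it is the next non-space character.
    bool Accept(char c) {
        SkipSpaces();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    std::int64_t ParseExpr() {
        std::int64_t value = ParseTerm();
        for (;;) {
            if (Accept('+')) value += ParseTerm();
            else if (Accept('-')) value -= ParseTerm();
            else return value;
        }
    }

    std::int64_t ParseTerm() {
        std::int64_t value = ParseFactor();
        for (;;) {
            if (Accept('*')) {
                value *= ParseFactor();
            } else if (Accept('/')) {
                std::int64_t divisor = ParseFactor();
                if (divisor == 0) throw std::runtime_error("division by zero");
                value /= divisor;
            } else {
                return value;
            }
        }
    }

    std::int64_t ParseFactor() {
        if (Accept('(')) {
            std::int64_t value = ParseExpr();
            if (!Accept(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        // Accept() has already skipped leading spaces; read one word.
        std::size_t start = pos_;
        while (pos_ < text_.size() &&
               (std::isalnum(static_cast<unsigned char>(text_[pos_])) || text_[pos_] == '_')) ++pos_;
        if (start == pos_) throw std::runtime_error("expected a number or symbol");
        std::string word = text_.substr(start, pos_ - start);
        if (std::isdigit(static_cast<unsigned char>(word[0]))) {
            std::size_t used = 0;
            std::int64_t value = std::stoll(word, &used, 0);  // base 0 accepts 0x... literals
            if (used != word.size()) throw std::runtime_error("bad number: " + word);
            return value;
        }
        std::map<std::string, std::int64_t>::const_iterator it = symbols_.find(word);
        if (it == symbols_.end()) throw std::runtime_error("undefined symbol: " + word);
        return it->second;
    }

    std::string text_;
    std::map<std::string, std::int64_t> symbols_;
    std::size_t pos_;
};
```

Even this small version has no unary minus, shifts, bitwise operators, or forward references to labels. A grammar file lets you add each of those as one more rule.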

If you want to do it by hand, make a table of all your mnemonics that also records the addressing modes each one supports and, for each addressing mode, the number of bytes that variant takes and its opcode. Something like this:

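#include <stddef.h>
#include <string.h>

typedef enum {
    Implied = 1, Direct = 2, Extended = 4, Indexed = 8 /* etc */
} AddressingMode;

/* For a 4 char mnemonic, this struct will be 5 bytes.  A typical small processor
 * has on the order of 100 instructions, making this table come in at ~500 bytes when all
 * is said and done.
 * The time to binary search that will be, worst case, about 8 compares on the mnemonic.
 * I claim that I/O will take way more time than the lookup.
 * You will also need a table and/or a routine that, given a mnemonic and addressing mode,
 * will give you the actual opcode.
 */

typedef struct {
    char Mnemonic[4];              /* padded with '\0'; no terminator when all 4 chars are used */
    unsigned char AddressingModes; /* bitmask of the AddressingMode values this mnemonic allows */
} InstructionInfo;

/* order them by mnemonic */
static const InstructionInfo instrs[] = {
    { {'A', 'D', 'D', '\0'}, Direct|Extended|Indexed },
    { {'A', 'D', 'D', 'A'}, Direct|Extended|Indexed },
    { {'S', 'U', 'B', '\0'}, Direct|Extended|Indexed },
    { {'S', 'U', 'B', 'A'}, Direct|Extended|Indexed }
}; /* etc */

static const int nInstrs = sizeof(instrs) / sizeof(instrs[0]);

const InstructionInfo *GetInstruction(const char *mnemonic) {
    /* binary search for mnemonic */
    char key[sizeof instrs[0].Mnemonic];
    int lo = 0, hi = nInstrs - 1;

    if (strlen(mnemonic) > sizeof key) return NULL; /* too long to be in the table */
    strncpy(key, mnemonic, sizeof key);             /* pads short mnemonics with '\0' */
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = memcmp(key, instrs[mid].Mnemonic, sizeof key);
        if (cmp == 0) return &instrs[mid];
        if (cmp < 0) hi = mid - 1;
        else lo = mid + 1;
    }
    return NULL; /* unknown mnemonic */
}

int InstructionSize(AddressingMode mode)
{
    switch (mode) {
    case Implied: return 1;
    /* etc */
    default: return -1; /* mode not handled yet */
    }
}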

Then you will have a list of every instruction, each of which in turn records all the addressing modes it allows.
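As a rough illustration of how such a table removes the nested `if` chains from the question, the sketch below stores the question's `MOV` cases as data and looks them up in one loop. The names `Encoding`, `kEncodings`, and `LookupOpcode` are made up for this example, and the opcode values are the placeholder "example compiled code" numbers from the question, not real 8051 encodings.

```cpp
#include <cstdint>
#include <string>

// One row per legal mnemonic/operand combination (hypothetical layout).
struct Encoding {
    const char* mnemonic;
    const char* arg1;
    const char* arg2;
    std::uint16_t opcode;  // placeholder values taken from the question
};

static const Encoding kEncodings[] = {
    {"MOV", "A", "B",  0x1234},
    {"MOV", "A", "@B", 0x5678},
    {"MOV", "B", "A",  0x9ABC},
    {"MOV", "B", "@A", 0x0DEF},
};

// Returns true and stores the opcode when the combination is in the table.
// Returns false for the "Invalid parameters" case and leaves opcode untouched.
bool LookupOpcode(const std::string& mnemonic, const std::string& arg1,
                  const std::string& arg2, std::uint16_t& opcode) {
    for (const Encoding& e : kEncodings) {
        if (mnemonic == e.mnemonic && arg1 == e.arg1 && arg2 == e.arg2) {
            opcode = e.opcode;
            return true;
        }
    }
    return false;
}
```

Supporting a new instruction then means adding rows rather than another `else if` branch. The linear scan is only for brevity; a table sorted by mnemonic could be binary searched as described above.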

So your parser becomes something like this:

char *line = ReadLine();
int nextStart = 0;
int labelLen;
char *label = GetLabel(line, &labelLen, nextStart, &nextStart); // may be empty
int mnemonicLen;
char *mnemonic = GetMnemonic(line, &mnemonicLen, nextStart, &nextStart); // may be empty
if (IsOpcode(mnemonic, mnemonicLen)) {
    AddressingModeInfo info = GetAddressingModeInfo(line, nextStart, &nextStart);
    if (IsValidInstruction(mnemonic, info)) {
        GenerateCode(mnemonic, info);
    }
    else throw new BadInstructionException(mnemonic, info);
}
else if (IsDirective()) { /* etc. */ }
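The helpers in that outline (`GetLabel`, `GetMnemonic`, and so on) are left undefined. One possible way to do the first step, splitting a line into its label, mnemonic, and operand text, is sketched below. `ParsedLine` and `SplitLine` are hypothetical names, and the sketch assumes that `;` starts a comment and that a label is whatever precedes the first `:` on the line, which may not match your syntax.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Result of splitting one source line; any field may be empty.
struct ParsedLine {
    std::string label;
    std::string mnemonic;
    std::string operands;  // raw operand text, not yet classified by addressing mode
};

// Removes leading and trailing whitespace.
static std::string Trim(const std::string& s) {
    std::size_t begin = 0, end = s.size();
    while (begin < end && std::isspace(static_cast<unsigned char>(s[begin]))) ++begin;
    while (end > begin && std::isspace(static_cast<unsigned char>(s[end - 1]))) --end;
    return s.substr(begin, end - begin);
}

// Splits "[label:] [mnemonic [operands]]" (assumed line shape).
ParsedLine SplitLine(const std::string& line) {
    ParsedLine out;
    // Drop a trailing comment; find() returns npos when there is none,
    // in which case substr keeps the whole line.
    std::string text = line.substr(0, line.find(';'));

    std::size_t pos = 0;
    std::size_t colon = text.find(':');
    if (colon != std::string::npos) {
        out.label = Trim(text.substr(0, colon));
        pos = colon + 1;
    }

    // Skip whitespace, then take the next run of non-space characters as the mnemonic.
    while (pos < text.size() && std::isspace(static_cast<unsigned char>(text[pos]))) ++pos;
    std::size_t start = pos;
    while (pos < text.size() && !std::isspace(static_cast<unsigned char>(text[pos]))) ++pos;
    out.mnemonic = text.substr(start, pos - start);

    // Everything left over is operand text.
    out.operands = Trim(text.substr(pos));
    return out;
}
```

For example, `SplitLine("loop: MOV A, B ; copy")` yields the label `loop`, the mnemonic `MOV`, and the operand text `A, B`. The operand text would then go to whatever routine works out the addressing mode.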
