如何手工写（壳）词法分析器 [英] How to write a (shell) lexer by hand

查看：179 发布时间：2016/8/21 20:03:07 c parsing shell lexer

本文介绍了如何手工写（壳）词法分析器的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我工作的一个壳，小bash类似的外壳，无需脚本（如果在...）
我必须作出手工词法分析器/分析器（LL）。

I'm working on a shell, a small bash-like shell, without scripting (if while ...) I have to make the lexer/parser (LL) by hand.

让词法分析器将改变命令（的char * CMD ），以一个链表（ t_list *名单）。
而LL解析器将改变链表（ t_list *名单）至AST（二叉树的 t_btree *根）以<一个href=\"http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_10\">grammar

So the lexer will transform the command (char *cmd) to a linked list (t_list *list). And the LL parser will transform the linked list (t_list *list) to an AST (binary tree t_btree *root) with a grammar

所以，我知道如何使LL解析器，但我不知道如何来标记我的命令。

So, I know how to make the LL parser but I don't know how to tokenize my command.

例如： PS | grep的LS＆GT;＆GT;文件;让＆功放;＆安培; ./a.out

=> 'PS''|' 'grep'可以'ls'的'＆GT;＆GT;' '文件'';' ''做'和;＆放大器;' ./a.out

感谢。

（我不希望使用任何发电机）

(I don't wanna use any generator)

推荐答案

（这解释了暗示的想法 Spudd86 ）。

(This explains the idea hinted by Spudd86).

您需要实现一个有限状态机。主要有以下几种状态：

You need to implement a finite state machine. There are the following states:

常规状态

中的文件名

在＆放大器;＆安培; 标记

在 || 标记

General state
Inside a file name
Inside the && token
Inside the || token

对于每个国家和一个输入字符，你必须决定什么是下一个状态，以及是否要输出的令牌。例如：

For each state and next input character, you have to decide what is the next state, and whether to output a token. For example:

当前状态：常规;人物： X =>下一个状态：里面的文件名

当前状态：里面的文件名;人物：空格 =>下一个状态：常规;输出令牌

当前状态：里面的文件名;人物：＆安培; =>下一个状态：内 - ＆放大器;＆安培; ;输出令牌

当前状态：内 - ＆放大器;＆安培; ;人物：＆安培; =>下一个状态：常规;输出令牌

当前状态：内 - ＆放大器;＆安培; ;人物： X =>下一个状态：常规;语法错误

...（广告nauseum）

Current state: General; character: x => next state: inside-file-name
Current state: inside-file-name; character: space => next state: General; output the token
Current state: inside-file-name; character: & => next state: inside-&&; output the token
Current state: inside-&&; character: & => next state: General; output the token
Current state: inside-&&; character: x => next state: General; syntax error
... (ad nauseum)

这是很多枯燥的工作制定出所有的规则（有趣的开始，当你必须调试产生的code），所以大多数人使用code发电机做到这一点。

It's much boring work to work out all the rules (the fun starts when you must debug the resulting code), so most people use code generators to do that.

编辑：一些code（抱歉，如果语法是混乱的;在C ++中，我通常程序）

some code (sorry if the syntax is messed-up; i usually program in C++)

enum state {
    STATE_GENERAL,
    STATE_IN_FILENAME,
    ...
};

// Many characters are treated the same (e.g. 'x' and 'y') - so use categories
enum character_category
{
    CHAR_GENERAL, // can appear in filenames
    CHAR_WHITESPACE = ' ',
    CHAR_AMPERSAND = '&',
    CHAR_PIPE = '|',
    CHAR_EOF = EOF,
    ...
};

character_category translate(int c)
{
    switch (c) {
    case '&': return CHAR_AMPERSAND;
    case ' ': case '\t': case '\n': return CHAR_WHITESPACE;
    ...
    default: return CHAR_GENERAL;
    }
}

void do_stuff()
{
    character_category cat;
    state current_state = STATE_GENERAL;
    state next_state;
    char token[100];
    char token_length = 0;
    do {
        int c = getchar();
        cat = translate(c);
        // The following implements a switch on 2 variables
        int selector = 1000 * current_state + cat;

        switch (selector)
        {
        case 1000 * STATE_GENERAL + CHAR_GENERAL:
            next_state = STATE_IN_FILENAME;
            token[token_length++] = c; // append a character to a filename token
            break;

        case 1000 * STATE_GENERAL + CHAR_WHITESPACE:
            next_state = STATE_GENERAL; // do nothing
            break;

        case 1000 * STATE_GENERAL + CHAR_PIPE:
            next_state = STATE_IN_OR_TOKEN; // the first char in '||' or just '|'
            break;

        // Much repetitive code already; define a macro for the case constants?
        // Have to cover all states and all character categories; good luck...

        case 1000 * STATE_IN_FILENAME + EOF:
        case 1000 * STATE_IN_FILENAME + CHAR_WHITESPACE:
            next_state = STATE_GENERAL;
            printf("Filename token: %s\n", token);
            break;

        default:
            printf("Bug\n"); // forgot one of the cases?
        }

        current_state = next_state;

    } while (cat != CHAR_EOF);
}

这篇关于如何手工写（壳）词法分析器的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

如何手工写（壳）词法分析器 [英] How to write a (shell) lexer by hand

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录关闭

如何手工写（壳）词法分析器 [英] How to write a (shell) lexer by hand

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录 关闭

登录关闭