How to avoid defining a token which matches everything in boost::spirit::lex

Problem description

I want to create a grammar and lexer to parse the following string:

100 reason phrase

The regular expression would be: "\d{3} [^\r\n]*"

Token definitions:

template <typename Lexer>
struct custom_tokens : lex::lexer<Lexer>
{
    custom_tokens()
    {
        this->self.add_pattern
            ("STATUSCODE", "\\d{3}")                
            ("SP", " ")
            ("REASONPHRASE", "[^\\r\\n]*")
            ;                

        this->self.add                          
            ("{STATUSCODE}", T_STATUSCODE)
            ("{SP}", T_SP)
            ("{REASONPHRASE}", T_REASONPHRASE)
            ;
    }   
};

Grammar:

template <typename Iterator>
struct custom_grammar : qi::grammar<Iterator>
{
    template <typename TokenDef>
    custom_grammar(TokenDef const& tok)
        : custom_grammar::base_type(start)            
    {            
        start = (qi::token(T_STATUSCODE) >> qi::token(T_SP) >> qi::token(T_REASONPHRASE));
    }

    qi::rule<Iterator> start;
};

However, I realized that I can't define the token T_REASONPHRASE, because it will match everything, including what T_STATUSCODE should match. What I could do is:

  1. Leave T_REASONPHRASE undefined and write the rule inside custom_grammar with qi::lexeme instead?

  2. Can I use lexer states to do this? E.g. define T_REASONPHRASE in a second state: if the lexer sees T_STATUSCODE in the first state, it lexes the rest of the line in the second state. Could you give an example?

Answer

I don't think there really is a problem, because tokens are 'greedily' matched in the order they've been added to the token definitions (for a specific lexer state).

So, given

    this->self.add                          
        ("{STATUSCODE}", T_STATUSCODE)
        ("{SP}", T_SP)
        ("{REASONPHRASE}", T_REASONPHRASE)
        ;

T_STATUSCODE will always match before T_REASONPHRASE (if there is an ambiguity at all).

About using separate Lexer states, here's an excerpt of a tokenizer I once had in a toy project:

this->self = fileheader     [ lex::_state = "GT" ];

this->self("GT") =
    gametype_label |
    gametype_63000 | gametype_63001 | gametype_63002 |
    gametype_63003 | gametype_63004 | gametype_63005 |
    gametype_63006 |
    gametype_eol            [ lex::_state = "ML" ];

this->self("ML") = mvnumber [ lex::_state = "MV" ];

this->self("MV") = piece | field | op | check | CASTLEK | CASTLEQ 
         | promotion
         | Checkmate | Stalemate | EnPassant
         | eol              [ lex::_state = "ML" ]
         | space            [ lex::_pass = lex::pass_flags::pass_ignore ];

(The purpose should be relatively clear if you read GT as gametype, ML as move line, and MV as move. Note the presence of both eol and gametype_eol here: Lex disallows adding the same token to different states.)
