用Boost Spirit解析语法 [英] Parsing a grammar with Boost Spirit

查看:122
本文介绍了用Boost Spirit解析语法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图解析一个C函数,像下面这样的树表达式(使用 Spirit Parser框架):

  F(A(),B(GREAT(SOME,NOT)),C(YES))

为此,我试图对以下语法使用三个规则:

 模板< typename Iterator,typename ExpressionAST> 
struct InputGrammar:qi :: grammar< Iterator,ExpressionAST(),space_type> {

InputGrammar():InputGrammar :: base_type(){
tag =(qi :: char _(a-zA-Z_)> * qi :: char_ a-zA-Z_0-9))[push_back(at_c 0(qi :: _ val),qi ::_1)];
command = tag [at_c< 0>(qi :: _ val)= at_c< 0>(qi :: _1)]> (>>(* instruction>>,)
[push_back(at_c'(qi :: _ val),qi :: _ 1)]&
instruction =(command | tag)[qi :: _ val = qi :: _ 1];
}
qi :: rule<迭代器,ExpressionAST(),space_type>标签;
qi :: rule< Iterator,ExpressionAST(),space_type>命令;
qi :: rule< Iterator,ExpressionAST(),space_type>指令;
};注意,我的标签规则只是试图捕获表达式中使用的标识符('function'名称) )。还要注意,标签规则的签名返回 ExpressionAST 而不是 std :: string ,就像大多数示例。我想这样做的原因其实很简单:我讨厌使用变体,我会避免他们,如果可能的话。



命令应以标签开头(当前节点的名称,第一个字符串字段AST节点)和由括号括起来的可变数量的参数,每个参数可以是标签本身或另一个命令。



但是,此示例不适用于所有。它编译和一切,但在运行时,它无法解析所有我的测试字符串。而真正让我感到烦恼的是,我无法想象如何解决它,因为我不能真正调试上述代码,至少在这个词的传统意义。基本上,我看到我可以修复上述代码的唯一方法是通过知道我做错了。



所以,问题是,我不知道什么是错的用上面的代码。



我使用的 ExpressionAST 类型是:

  struct MockExpressionNode {
std :: string name;
std :: vector< MockExpressionNode>操作数

typedef std :: vector< MockExpressionNode> :: iterator iterator;
typedef std :: vector< MockExpressionNode> :: const_iterator const_iterator;

iterator begin(){return operands.begin(); }
const_iterator begin()const {return operands.begin(); }
iterator end(){return operands.end(); }
const_iterator end()const {return operands.end(); }

bool is_leaf()const {
return(operands.begin()== operands.end());
}
};

BOOST_FUSION_ADAPT_STRUCT(
MockExpressionNode,
(std :: string,name)
(std :: vector< MockExpressionNode>操作数)


解决方案

就调试而言,观看法。这是很困难的,你如何格式化规则。如果你按照精神示例格式化(每行一个解析器,每行一个phoenix语句),断点将更加丰富。



您的数据结构不有一种方法来区分 A() SOME ,因为他们都是叶子(让我知道,如果我缺少某物)。从你的变体注释,我不认为这是你的意图,所以为了区分这两种情况,我添加了一个 bool commandFlag 成员变量MockExpressionNode(true < $ c> A()并且为 SOME 时为false。



具体来说,您需要将启动规则传递给基础构造函数,即:

  InputGrammar :InputGrammar :: base_type(instruction){...} 

这是语法的入口点,这就是为什么你没有得到任何数据解析。我惊讶它编译没有它,我认为语法类型需要匹配第一条规则的类型。即使如此,这是一个方便的惯例。



对于标签规则,实际上有两个解析器 qi :: char_类型为 char * qi :: char _(a-zA-Z _) zA-Z_0-9),其为具有类型(基本上)向量< char> 的_2。它不可能强制这些成一个字符串没有autorules,但它可以通过附加一个规则到每个解析的字符:

  tag = qi :: char _(a-zA-Z_)
[at_c 0(qi :: _ val)= qi ::_1];
>> * qi :: char _(a-zA-Z_0-9)// []优先于*,所以_1是
[at_c< 0>(qi :: _ val)+ = qi :: _1] ; //一个char而不是一个向量< char>

然而,它让更多的精灵让这个转换。因此定义一个新规则:

  qi :: rule< Iterator,std :: string(void),ascii :: space_type>标识符; 
identifier%= qi :: char _(a-zA-Z_)> * qi :: char _(a-zA-Z_0-9);

不要担心它。然后标记变成

  tag = identifier 
[
at_c< 0>(qi :: _ val) qi :: _ 1,
ph :: at_c 2(qi :: _val)= false // commandFlag
]

对于命令,第一部分很好,但是对于(* instruction>>,)[push_back(at_c< 1> (qi :: _ val),qi :: _ 1)] 。这将解析零个或多个指令规则后跟一个,。它也尝试push_back一个向量< MockExpressionNode> (不知道为什么编译,也许没有实例化,因为缺少启动规则?我想你想要下面的(标识符修改):

  command = 
标识符
[
ph :: at_c <0>(qi :: _ val)= qi :: _ 1,
ph :: at_c 2(qi :: _ val)= true // commandFlag
]
>> (
>> - (instruction%,)
[
ph :: at_c 1(qi :: _ val)= qi :: _ 1
]
>>);

这使用可选的运算符 - list运算符,后者等效于 instruction>> *(,>>指令)。然后,phoenix表达式只是将向量直接赋给结构成员,但你也可以直接将动作附加到指令匹配并使用push_back。



指令规则很好,我只是提到它等效于 instruction%=(command | tag)



事实,如果实际上没有区别 A() SOME (即你的原始结构没有 commandFlag ),您只能使用自动参数编写此解析器:

  template< typename Iterator,typename ExpressionAST> 
struct InputGrammar:qi :: grammar< Iterator,ExpressionAST(),ascii :: space_type> {
InputGrammar():InputGrammar :: base_type(command){
identifier%=
qi :: char _(a-zA-Z_)
> * qi :: char _(a-zA-Z_0-9);
command%=
identifier
>> - (

>> - (command%,)
>>));
}
qi :: rule< Iterator,std :: string(void),ascii :: space_type>标识符;
qi :: rule< Iterator,ExpressionAST(void),ascii :: space_type>命令;
};

这是使用融合包裹结构来密切建模输入的最大好处。


I am trying to parse a C-function like tree expressions like the following (using the Spirit Parser Framework):

F( A() , B( GREAT( SOME , NOT ) ) , C( YES ) )

For this I am trying to use the three rules on the following grammar:

template< typename Iterator , typename ExpressionAST >
struct InputGrammar : qi::grammar<Iterator, ExpressionAST(), space_type> {

    InputGrammar() : InputGrammar::base_type( ) {
       tag = ( qi::char_("a-zA-Z_")  >> *qi::char_("a-zA-Z_0-9") )[ push_back( at_c<0>(qi::_val) , qi::_1 ) ];
       command =  tag [ at_c<0>(qi::_val) = at_c<0>(qi::_1) ] >> "(" >> (*instruction >> ",")
                                        [ push_back( at_c<1>(qi::_val) , qi::_1 ) ]  >> ")";
       instruction = ( command | tag ) [qi::_val = qi::_1];
    }
    qi::rule< Iterator , ExpressionAST() , space_type > tag;
    qi::rule< Iterator , ExpressionAST() , space_type > command;
    qi::rule< Iterator , ExpressionAST() , space_type > instruction;
};

Notice that my tag rule just tries to capture the identifiers used in the expressions (the 'function' names). Also notice that the signature of the tag rule returns a ExpressionAST instead of a std::string, like in most examples. The reason I want to do it like this is actually pretty simple: I hate using variants and I will avoid them if possible. It would be great to keep the cake and eat it too I guess.

A command should start with a tag (the name of the current node, first string field of the AST node) and a variable number of arguments enclosed by parentheses, and each of the arguments can be a tag itself or another command.

However, this example does not work at all. It compiles and everything, but at run time it fails to parse all my test strings. And the thing that really annoys me is that I can't figure how to fix it, since I can't really debug the above code, at least in the traditional meaning of the word. Basically the only way I see I can fix the above code is by knowing what I am doing wrong.

So, the question is that I don't know what is wrong with the above code. How would you define the above grammar?

The ExpressionAST type I am using is:

struct MockExpressionNode {
    std::string name;
    std::vector< MockExpressionNode > operands;

    typedef std::vector< MockExpressionNode >::iterator iterator;
    typedef std::vector< MockExpressionNode >::const_iterator const_iterator;

    iterator begin() { return operands.begin(); }
    const_iterator begin() const { return operands.begin(); }
    iterator end() { return operands.end(); }
    const_iterator end() const { return operands.end(); }

    bool is_leaf() const {
        return ( operands.begin() == operands.end() );
    }
};

BOOST_FUSION_ADAPT_STRUCT(
    MockExpressionNode,
    (std::string, name)
    (std::vector<MockExpressionNode>, operands)
)

解决方案

As far as debugging, its possible to use a normal break and watch approach. This is made difficult by how you've formatted the rules though. If you format per the spirit examples (~one parser per line, one phoenix statement per line), break points will be much more informative.

Your data structure doesn't have a way to distinguish A() from SOME in that they are both leaves (let me know if I'm missing something). From your variant comment, I don't think this was your intention, so to distinguish these two cases, I added a bool commandFlag member variable to MockExpressionNode (true for A() and false for SOME), with a corresponding fusion adapter line.

For the code specifically, you need to pass the start rule to the base constructor, i.e.:

InputGrammar() : InputGrammar::base_type(instruction) {...}

This is the entry point in the grammar, and is why you were not getting any data parsed. I'm surprised it compiled without it, I thought that the grammar type was required to match the type of the first rule. Even so, this is a convenient convention to follow.

For the tag rule, there are actually two parsers qi::char_("a-zA-Z_"), which is _1 with type char and *qi::char_("a-zA-Z_0-9") which is _2 with type (basically) vector<char>. Its not possible to coerce these into a string without autorules, But it can be done by attaching a rule to each parsed char:

tag =   qi::char_("a-zA-Z_")
        [ at_c<0>(qi::_val) = qi::_1 ];
    >> *qi::char_("a-zA-Z_0-9")           //[] has precedence over *, so _1 is 
        [ at_c<0>(qi::_val) += qi::_1 ];  //  a char rather than a vector<char>

However, its much cleaner to let spirit do this conversion. So define a new rule:

qi::rule< Iterator , std::string(void) , ascii::space_type > identifier;
identifier %= qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9");

And don't worry about it ;). Then tag becomes

tag = identifier
      [
          at_c<0>(qi::_val) = qi::_1,
          ph::at_c<2>(qi::_val) = false //commandFlag
      ]

For command, the first part is fine, but theres a couple problems with (*instruction >> ",")[ push_back( at_c<1>(qi::_val) , qi::_1 ) ]. This will parse zero or multiple instruction rules followed by a ",". It also attempts to push_back a vector<MockExpressionNode> (not sure why this compiled either, maybe not instantiated because of the missing start rule?). I think you want the following (with the identifier modification):

command =
        identifier
        [
           ph::at_c<0>(qi::_val) = qi::_1, 
           ph::at_c<2>(qi::_val) = true    //commandFlag
        ]
    >>  "("
    >> -(instruction % ",")
        [
           ph::at_c<1>(qi::_val) = qi::_1
        ]
    >>  ")";

This uses the optional operator - and the list operator %, the latter is equivalent to instruction >> *("," >> instruction). The phoenix expression then just assigns the vector directly to the structure member, but you could also attach the action directly to the instruction match and use push_back.

The instruction rule is fine, I'll just mention that it is equivalent to instruction %= (command|tag).

One last thing, if there actually is no distinction between A() and SOME (i.e. your original structure with no commandFlag), you can write this parser using only autorules:

template< typename Iterator , typename ExpressionAST >
struct InputGrammar : qi::grammar<Iterator, ExpressionAST(), ascii::space_type> {
   InputGrammar() : InputGrammar::base_type( command ) {
      identifier %=
             qi::char_("a-zA-Z_")
         >> *qi::char_("a-zA-Z_0-9");
      command %=
            identifier
         >> -(
            "("
         >> -(command % ",")
         >>  ")");
    }
    qi::rule< Iterator , std::string(void) , ascii::space_type > identifier;
    qi::rule< Iterator , ExpressionAST(void) , ascii::space_type > command;
};

This is the big benefit of using a fusion wrapped structure that models the input closely.

这篇关于用Boost Spirit解析语法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆