语法树空节点问题与Spirit Qi MiniC示例 [英] Syntax tree empty nodes issue with Spirit Qi MiniC example

查看:170
本文介绍了语法树空节点问题与Spirit Qi MiniC示例的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

亲爱的Spirit Qi专家。



我在Spirit Qi中使用了MiniC示例,并注意到表达式语法中的空AST节点的问题。
它将生成表达式节点,它们只保存一个类型为表达式的单个操作数。



我认为问题是由于递归规则运算符优先级的定义:

  // expression.hpp 
qi :: rule< Iterator,ast :: expression (),skipper< Iterator> >
expr,equality_expr,relational_expr,
logical_or_expr,logical_and_expr,
additive_expr,multiplicative_expr
;

// expression_def.hpp
expr =
logical_or_expr.alias()
;

logical_or_expr =
logical_and_expr
>> *(logical_or_op> logical_and_expr)
;

logical_and_expr =
equality_expr
>> *(logical_and_op> equality_expr)
;

// ast.hpp

typedef boost :: variant<
nil
,bool
,unsigned int
,identifier
,boost :: recursive_wrapper< unary>
,boost :: recursive_wrapper< function_call>
,boost :: recursive_wrapper< expression>
>
operand;
/*...*/
结构表达式
{
操作数第一;
std :: list< operation>休息;
};

当logical_or_expr递归为logical_and_expr时,logical_and_expr将返回一个表达式
由于属性传播是Spirit,logical_or_expr然后将这个表达式赋给它的表达式的第一成员。



我认为这是生成所有这些 表达式节点。
我发现,然而,讨厌,并希望摆脱他们,但还没有成功。有没有人找到一个像样的解决方案之前?



我想,如果一个表达式包含二进制操作和一元操作将是不可能的。
然后,这也将具有将操作和操作数保持在同一结构(伪代码)的优点:

  struct binary_op 
{
optype type;
operand op1;
operand op2;
}

struct unary_op
{
optype type;
operand op1;
}

struct eval_op
{
操作数op1;
}

typedef boost :: variant< binary_op,
unary_op,
eval_op>
expression;

typedef boost :: variant< int,
bool,
boost :: recursive_wrapper< expression> >
operand;

但是,我相信我会在纸上玩这个空节点问题。



有没有人知道如何解决这个问题?



EDIT

  b 

将解析为:

 表达式(来自logical_or_op)//空表达式(仅向前)
(休息) - / \-(第一)表达式(来自logical_and_op)//空表达式
(rest) - / \-(first)表达式(from relation_op)//空表达式正确的表达式
(第一)a - / \-(休息)
[0] operator_:op_greater
operand_:b

所需的输出是:

  
(first)a - / \-(rest)
[0] operator_:op_greater
operand_:b

EDIT2



我已上传修改的mini-c版本,打印生成的表达式树表达式:



链接



如果您查看包含的A.avxh文件,它包含一个要解析的表达式。
在main.cpp中的第67行设置一个断点,并观察它在多长时间内递归进入。

解决方案

这是因为所有规则暴露了一个原始 ast :: expression ,这是一个固定类型。



在这个示例中显然是一个简单的选择:好处是




  • 通过使所有表达式节点具有相同的类型,树访问代码

  • 所有规则都具有相同的签名,并遵循相同的模式。

$ b 在此特定示例(mini_c)中,code-gen阶段通过继承相同的简单性受益。
$ b

通过使表达式更接近语义 code>一个变体:每个表达式可以直接包含实际的具体子表达式类型,而不是wading通过中间级别的节点类型,只是模仿语法结构



我想我有几个这样的asts和对应的例子语法在这个网站,将看看我是否可以链接一个。



其实,我认为 typedef variant< ...&在同一个例子中,语句 ast.hpp )不是这种方法的一个坏例子。



相关链接:




  • Boost :: Spirit表达式解析器

  • Boost :: spirit如何解析和调用c ++函数式表达式
    这是一个问题,我通过提供一个功能齐全的表达式解析器(具有内置函数求值)来回答,有两种方式:




    • 具有完整AST的响铃和口哨方法(类似于我们在这里使用mini_c示例表达式)

    • 一种务实的方法,只需使用语义

    • ul>

      现在,如果你不想改变语法(为了不失去简单性),你可以做一个转换AST(对表达式的简化传递)。



      我要看看我能在下一个小时拿到什么。



      我刚刚重构了语法,因此它不会导致这么深的嵌套。


      1. 首先,让我们将测试简化为一个独立的测试床,只需解析表达式并显示一个简单的表达式$ c>42)解析深层嵌套的AST: http: //coliru.stacked-crooked.com/a/5467ca41b0ac1d03

         < expr> 
        ...
        < success>< / success>
        < attributes> [[[[[[[[42,[]],[]],[]],[]]
        < / expr>


      2. 接下来,删除根问题:语法返回不变类型$ c> ast :: expression ),在很多情况下太重了。相反,我们想返回 ast :: operand (这是变体,可以包含相同的 ast :: expression 节点): http://coliru.stacked-crooked。 com / a / 00e43b1f61db018c


      3. 最后,我们希望所有规则都变成 > > 表示如果有尾随操作,或只是一个孤立的操作数在另一种情况下,在伪代码中:

          logical_or_expr = 
        (logical_and_expr> + > logical_and_expr)
        | logical_and_expr
        ;

        注意的细微使用,如果 +(...)而不是 *(...)要求至少一个尾随的 logical_or操作



        现在,Spirit必须告诉如何将第一个分支分配给 operand 属性。 qi :: attr_cast< ast :: expression>(...)应该是这里的修复,但可惜我碰到未定义的行为(这花了最多的时间)。我决定了一个更详细的解决方案:

          _logical_or_expr = logical_and_expr>> +(logical_or_op> logical_and_expr); 
        logical_or_expr = _logical_or_expr | logical_and_expr;

        你可能看到,它是一样的,但第一个分支提取到一个单独的规则,因此我们只需声明规则以暴露所需的属性:

          qi :: rule< It,ast :: operand ),Skipper> logical_or_expr; 
        qi :: rule< It,ast :: expression(),Skipper> _logical_or_expr;

        确实对子表达式的每个优先级执行此操作,结果如下:

          _logical_or_expr = logical_and_expr>> +(logical_or_op> logical_and_expr); 
        _multiplicative_expr = unary_expr>> *(multiplicative_op> unary_expr);
        _additive_expr = multiplicative_expr>> +(additive_op> multiplicative_expr);
        _relational_expr = additive_expr>> +(relational_op> additive_expr);
        _equality_expr = relational_expr>> +(equality_op> relational_expr);
        _logical_and_expr = equality_expr>> +(logical_and_op> equality_expr);

        logical_or_expr = _logical_or_expr | logical_and_expr;
        logical_and_expr = _logical_and_expr | equality_expr;
        equality_expr = _equality_expr |关系表达式
        relational_expr = _relational_expr | additive_expr;
        additive_expr = _additive_expr | multiplicative_expr;
        multiplicative_expr = _multiplicative_expr | attr_cast< ast :: operand,ast :: operand> (unary_expr);


      最终结果是: http://coliru.stacked-crooked.com/a/8539757bb02fca34 (可惜这只是太多了对于Coliru),它打印:

       < expr> 
      ...
      < / logical_or_expr>
      < success>< / success>
      < attributes> [[42,[]]]< / attributes>
      < / expr>

      CAVEAT 请注意,此修改将 / em> 使解析器更高效!事实上,它只会导致一堆回溯(调试输出在第1步(!!))中计数为925行,而不是45行。



      现在,使用前瞻性断言和/或语义动作会有一些优化空间,但一般来说,你必须考虑AST存储的优化会花费CPU时间。


      Dear Spirit Qi experts.

      I have played around with the MiniC example in Spirit Qi, and have noticed an issue with "empty" AST nodes in the expression grammar. It will generate "expression" nodes that hold nothing but a single operand of type "expression" again.

      I think the issue is due to the recursive rule definitions for the operator precedence:

      // expression.hpp
          qi::rule<Iterator, ast::expression(), skipper<Iterator> >
              expr, equality_expr, relational_expr,
              logical_or_expr, logical_and_expr,
              additive_expr, multiplicative_expr
              ;
      
      // expression_def.hpp
          expr =
              logical_or_expr.alias()
              ;
      
          logical_or_expr =
                  logical_and_expr
              >> *(logical_or_op > logical_and_expr)
              ;
      
          logical_and_expr =
                  equality_expr
              >> *(logical_and_op > equality_expr)
              ;
      
      // ast.hpp
      
      typedef boost::variant<
              nil
            , bool
            , unsigned int
            , identifier
            , boost::recursive_wrapper<unary>
            , boost::recursive_wrapper<function_call>
            , boost::recursive_wrapper<expression>
          >
      operand;
      /*...*/
      struct expression
      {
          operand first;
          std::list<operation> rest;
      };
      

      When logical_or_expr recurses into logical_and_expr, logical_and_expr will return an expression(). Due to the attribute propagation is Spirit, logical_or_expr then assigns this expression to it's expression's "first" member.

      I think this is what generates all these "empty" expression nodes. I find that, however, nasty, and would like to get rid of them, but have not yet been successful. Has anyone found a decent solution to this before?

      I'm thinking it would be somehow possible if an expression consisted of binary ops and unary ops. This would then also have the advantage of keeping the operation and operands in the same structure (pseudo-code):

      struct binary_op
      {
          optype type;
          operand op1;
          operand op2;
      }
      
      struct unary_op
      {
          optype type;
          operand op1;
      }
      
      struct eval_op
      {
          operand op1;
      }
      
      typedef boost::variant<binary_op,
                             unary_op,
                             eval_op>
      expression;
      
      typedef boost::variant<int,
                             bool,
                             boost::recursive_wrapper<expression> >
      operand;
      

      However, I believe I would still encounter this "empty node" issue after playing this out on paper. It seems like I'm chasing my own tail.

      Does anyone have an idea on how to approach this issue?

      EDIT

      a > b
      

      Will parse to:

          expression (from logical_or_op) // Empty expression (forward only)
      (rest)  -/  \- (first) expression (from logical_and_op) // Empty expression (forward only)
                  (rest)  -/  \- (first) expression (from equality_op) // Empty expression (forward only)
                          (rest)  -/ \- (first) expression (from relational_op) // "Correct" expression
                                  (first) a   -/      \- (rest)
                                                          [0] operator_: op_greater
                                                              operand_: b
      

      The desired output would be:

                  expression (from relational_op)
      (first) a   -/      \- (rest)
                              [0] operator_: op_greater
                                  operand_: b 
      

      EDIT2

      I have uploaded a modified mini-c version that prints the generated expression tree for an expression:

      Link

      If you look at the included A.avxh file, it contains an expression to parse. Set a breakpoint in main.cpp at line 67, and observe how often it recursively steps in there.

      解决方案

      This is because all the rules expose a "raw" ast::expression and this is a fixed type.

      It has clearly been a choice for simplicity in this sample: the benefits are

      • By making all expression nodes the same type, the tree visitation code can be simple and uniform.
      • All rules have the same 'signature' and follow the same pattern. It's easy to reason about.
      • In this particular example (mini_c) the code-gen phase benefits by inheriting the same simplicity.

      The usual way to have more flexible AST that follows the semantics more closely would be by making expression a variant instead: that way each expression can directly contain the actual "concrete" subexpression type instead of "wading" through intermediate levels of node types, just to mimic the grammar structure instead of the semantics.

      I think I have several examples of such asts and corresponding grammars on this site, will see if I can link one later.

      In fact, I think the typedef variant<...> statement in the same example (ast.hpp) is not a bad example of this approach.

      Relevant links:

      For now, if you don't wish to alter the grammar (so as not to "lose" the simplicity) you could instead do a transformation on the AST (a "simplify" pass on the expressions, so to speak).

      I'm going to see what I can come up with in the next hour.

      I've just refactored the grammar so that it doesn't result in such deep nesting.

      1. First, let's reduce the test to a standalone test bed that just parses expressions and shows how a simple expression ("42") parses to a deeply nested AST: http://coliru.stacked-crooked.com/a/5467ca41b0ac1d03

        <expr>
          ...
          <success></success>
          <attributes>[[[[[[[42, []], []], []], []], []], []]]</attributes>
        </expr>
        

      2. Next, let's remove the root problem: the grammar returns an invariant type (ast::expression) which is too heavy in many cases. Instead, we'd like to return an ast::operand (which is variant, and can contain the same ast::expression node): http://coliru.stacked-crooked.com/a/00e43b1f61db018c

      3. Lastly we'd like all rules to become variant as well and returning either an expression iff there are trailing operations, or just a lone operand in the other case, in pseudo code:

        logical_or_expr = 
              (logical_and_expr >> +(logical_or_op > logical_and_expr)
            | logical_and_expr
            ;
        

        Note the subtle use if +(...) instead of *(...) to mandate at least one trailing logical_or operation

        Now, Spirit will have to be told how to assign the first branch to an operand attribute. qi::attr_cast<ast::expression>(...) should have been the fix here, but sadly I ran into undefined behaviour (this took the most time). I settled for a more verbose solution:

        _logical_or_expr = logical_and_expr    >> +(logical_or_op     > logical_and_expr) ;
        logical_or_expr  = _logical_or_expr | logical_and_expr;
        

        As you can probably see, it's just the same, but with the first branch extracted to a separate rule, so we can just declare the rules to expose the desired attributes:

        qi::rule<It, ast::operand(),    Skipper> logical_or_expr;
        qi::rule<It, ast::expression(), Skipper> _logical_or_expr;
        

        Indeed doing this for each precedence level of subexpressions, results in the following:

        _logical_or_expr     = logical_and_expr    >> +(logical_or_op     > logical_and_expr) ;
        _multiplicative_expr = unary_expr          >> *(multiplicative_op > unary_expr) ;
        _additive_expr       = multiplicative_expr >> +(additive_op       > multiplicative_expr) ;
        _relational_expr     = additive_expr       >> +(relational_op     > additive_expr) ;
        _equality_expr       = relational_expr     >> +(equality_op       > relational_expr) ;
        _logical_and_expr    = equality_expr       >> +(logical_and_op    > equality_expr) ;
        
        logical_or_expr     = _logical_or_expr     | logical_and_expr        ;
        logical_and_expr    = _logical_and_expr    | equality_expr           ;
        equality_expr       = _equality_expr       | relational_expr         ;
        relational_expr     = _relational_expr     | additive_expr           ;
        additive_expr       = _additive_expr       | multiplicative_expr     ;
        multiplicative_expr = _multiplicative_expr | attr_cast<ast::operand, ast::operand> (unary_expr) ;
        

      The end result is here: http://coliru.stacked-crooked.com/a/8539757bb02fca34 (sadly it is just too much for Coliru), and it prints:

      <expr>
        ...
        </logical_or_expr>
        <success></success>
        <attributes>[[42, []]]</attributes>
      </expr>
      

      CAVEAT Note that this adaptation will NOT make the parser more efficient! In fact, it will just result in a boatload of backtracking (the debug output counts 925 lines instead of just 45 in Step 1 (!!)).

      Now, there will be some room to optimize using look-ahead assertions and/or semantic actions, but in general you will have to consider that optimizing for AST storage is going to cost CPU time.

      这篇关于语法树空节点问题与Spirit Qi MiniC示例的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆