语法树空节点问题与Spirit Qi MiniC的例子 [英] Syntax tree empty nodes issue with Spirit Qi MiniC example

查看:218
本文介绍了语法树空节点问题与Spirit Qi MiniC的例子的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述



我在Spirit Qi中玩过MiniC的例子,并注意到表达式语法中有空AST节点的问题。
它会生成一个表达式节点,它只包含一个类型为expression的单个操作数。



我认为问题是由于递归规则运算符优先级的定义:

  // expression.hpp 
qi :: rule< Iterator,ast :: expression (),船长<迭代器> >
expr,equality_expr,relational_expr,
logical_or_expr,logical_and_expr,
additive_expr,multiplicative_expr
;

// expression_def.hpp
expr =
logical_or_expr.alias()
;

logical_or_expr =
logical_and_expr
>> *(logical_or_op> logical_and_expr)
;

logical_and_expr =
equality_expr
>> *(logical_and_op> equality_expr)
;

// ast.hpp

typedef boost :: variant<
nil
,bool
,unsigned int
,标识符
,boost :: recursive_wrapper< unary>
,boost :: recursive_wrapper< function_call>
,boost :: recursive_wrapper< expression>
>
操作数;
/*...*/
struct expression
{
操作数第一;
std :: list< operation>休息;
};

当logical_or_expr递归到logical_and_expr时,logical_and_expr将返回一个expression()。
由于属性传播是Spirit,logical_or_expr然后将此表达式分配给它的表达式的第一个成员。



我认为这是什么产生所有这些空表达式节点。
我发现,令人讨厌,想摆脱他们,但还没有成功。有没有人找到一个体面的解决方案?



我认为如果一个表达式包含二进制操作和一元操作,这将是某种可能的。
那么这样做还有一个优点,即将操作和操作数保持在相同的结构(伪代码)中:

  struct binary_op 
{
optype type;
操作数op1;
操作数op2;
}

struct unary_op
{
optype type;
操作数op1;
}

struct eval_op
{
操作数op1;
}

typedef boost :: variant< binary_op,
unary_op,
eval_op>
表达式

typedef boost :: variant< int,
bool,
boost :: recursive_wrapper< expression> >
操作数;

但是,我相信在纸上弹出后,我仍然会遇到这个空节点问题。看来我正在追我自己的尾巴。



有没有人想到如何解决这个问题?



编辑

  a> b 

将解析为:

 表达式(来自logical_or_op)//空表达式(仅向前)
(rest) - / \-(first)expression(from logical_and_op)// Empty expression(forward only)
(rest) - / \-(first)expression(from equality_op)// Empty expression(forward only)
(rest) - / \-(first)expression(from relational_op)//正确表达式
(第一)a - / \-(rest)
[0] operator_:op_greater
operand_:b

所需的输出将是:

 表达式(来自relational_op) 
(first)a - / \-(rest)
[0] operator_:op_greater
operand_:b

EDIT2



我已经上传了一个修改的mini-c版本,用于打印表达式生成的表达式树:



链接



如果您查看包含的A.avxh文件,它包含一个要解析的表达式。
在第67行的main.cpp中设置一个断点,并观察它在多少次递归的步骤。

解决方案

这是因为所有的规则暴露了一个原始 ast :: expression ,这是一个固定类型。



在这个示例中,它显然是简单的选择:利益是




  • 通过使所有表达式节点相同的类型,树访问代码可以简单而统一。

  • 所有规则都具有相同的签名,并遵循相同的模式。

  • 在这个特殊的例子(mini_c)中,代码阶段通过继承同样的简单来获益。



通过更紧密地遵循语义 的更灵活的AST的通常方式是使表达式一个变体:这样,每个表达式可以直接包含实际的具体子表达式,而不是通过中间级别的节点类型wading,只是模仿语法结构而不是语义


我想我有几个这样的asts和相应的例子这个网站上的语法,会看到我是否可以连接一个。



实际上,我认为 typedef变种< ...>在同一个例子( ast.hpp )中的语句不是这种方法的一个坏例子。



相关链接:




现在,如果你不想改变语法因为不要简单,您可以改为对AST进行转换(可以这么简单的表达式)。



我将在下一个小时内看看我能想出的东西。



我刚刚重构了语法,因此不会导致如此深入的嵌套。


  1. 首先,让我们将测试减少到一个独立的测试台,只需解析表达式,并显示一个简单的表达式(42)解析到深层嵌套的AST: http ://coliru.stacked-crooked.com/a/5467ca41b0ac1d03

     < expr> 
    ...
    < success>< / success>
    < attributes> [] [[[[[]],[]],[]],[]],[]],[]]]< / attributes>
    < / expr>


  2. 接下来,我们来删除根问题:语法返回不变类型( ast :: expression ),这在很多情况下太重了。相反,我们想返回一个 ast :: operand (这是变体,而可以包含相同的 ast ::表达式节点): http://coliru.stacked-crooked .com / a / 00e43b1f61db018c


  3. 最后,我们希望所有规则都成为变体,并返回 一个表达式 iff有拖尾操作,或只是一个单独的操作数在另一种情况下,在伪代码中:

      logical_or_expr = 
    (logical_and_expr>> + logical_or_op> logical_and_expr)
    | logical_and_expr
    ;

    注意如果 +(...)而不是 *(...)要求至少一个尾随的逻辑操作



    现在,Spirit必须被告知如何将第一个分支分配给操作数属性。应该是这里的修补程序,但遗憾的是我遇到了未定义的行为(这花费了最多的时间)。我定了一个更详细的解决方案:

      _logical_or_expr = logical_and_expr>> +(logical_or_op> logical_and_expr); 
    logical_or_expr = _logical_or_expr | logical_and_expr;

    正如你可能看到的,它是一样的,但第一个分支提取到一个单独的规则,所以我们可以声明规则来公开所需的属性:

      qi :: rule< It,ast :: operand ),Skipper> logical_or_expr; 
    qi :: rule< It,ast :: expression(),Skipper> _logical_or_expr;

    确实为每个优先级别的子表达式做这个,结果如下:

      _logical_or_expr = logical_and_expr>> +(logical_or_op> logical_and_expr); 
    _multiplicative_expr = unary_expr>> *(multiplicative_op> unary_expr);
    _additive_expr = multiplicative_expr>> +(additive_op> multiplicative_expr);
    _relational_expr = additive_expr>> +(relational_op> additive_expr);
    _equality_expr = relational_expr>> +(equality_op> relational_expr);
    _logical_and_expr = equality_expr>> +(logical_and_op> equality_expr);

    logical_or_expr = _logical_or_expr | logical_and_expr;
    logical_and_expr = _logical_and_expr | equal_expr;
    equality_expr = _equality_expr | relational_expr;
    relational_expr = _relational_expr | additive_expr;
    additive_expr = _additive_expr |乘法_expr;
    multiplicative_expr = _multiplicative_expr | attr_cast< ast :: operand,ast :: operand> (unary_expr);


最终结果在这里: http://coliru.stacked-crooked.com/a/8539757bb02fca34 (可惜这只是太非常适用于Coliru),并打印:

 < expr> 
...
< / logical_or_expr>
< success>< / success>
< attributes> [[42,[]]]< / attributes>
< / expr>

CAVEAT 请注意,此修改将 使解析器更有效率!事实上,它只会导致一系列的回溯(调试输出计数在925行,而不是在步骤1 (!!)中的45)。



现在,将有一些优化使用先行断言和/或语义动作的空间,但一般来说,您将不得不考虑对AST存储的优化将花费CPU时间。


Dear Spirit Qi experts.

I have played around with the MiniC example in Spirit Qi, and have noticed an issue with "empty" AST nodes in the expression grammar. It will generate "expression" nodes that hold nothing but a single operand of type "expression" again.

I think the issue is due to the recursive rule definitions for the operator precedence:

// expression.hpp
    qi::rule<Iterator, ast::expression(), skipper<Iterator> >
        expr, equality_expr, relational_expr,
        logical_or_expr, logical_and_expr,
        additive_expr, multiplicative_expr
        ;

// expression_def.hpp
    expr =
        logical_or_expr.alias()
        ;

    logical_or_expr =
            logical_and_expr
        >> *(logical_or_op > logical_and_expr)
        ;

    logical_and_expr =
            equality_expr
        >> *(logical_and_op > equality_expr)
        ;

// ast.hpp

typedef boost::variant<
        nil
      , bool
      , unsigned int
      , identifier
      , boost::recursive_wrapper<unary>
      , boost::recursive_wrapper<function_call>
      , boost::recursive_wrapper<expression>
    >
operand;
/*...*/
struct expression
{
    operand first;
    std::list<operation> rest;
};

When logical_or_expr recurses into logical_and_expr, logical_and_expr will return an expression(). Due to the attribute propagation is Spirit, logical_or_expr then assigns this expression to it's expression's "first" member.

I think this is what generates all these "empty" expression nodes. I find that, however, nasty, and would like to get rid of them, but have not yet been successful. Has anyone found a decent solution to this before?

I'm thinking it would be somehow possible if an expression consisted of binary ops and unary ops. This would then also have the advantage of keeping the operation and operands in the same structure (pseudo-code):

struct binary_op
{
    optype type;
    operand op1;
    operand op2;
}

struct unary_op
{
    optype type;
    operand op1;
}

struct eval_op
{
    operand op1;
}

typedef boost::variant<binary_op,
                       unary_op,
                       eval_op>
expression;

typedef boost::variant<int,
                       bool,
                       boost::recursive_wrapper<expression> >
operand;

However, I believe I would still encounter this "empty node" issue after playing this out on paper. It seems like I'm chasing my own tail.

Does anyone have an idea on how to approach this issue?

EDIT

a > b

Will parse to:

    expression (from logical_or_op) // Empty expression (forward only)
(rest)  -/  \- (first) expression (from logical_and_op) // Empty expression (forward only)
            (rest)  -/  \- (first) expression (from equality_op) // Empty expression (forward only)
                    (rest)  -/ \- (first) expression (from relational_op) // "Correct" expression
                            (first) a   -/      \- (rest)
                                                    [0] operator_: op_greater
                                                        operand_: b

The desired output would be:

            expression (from relational_op)
(first) a   -/      \- (rest)
                        [0] operator_: op_greater
                            operand_: b 

EDIT2

I have uploaded a modified mini-c version that prints the generated expression tree for an expression:

Link

If you look at the included A.avxh file, it contains an expression to parse. Set a breakpoint in main.cpp at line 67, and observe how often it recursively steps in there.

解决方案

This is because all the rules expose a "raw" ast::expression and this is a fixed type.

It has clearly been a choice for simplicity in this sample: the benefits are

  • By making all expression nodes the same type, the tree visitation code can be simple and uniform.
  • All rules have the same 'signature' and follow the same pattern. It's easy to reason about.
  • In this particular example (mini_c) the code-gen phase benefits by inheriting the same simplicity.

The usual way to have more flexible AST that follows the semantics more closely would be by making expression a variant instead: that way each expression can directly contain the actual "concrete" subexpression type instead of "wading" through intermediate levels of node types, just to mimic the grammar structure instead of the semantics.

I think I have several examples of such asts and corresponding grammars on this site, will see if I can link one later.

In fact, I think the typedef variant<...> statement in the same example (ast.hpp) is not a bad example of this approach.

Relevant links:

For now, if you don't wish to alter the grammar (so as not to "lose" the simplicity) you could instead do a transformation on the AST (a "simplify" pass on the expressions, so to speak).

I'm going to see what I can come up with in the next hour.

I've just refactored the grammar so that it doesn't result in such deep nesting.

  1. First, let's reduce the test to a standalone test bed that just parses expressions and shows how a simple expression ("42") parses to a deeply nested AST: http://coliru.stacked-crooked.com/a/5467ca41b0ac1d03

    <expr>
      ...
      <success></success>
      <attributes>[[[[[[[42, []], []], []], []], []], []]]</attributes>
    </expr>
    

  2. Next, let's remove the root problem: the grammar returns an invariant type (ast::expression) which is too heavy in many cases. Instead, we'd like to return an ast::operand (which is variant, and can contain the same ast::expression node): http://coliru.stacked-crooked.com/a/00e43b1f61db018c

  3. Lastly we'd like all rules to become variant as well and returning either an expression iff there are trailing operations, or just a lone operand in the other case, in pseudo code:

    logical_or_expr = 
          (logical_and_expr >> +(logical_or_op > logical_and_expr)
        | logical_and_expr
        ;
    

    Note the subtle use if +(...) instead of *(...) to mandate at least one trailing logical_or operation

    Now, Spirit will have to be told how to assign the first branch to an operand attribute. qi::attr_cast<ast::expression>(...) should have been the fix here, but sadly I ran into undefined behaviour (this took the most time). I settled for a more verbose solution:

    _logical_or_expr = logical_and_expr    >> +(logical_or_op     > logical_and_expr) ;
    logical_or_expr  = _logical_or_expr | logical_and_expr;
    

    As you can probably see, it's just the same, but with the first branch extracted to a separate rule, so we can just declare the rules to expose the desired attributes:

    qi::rule<It, ast::operand(),    Skipper> logical_or_expr;
    qi::rule<It, ast::expression(), Skipper> _logical_or_expr;
    

    Indeed doing this for each precedence level of subexpressions, results in the following:

    _logical_or_expr     = logical_and_expr    >> +(logical_or_op     > logical_and_expr) ;
    _multiplicative_expr = unary_expr          >> *(multiplicative_op > unary_expr) ;
    _additive_expr       = multiplicative_expr >> +(additive_op       > multiplicative_expr) ;
    _relational_expr     = additive_expr       >> +(relational_op     > additive_expr) ;
    _equality_expr       = relational_expr     >> +(equality_op       > relational_expr) ;
    _logical_and_expr    = equality_expr       >> +(logical_and_op    > equality_expr) ;
    
    logical_or_expr     = _logical_or_expr     | logical_and_expr        ;
    logical_and_expr    = _logical_and_expr    | equality_expr           ;
    equality_expr       = _equality_expr       | relational_expr         ;
    relational_expr     = _relational_expr     | additive_expr           ;
    additive_expr       = _additive_expr       | multiplicative_expr     ;
    multiplicative_expr = _multiplicative_expr | attr_cast<ast::operand, ast::operand> (unary_expr) ;
    

The end result is here: http://coliru.stacked-crooked.com/a/8539757bb02fca34 (sadly it is just too much for Coliru), and it prints:

<expr>
  ...
  </logical_or_expr>
  <success></success>
  <attributes>[[42, []]]</attributes>
</expr>

CAVEAT Note that this adaptation will NOT make the parser more efficient! In fact, it will just result in a boatload of backtracking (the debug output counts 925 lines instead of just 45 in Step 1 (!!)).

Now, there will be some room to optimize using look-ahead assertions and/or semantic actions, but in general you will have to consider that optimizing for AST storage is going to cost CPU time.

这篇关于语法树空节点问题与Spirit Qi MiniC的例子的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆