语法树空节点问题与Spirit Qi MiniC的例子 [英] Syntax tree empty nodes issue with Spirit Qi MiniC example
问题描述
我在Spirit Qi中玩过MiniC的例子,并注意到表达式语法中有空AST节点的问题。
它会生成一个表达式节点,它只包含一个类型为expression的单个操作数。
我认为问题是由于递归规则运算符优先级的定义:
// expression.hpp
qi :: rule< Iterator,ast :: expression (),船长<迭代器> >
expr,equality_expr,relational_expr,
logical_or_expr,logical_and_expr,
additive_expr,multiplicative_expr
;
// expression_def.hpp
expr =
logical_or_expr.alias()
;
logical_or_expr =
logical_and_expr
>> *(logical_or_op> logical_and_expr)
;
logical_and_expr =
equality_expr
>> *(logical_and_op> equality_expr)
;
// ast.hpp
typedef boost :: variant<
nil
,bool
,unsigned int
,标识符
,boost :: recursive_wrapper< unary>
,boost :: recursive_wrapper< function_call>
,boost :: recursive_wrapper< expression>
>
操作数;
/*...*/
struct expression
{
操作数第一;
std :: list< operation>休息;
};
当logical_or_expr递归到logical_and_expr时,logical_and_expr将返回一个expression()。
由于属性传播是Spirit,logical_or_expr然后将此表达式分配给它的表达式的第一个成员。
我认为这是什么产生所有这些空表达式节点。
我发现,令人讨厌,想摆脱他们,但还没有成功。有没有人找到一个体面的解决方案?
我认为如果一个表达式包含二进制操作和一元操作,这将是某种可能的。
那么这样做还有一个优点,即将操作和操作数保持在相同的结构(伪代码)中:
struct binary_op
{
optype type;
操作数op1;
操作数op2;
}
struct unary_op
{
optype type;
操作数op1;
}
struct eval_op
{
操作数op1;
}
typedef boost :: variant< binary_op,
unary_op,
eval_op>
表达式
typedef boost :: variant< int,
bool,
boost :: recursive_wrapper< expression> >
操作数;
但是,我相信在纸上弹出后,我仍然会遇到这个空节点问题。看来我正在追我自己的尾巴。
有没有人想到如何解决这个问题?
编辑
a> b
将解析为:
表达式(来自logical_or_op)//空表达式(仅向前)
(rest) - / \-(first)expression(from logical_and_op)// Empty expression(forward only)
(rest) - / \-(first)expression(from equality_op)// Empty expression(forward only)
(rest) - / \-(first)expression(from relational_op)//正确表达式
(第一)a - / \-(rest)
[0] operator_:op_greater
operand_:b
所需的输出将是:
表达式(来自relational_op)
(first)a - / \-(rest)
[0] operator_:op_greater
operand_:b
EDIT2
我已经上传了一个修改的mini-c版本,用于打印表达式生成的表达式树:
如果您查看包含的A.avxh文件,它包含一个要解析的表达式。
在第67行的main.cpp中设置一个断点,并观察它在多少次递归的步骤。
这是因为所有的规则暴露了一个原始 ast :: expression
,这是一个固定类型。
在这个示例中,它显然是简单的选择:利益是
- 通过使所有表达式节点相同的类型,树访问代码可以简单而统一。
- 所有规则都具有相同的签名,并遵循相同的模式。
- 在这个特殊的例子(mini_c)中,代码阶段通过继承同样的简单来获益。
通过更紧密地遵循语义 的更灵活的AST的通常方式是使表达式
一个变体:这样,每个表达式可以直接包含实际的具体子表达式,而不是通过中间级别的节点类型wading,只是模仿语法结构而不是语义 。
我想我有几个这样的asts和相应的例子这个网站上的语法,会看到我是否可以连接一个。
实际上,我认为
typedef变种< ...>在同一个例子(
不是这种方法的一个坏例子。ast.hpp
)中的语句
相关链接:
- Boost ::精神表达解析器
Boost :: spirit如何解析和调用c ++函数式表达式$
这是一个问题,我通过提供一个全功能的表达式解析器(使用内置函数评估)回答,有两种方式:
- 具有完整AST的钟声和口哨方法(类似于我们在这里使用的mini_c示例表达式)
- 务实的审批ach,使用只是语义动作 进行即时评估 - 注意这是最佳的存储,但有一些缺点,我在那篇文章中讨论
现在,如果你不想改变语法因为不要简单,您可以改为对AST进行转换(可以这么简单的表达式)。
我将在下一个小时内看看我能想出的东西。
我刚刚重构了语法,因此不会导致如此深入的嵌套。
-
首先,让我们将测试减少到一个独立的测试台,只需解析表达式,并显示一个简单的表达式(
42
)解析到深层嵌套的AST: http ://coliru.stacked-crooked.com/a/5467ca41b0ac1d03< expr>
...
< success>< / success>
< attributes> [] [[[[[]],[]],[]],[]],[]],[]]]< / attributes>
< / expr>
-
接下来,我们来删除根问题:语法返回不变类型(
ast :: expression
),这在很多情况下太重了。相反,我们想返回一个ast :: operand
(这是变体,而可以包含相同的ast ::表达式
节点): http://coliru.stacked-crooked .com / a / 00e43b1f61db018c -
最后,我们希望所有规则都成为变体,并返回 一个
表达式
iff有拖尾操作,或只是一个单独的操作数
在另一种情况下,在伪代码中:logical_or_expr =
(logical_and_expr>> + logical_or_op> logical_and_expr)
| logical_and_expr
;
注意如果
+(...)
而不是*(...)
要求至少一个尾随的逻辑操作
现在,Spirit必须被告知如何将第一个分支分配给
操作数
属性。应该是这里的修补程序,但遗憾的是我遇到了未定义的行为(这花费了最多的时间)。我定了一个更详细的解决方案:_logical_or_expr = logical_and_expr>> +(logical_or_op> logical_and_expr);
logical_or_expr = _logical_or_expr | logical_and_expr;
正如你可能看到的,它是一样的,但第一个分支提取到一个单独的规则,所以我们可以声明规则来公开所需的属性:
qi :: rule< It,ast :: operand ),Skipper> logical_or_expr;
qi :: rule< It,ast :: expression(),Skipper> _logical_or_expr;
确实为每个优先级别的子表达式做这个,结果如下:
_logical_or_expr = logical_and_expr>> +(logical_or_op> logical_and_expr);
_multiplicative_expr = unary_expr>> *(multiplicative_op> unary_expr);
_additive_expr = multiplicative_expr>> +(additive_op> multiplicative_expr);
_relational_expr = additive_expr>> +(relational_op> additive_expr);
_equality_expr = relational_expr>> +(equality_op> relational_expr);
_logical_and_expr = equality_expr>> +(logical_and_op> equality_expr);
logical_or_expr = _logical_or_expr | logical_and_expr;
logical_and_expr = _logical_and_expr | equal_expr;
equality_expr = _equality_expr | relational_expr;
relational_expr = _relational_expr | additive_expr;
additive_expr = _additive_expr |乘法_expr;
multiplicative_expr = _multiplicative_expr | attr_cast< ast :: operand,ast :: operand> (unary_expr);
最终结果在这里: http://coliru.stacked-crooked.com/a/8539757bb02fca34 (可惜这只是太非常适用于Coliru),并打印:
< expr>
...
< / logical_or_expr>
< success>< / success>
< attributes> [[42,[]]]< / attributes>
< / expr>
CAVEAT 请注意,此修改将 不 使解析器更有效率!事实上,它只会导致一系列的回溯(调试输出计数在925行,而不是在步骤1 (!!)中的45)。
现在,将有一些优化使用先行断言和/或语义动作的空间,但一般来说,您将不得不考虑对AST存储的优化将花费CPU时间。
Dear Spirit Qi experts.
I have played around with the MiniC example in Spirit Qi, and have noticed an issue with "empty" AST nodes in the expression grammar. It will generate "expression" nodes that hold nothing but a single operand of type "expression" again.
I think the issue is due to the recursive rule definitions for the operator precedence:
// expression.hpp
qi::rule<Iterator, ast::expression(), skipper<Iterator> >
expr, equality_expr, relational_expr,
logical_or_expr, logical_and_expr,
additive_expr, multiplicative_expr
;
// expression_def.hpp
expr =
logical_or_expr.alias()
;
logical_or_expr =
logical_and_expr
>> *(logical_or_op > logical_and_expr)
;
logical_and_expr =
equality_expr
>> *(logical_and_op > equality_expr)
;
// ast.hpp
typedef boost::variant<
nil
, bool
, unsigned int
, identifier
, boost::recursive_wrapper<unary>
, boost::recursive_wrapper<function_call>
, boost::recursive_wrapper<expression>
>
operand;
/*...*/
struct expression
{
operand first;
std::list<operation> rest;
};
When logical_or_expr recurses into logical_and_expr, logical_and_expr will return an expression(). Due to the attribute propagation is Spirit, logical_or_expr then assigns this expression to it's expression's "first" member.
I think this is what generates all these "empty" expression nodes. I find that, however, nasty, and would like to get rid of them, but have not yet been successful. Has anyone found a decent solution to this before?
I'm thinking it would be somehow possible if an expression consisted of binary ops and unary ops. This would then also have the advantage of keeping the operation and operands in the same structure (pseudo-code):
struct binary_op
{
optype type;
operand op1;
operand op2;
}
struct unary_op
{
optype type;
operand op1;
}
struct eval_op
{
operand op1;
}
typedef boost::variant<binary_op,
unary_op,
eval_op>
expression;
typedef boost::variant<int,
bool,
boost::recursive_wrapper<expression> >
operand;
However, I believe I would still encounter this "empty node" issue after playing this out on paper. It seems like I'm chasing my own tail.
Does anyone have an idea on how to approach this issue?
EDIT
a > b
Will parse to:
expression (from logical_or_op) // Empty expression (forward only)
(rest) -/ \- (first) expression (from logical_and_op) // Empty expression (forward only)
(rest) -/ \- (first) expression (from equality_op) // Empty expression (forward only)
(rest) -/ \- (first) expression (from relational_op) // "Correct" expression
(first) a -/ \- (rest)
[0] operator_: op_greater
operand_: b
The desired output would be:
expression (from relational_op)
(first) a -/ \- (rest)
[0] operator_: op_greater
operand_: b
EDIT2
I have uploaded a modified mini-c version that prints the generated expression tree for an expression:
If you look at the included A.avxh file, it contains an expression to parse. Set a breakpoint in main.cpp at line 67, and observe how often it recursively steps in there.
This is because all the rules expose a "raw" ast::expression
and this is a fixed type.
It has clearly been a choice for simplicity in this sample: the benefits are
- By making all expression nodes the same type, the tree visitation code can be simple and uniform.
- All rules have the same 'signature' and follow the same pattern. It's easy to reason about.
- In this particular example (mini_c) the code-gen phase benefits by inheriting the same simplicity.
The usual way to have more flexible AST that follows the semantics more closely would be by making expression
a variant instead: that way each expression can directly contain the actual "concrete" subexpression type instead of "wading" through intermediate levels of node types, just to mimic the grammar structure instead of the semantics.
I think I have several examples of such asts and corresponding grammars on this site, will see if I can link one later.
In fact, I think the
typedef variant<...> statement
in the same example (ast.hpp
) is not a bad example of this approach.Relevant links:
- Boost::Spirit Expression Parser
Boost::spirit how to parse and call c++ function-like expressions this being a question which I answered by providing a fully featured expression parser (with 'builtin function' evaluation), in two ways:
- A 'bells and whistles' approach with a full AST (resembling what we're doing here with the mini_c sample expressions)
- A 'pragmatic' approach which evaluates on-the-fly using just semantic actions - note this is optimal for storage, but has some downsides, which I address in that post
For now, if you don't wish to alter the grammar (so as not to "lose" the simplicity) you could instead do a transformation on the AST (a "simplify" pass on the expressions, so to speak).
I'm going to see what I can come up with in the next hour.
I've just refactored the grammar so that it doesn't result in such deep nesting.
First, let's reduce the test to a standalone test bed that just parses expressions and shows how a simple expression (
"42"
) parses to a deeply nested AST: http://coliru.stacked-crooked.com/a/5467ca41b0ac1d03<expr> ... <success></success> <attributes>[[[[[[[42, []], []], []], []], []], []]]</attributes> </expr>
Next, let's remove the root problem: the grammar returns an invariant type (
ast::expression
) which is too heavy in many cases. Instead, we'd like to return anast::operand
(which is variant, and can contain the sameast::expression
node): http://coliru.stacked-crooked.com/a/00e43b1f61db018cLastly we'd like all rules to become variant as well and returning either an
expression
iff there are trailing operations, or just a loneoperand
in the other case, in pseudo code:logical_or_expr = (logical_and_expr >> +(logical_or_op > logical_and_expr) | logical_and_expr ;
Note the subtle use if
+(...)
instead of*(...)
to mandate at least one trailing logical_or operationNow, Spirit will have to be told how to assign the first branch to an
operand
attribute.qi::attr_cast<ast::expression>(...)
should have been the fix here, but sadly I ran into undefined behaviour (this took the most time). I settled for a more verbose solution:_logical_or_expr = logical_and_expr >> +(logical_or_op > logical_and_expr) ; logical_or_expr = _logical_or_expr | logical_and_expr;
As you can probably see, it's just the same, but with the first branch extracted to a separate rule, so we can just declare the rules to expose the desired attributes:
qi::rule<It, ast::operand(), Skipper> logical_or_expr; qi::rule<It, ast::expression(), Skipper> _logical_or_expr;
Indeed doing this for each precedence level of subexpressions, results in the following:
_logical_or_expr = logical_and_expr >> +(logical_or_op > logical_and_expr) ; _multiplicative_expr = unary_expr >> *(multiplicative_op > unary_expr) ; _additive_expr = multiplicative_expr >> +(additive_op > multiplicative_expr) ; _relational_expr = additive_expr >> +(relational_op > additive_expr) ; _equality_expr = relational_expr >> +(equality_op > relational_expr) ; _logical_and_expr = equality_expr >> +(logical_and_op > equality_expr) ; logical_or_expr = _logical_or_expr | logical_and_expr ; logical_and_expr = _logical_and_expr | equality_expr ; equality_expr = _equality_expr | relational_expr ; relational_expr = _relational_expr | additive_expr ; additive_expr = _additive_expr | multiplicative_expr ; multiplicative_expr = _multiplicative_expr | attr_cast<ast::operand, ast::operand> (unary_expr) ;
The end result is here: http://coliru.stacked-crooked.com/a/8539757bb02fca34 (sadly it is just too much for Coliru), and it prints:
<expr>
...
</logical_or_expr>
<success></success>
<attributes>[[42, []]]</attributes>
</expr>
CAVEAT Note that this adaptation will NOT make the parser more efficient! In fact, it will just result in a boatload of backtracking (the debug output counts 925 lines instead of just 45 in Step 1 (!!)).
Now, there will be some room to optimize using look-ahead assertions and/or semantic actions, but in general you will have to consider that optimizing for AST storage is going to cost CPU time.
这篇关于语法树空节点问题与Spirit Qi MiniC的例子的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!