Boost Spirit 2-符号扩展和回溯 [英] Boost Spirit 2 - Symbol expansion and backtracking
问题描述
我在指定和解析一个相当简单的语法时遇到了一些问题.
I am having some issues specifying and parsing a rather simple grammar.
vertex = char+
edge = vertex " -> " vertex
start = ((vertex | edge) eol)*
input = "a\nb\na -> b\n"
Spirit正在执行以下操作:
Spirit is doing the following:
"a" -> vertex
"\n" -> eol
-> start
"b" -> vertex
"\n" -> eol
-> start
和
"a" -> vertex
terminate
而不是最终确定边缘并分析整个输入. 也就是说,它可以解析整个输入,但不是. 它不应该回溯并尝试使用替代规则进行解析吗?从而完成启动规则.
instead of identifying the edge in the end and parsing the entire input. That is, it could parse the entire input but it's not. Shouldn't it backtrack and attempt to parse using the alternate rule? And thus complete the start rule.
Is it because Spirit uses PEGs? (http://www.boost.org/doc/libs/1_57_0/libs/spirit/doc/html/spirit/abstracts/parsing_expression_grammar.html#spirit.abstracts.parsing_expression_grammar.alternatives)
最小工作示例:
#include <cstdio>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
void print(const std::string & str) {
printf("%s\n", str.c_str());
}
void print_eol() {
printf("eol\n");
}
int main() {
std::string str = "a\nb\na -> b\n";
std::string::iterator it = str.begin(), begin = str.begin(), end = str.end();
qi::rule<std::string::iterator, std::string()> vertex = +qi::alnum;
qi::rule<std::string::iterator, std::string()> edge = vertex >> " -> " >> vertex;
qi::rule<std::string::iterator> start = (vertex[&print] | edge[&print]) % qi::eol[&print_eol];
bool matched = parse(it, end, start);
if (matched) {
printf("matched\n");
}
if (it == end) {
printf("all\n");
} else if (it != begin) {
printf("some\n");
} else {
printf("none\n");
}
return 0;
}
输出:
$ ./a.out
a
eol
b
eol
a
matched
some
我在MSYS2上使用Boost 1.57.0和Clang 3.5.1.
I'm using Boost 1.57.0 and Clang 3.5.1 on MSYS2.
推荐答案
似乎您回答了自己的问题:是的,因为PEG语法本质上是贪婪的,从左到右的匹配.
It seems that you answered your own question: yes, it's because PEG grammars are inherently greedy, left-to-right matching.
注释
-
通过语义动作打印不是公平测试,因为即使解析器进行回溯(如果解析器失败,它也可以继续下一个替代分支)单一表达式)的副作用已经被触发.这是使用语义动作的主要缺点-除非您小心谨慎( Boost Spirit: 语义行为是邪恶的?"?)
printing from a semantic action is not a fair test, because even when the parser does backtrack (it can continue the next alternative branch on failure of a single expression) the side effect will already have fired. This is a major draw back of using Semantic Actions - unless you're careful (Boost Spirit: "Semantic actions are evil"?)
BOOST_SPIRIT_DEBUG和BOOST_SPIRIT_DEBUG_NODES宏可用于此目的并提供更多信息
BOOST_SPIRIT_DEBUG and BOOST_SPIRIT_DEBUG_NODES macros are there fopr this purpose and give a lot more information
显而易见的解决方案是
-
对分支进行重新排序:
reorder the branches:
start = (edge[&print] | vertex[&print]) % qi::eol[&print_eol];
Live On Coliru
a
eol
b
eol
a -> b
matched all
左-因式分解:
left - factorisation:
start = (vertex[print] >> -(" -> " >> vertex)[print]) % qi::eol[&print_eol];
Live On Coliru
a
eol
b
eol
a
b
matched all
如果除了满足您对PEG语法的好奇心之外,还需要一些现实中的想法,则可以在SO答案中搜索想法,以了解如何将其解析为大约十二个
If you want some real-life ideas beyond satisfying your curiosity about PEG grammars, you can search SO answers for ideas how to parse into separate collections of vertices/edges in roughly a dozen boost-spirit answers on SO, as I remember.
这篇关于Boost Spirit 2-符号扩展和回溯的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!