Boost Spirit 2-符号扩展和回溯 [英] Boost Spirit 2 - Symbol expansion and backtracking

查看:78
本文介绍了Boost Spirit 2-符号扩展和回溯的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在指定和解析一个相当简单的语法时遇到了一些问题.

I am having some issues specifying and parsing a rather simple grammar.

vertex = char+
edge = vertex " -> " vertex
start = ((vertex | edge) eol)*

input = "a\nb\na -> b\n"

Spirit正在执行以下操作:

Spirit is doing the following:

"a" -> vertex
"\n" -> eol
-> start

"b" -> vertex
"\n" -> eol
-> start

"a" -> vertex
terminate

而不是最终确定边缘并分析整个输入. 也就是说,它可以解析整个输入,但不是. 它不应该回溯并尝试使用替代规则进行解析吗?从而完成启动规则.

instead of identifying the edge in the end and parsing the entire input. That is, it could parse the entire input but it's not. Shouldn't it backtrack and attempt to parse using the alternate rule? And thus complete the start rule.

是因为Spirit使用PEG吗? (

Is it because Spirit uses PEGs? (http://www.boost.org/doc/libs/1_57_0/libs/spirit/doc/html/spirit/abstracts/parsing_expression_grammar.html#spirit.abstracts.parsing_expression_grammar.alternatives)

最小工作示例:

#include <cstdio>
#include <string>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

void print(const std::string & str) {
  printf("%s\n", str.c_str());
}

void print_eol() {
  printf("eol\n");
}

int main() {
  std::string str = "a\nb\na -> b\n";
  std::string::iterator it = str.begin(), begin = str.begin(), end = str.end();

  qi::rule<std::string::iterator, std::string()> vertex = +qi::alnum;
  qi::rule<std::string::iterator, std::string()> edge = vertex >> " -> " >> vertex;
  qi::rule<std::string::iterator> start = (vertex[&print] | edge[&print]) % qi::eol[&print_eol];

  bool matched = parse(it, end, start);

  if (matched) {
    printf("matched\n");
  }
  if (it == end) {
    printf("all\n");
  } else if (it != begin) {
    printf("some\n");
  } else {
    printf("none\n");
  }

  return 0;
}

输出:

$ ./a.out
a
eol
b
eol
a
matched
some

我在MSYS2上使用Boost 1.57.0和Clang 3.5.1.

I'm using Boost 1.57.0 and Clang 3.5.1 on MSYS2.

推荐答案

似乎您回答了自己的问题:是的,因为PEG语法本质上是贪婪的,从左到右的匹配.

It seems that you answered your own question: yes, it's because PEG grammars are inherently greedy, left-to-right matching.

注释

  • printing from a semantic action is not a fair test, because even when the parser does backtrack (it can continue the next alternative branch on failure of a single expression) the side effect will already have fired. This is a major draw back of using Semantic Actions - unless you're careful (Boost Spirit: "Semantic actions are evil"?)

BOOST_SPIRIT_DEBUG和BOOST_SPIRIT_DEBUG_NODES宏可用于此目的并提供更多信息

BOOST_SPIRIT_DEBUG and BOOST_SPIRIT_DEBUG_NODES macros are there fopr this purpose and give a lot more information

显而易见的解决方案是

  1. 对分支进行重新排序:

  1. reorder the branches:

start = (edge[&print] | vertex[&print]) % qi::eol[&print_eol];

在Coliru上直播

Live On Coliru

a
eol
b
eol
a -> b
matched all

  • 左-因式分解:

  • left - factorisation:

    start = (vertex[print] >> -(" -> " >> vertex)[print]) % qi::eol[&print_eol];
    

    在Coliru直播

    Live On Coliru

    a
    eol
    b
    eol
    a
    b
    matched all
    

  • 如果除了满足您对PEG语法的好奇心之外,还需要一些现实中的想法,则可以在SO答案中搜索想法,以了解如何将其解析为大约十二个

    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆