Internal Boost::Spirit code segfaults when parsing a composite grammar

Views: 69 | Published: 2020/5/25 0:38:26 | Tags: c++ parsing boost boost-spirit

Question

I'm trying to use Spirit to parse expressions of the form Module1.Module2.value (any number of dot-separated capitalized identifiers, then a dot, then a lowercase OCaml-style identifier). My current definition of the parser looks like this:

using namespace boost::spirit::qi;

template <typename Iter=std::string::iterator>
struct value_path : grammar<Iter, boost::tuple<std::vector<std::string>, std::string>()> {
    value_path() :
        value_path::base_type(start)
    {
        start = -(module_path<Iter>() >> '.') >> value_name<Iter>();
    }
    rule<Iter, boost::tuple<std::vector<std::string>, std::string>()> start;
};

where module_path and value_name are similar template structs inheriting from qi::grammar, each with a single start field that is assigned some Spirit rule in the constructor, possibly using other custom grammars (e.g. value_name depends on lowercase_ident and operator_name, which are defined analogously).

When attempting to phrase_parse() with this grammar, the program segfaults somewhere in the internals of Spirit (according to gdb). The equivalent definition, where the constructor of value_path is as follows (I've basically unrolled all custom grammars it depends on, leaving only builtin Spirit parsers, and attempted to make it readable, which in hindsight was a fool's errand):

start =
-((raw[upper >> *(alnum | char_('_') | char_('\''))] % '.') >> '.')
>> lexeme[((lower | char_('_')) >> *(alnum | char_('_') | char_('\'')))
         | char_('(') >>
             ( ( (char_('!') >> *char_("-+!$%&*./:<=>?@^|~")
                 | (char_("~?") >> +char_("-+!$%&*./:<=>?@^|~"))
                 | ( (char_("-+=<>@^|&*/$%") >> *char_("-+!$%&*./:<=>?@^|~"))
                   | string("mod")
                   | string("lor")
                   | string("lsl")
                   | string("lsr")
                   | string("asr")
                   | string("or")
                   | string("-.")
                   | string("!=")
                   | string("||")
                   | string("&&")
                   | string(":=")
                   | char_("*+=<>&-")
                   )
                 ) >> char_(')')
               )
             )
         ];

does not segfault and appears to work correctly. However, I would rather avoid something this verbose and unreadable in my code. It's also not extensible at all.

So far I've tried various combinations of .alias(), as well as saving value_name<Iter>(), module_path<Iter>() and all intermediate grammars along the dependency chain into their own fields. Neither of those worked. How can I keep the high level of abstraction of the first example? Is there a standard way of composing grammars in Spirit that does not run into issues?

Answer

You're running into trouble because expression templates keep internal references to temporaries.

Just aggregate the sub-parser instances:

template <typename Iter=std::string::iterator>
struct value_path : grammar<Iter, boost::tuple<std::vector<std::string>, std::string>()> {
    value_path() : value_path::base_type(start)
    {
        start = -(module_path_ >> '.') >> value_name_;
    }
  private:

    rule<Iter, boost::tuple<std::vector<std::string>, std::string>()> start;
    module_path<Iter> module_path_;
    value_name<Iter> value_name_;
};

Note: I feel it might be a design smell to use separate sub-grammars for such small items. Although grammar decomposition is frequently a good idea to keep build times manageable and code size somewhat lower, it seems (from the description here) that you might be overdoing things.

The "plastering" of parser expressions behind a qi::rule (effectively type erasure) comes with a possibly significant runtime overhead. If you subsequently instantiate those for more than a single iterator type, you may be compounding this with unnecessary growth of the binary.

UPDATE: Regarding the idiomatic way to compose your grammars in Spirit, here's my take:

Live On Coliru

using namespace ascii;
using qi::raw;

lowercase_ident  = raw[ (lower | '_') >> *(alnum | '_' | '\'') ];
module_path_item = raw[ upper >> *(alnum | '_' | '\'') ];
module_path_     = module_path_item % '.';

auto special_char = boost::proto::deep_copy(char_("-+!$%&*./:<=>?@^|~"));

operator_name = qi::raw [
          ('!' >> *special_char)                          /* branch 1     */
        | (char_("~?") >> +special_char)                  /* branch 2     */
        | (!char_(".:") >> special_char >> *special_char) /* branch 3     */
        | "mod"                                           /* branch 4     */
        | "lor" | "lsl" | "lsr" | "asr" | "or"            /* branch 5-9   */
        | "-."                                            /* branch 10    doesn't match because of branch 3   */
        | "!=" | "||" | "&&" | ":="                       /* branches 11-13 can't match because of branches 1,3; ":=" (14) still can */
     // | (special_char - char_("!$%./:?@^|~"))           /* "*+=<>&-" cannot match because of branch 3 */
    ]
    ;

value_name_  = 
      lowercase_ident
    | '(' >> operator_name >> ')'
    ;

start = -(module_path_ >> '.') >> value_name_;

Where the rules are declared as:

qi::rule<Iter, ast::value_path(),  Skipper> start;
qi::rule<Iter, ast::module_path(), Skipper> module_path_;

// lexeme: (no skipper)
qi::rule<Iter, std::string()> value_name_, module_path_item, lowercase_ident, operator_name;

Notes:

I've added a skipper, because your value_path grammar didn't use one, so any skipper you passed into qi::phrase_parse was being ignored
The lexemes just drop the skipper from the rule declaration type, so you don't even need to specify qi::lexeme[]
In the lexemes, I copied your intention to just copy the parsed text verbatim using qi::raw. This allows us to write grammars more succinctly (using '!' instead of char_('!'), "mod" instead of qi::string("mod")). Note that bare literals are implicitly transformed into "non-capturing" qi::lit(...) nodes in the context of a Qi parser expression, but since we used raw[] anyways, the fact that lit doesn't capture an attribute is not a problem.

I think this results in a perfectly cromulent grammar definition that should satisfy your criteria for "high-level". There's some wtf-y-ness in the grammar itself (likely regardless of which parser generator language it is expressed in):

I've simplified the operator_name rule by removing the nesting of alternative branches; the flat list of alternatives has the same effect
I've refactored the "magic" lists of special characters into special_char
In alternative branch 3, e.g., I've noted the exceptions with a negative assertion:

(!char_(".:") >> special_char >> *special_char) /* branch 3     */

The !char_(".:") assertion says: when the input wouldn't match '.' or ':', continue matching (any sequence of special characters). In fact, you could equivalently write this as:

((special_char - '.' - ':') >> *special_char) /* branch 3     */

or even, as I ended up writing it:

(!char_(".:") >> +special_char) /* branch 3     */

The simplification of the branches actually raises the level of abstraction! It now becomes clear that some of the branches will never match, because earlier branches already match the same input by definition:

   | "-."                                    /* branch 10    doesn't match because of branch 3   */
   | "!=" | "||" | "&&" | ":="               /* branches 11-13 can't match because of branches 1,3; ":=" (14) still can */
// | (special_char - char_("!$%./:?@^|~"))   /* "*+=<>&-" cannot match because of branch 3 */

I hope you can see why I qualify this part of the grammar as "a little bit wtf-y" :) I'll assume for now that you got confused or that something went wrong when you reduced it to a single rule (your "fool's errand").

Some further improvements:

I've added a proper AST struct instead of the boost::tuple<> to make the code more legible
I've added BOOST_SPIRIT_DEBUG* macros so you can debug your grammar at a high level (the rule level)
I've ditched the blanket using namespace. A blanket using-directive is generally a bad idea, and with Spirit it is frequently a very bad one (it can lead to unsolvable ambiguities, or to errors that are very hard to spot). As you can see, avoiding it doesn't necessarily lead to very verbose code.

#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi    = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

namespace ast {
    using module_path = std::vector<std::string>;
    struct value_path {
        module_path module;
        std::string   value_expr;
    };
}

BOOST_FUSION_ADAPT_STRUCT(ast::value_path, (ast::module_path, module)(std::string,value_expr))

template <typename Iter, typename Skipper = ascii::space_type>
struct value_path : qi::grammar<Iter, ast::value_path(), Skipper> {
    value_path() : value_path::base_type(start)
    {
        using namespace ascii;
        using qi::raw;

        lowercase_ident  = raw[ (lower | '_') >> *(alnum | '_' | '\'') ];
        module_path_item = raw[ upper >> *(alnum | '_' | '\'') ];
        module_path_     = module_path_item % '.';

        auto special_char = boost::proto::deep_copy(char_("-+!$%&*./:<=>?@^|~"));

        operator_name = qi::raw [
                  ('!'          >> *special_char)         /* branch 1     */
                | (char_("~?")  >> +special_char)         /* branch 2     */
                | (!char_(".:") >> +special_char)         /* branch 3     */
                | "mod"                                   /* branch 4     */
                | "lor" | "lsl" | "lsr" | "asr" | "or"    /* branch 5-9   */
                | "-."                                    /* branch 10    doesn't match because of branch 3   */
                | "!=" | "||" | "&&" | ":="               /* branches 11-13 can't match because of branches 1,3; ":=" (14) still can */
             // | (special_char - char_("!$%./:?@^|~"))   /* "*+=<>&-" cannot match because of branch 3 */
            ]
            ;

        value_name_  = 
              lowercase_ident
            | '(' >> operator_name >> ')'
            ;

        start = -(module_path_ >> '.') >> value_name_;

        BOOST_SPIRIT_DEBUG_NODES((start)(module_path_)(value_name_)(module_path_item)(lowercase_ident)(operator_name))
    }
  private:
    qi::rule<Iter, ast::value_path(),  Skipper> start;
    qi::rule<Iter, ast::module_path(), Skipper> module_path_;

    // lexeme: (no skipper)
    qi::rule<Iter, std::string()> value_name_, module_path_item, lowercase_ident, operator_name;
};

int main()
{
    for (std::string const input : { 
            "Some.Module.Package.ident",
            "ident",
            "A.B.C_.mod",    // as lowercase_ident
            "A.B.C_.(mod)",  // as operator_name (branch 4)
            "A.B.C_.(!=)",   // as operator_name (branch 1)
            "(!)"            // as operator_name (branch 1)
            })
    {
        std::cout << "--------------------------------------------------------------\n";
        std::cout << "Parsing '" << input << "'\n";

        using It = std::string::const_iterator;
        It f(input.begin()), l(input.end());

        value_path<It> g;
        ast::value_path data;
        bool ok = qi::phrase_parse(f, l, g, ascii::space, data);
        if (ok) {
            std::cout << "Parse succeeded\n";
        } else {
            std::cout << "Parse failed\n";
        }

        if (f!=l)
            std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
    }
}

Debug output

--------------------------------------------------------------
Parsing 'Some.Module.Package.ident'
<start>
  <try>Some.Module.Package.</try>
  <module_path_>
    <try>Some.Module.Package.</try>
    <module_path_item>
      <try>Some.Module.Package.</try>
      <success>.Module.Package.iden</success>
      <attributes>[[S, o, m, e]]</attributes>
    </module_path_item>
    <module_path_item>
      <try>Module.Package.ident</try>
      <success>.Package.ident</success>
      <attributes>[[M, o, d, u, l, e]]</attributes>
    </module_path_item>
    <module_path_item>
      <try>Package.ident</try>
      <success>.ident</success>
      <attributes>[[P, a, c, k, a, g, e]]</attributes>
    </module_path_item>
    <module_path_item>
      <try>ident</try>
      <fail/>
    </module_path_item>
    <success>.ident</success>
    <attributes>[[[S, o, m, e], [M, o, d, u, l, e], [P, a, c, k, a, g, e]]]</attributes>
  </module_path_>
  <value_name_>
    <try>ident</try>
    <lowercase_ident>
      <try>ident</try>
      <success></success>
      <attributes>[[i, d, e, n, t]]</attributes>
    </lowercase_ident>
    <success></success>
    <attributes>[[i, d, e, n, t]]</attributes>
  </value_name_>
  <success></success>
  <attributes>[[[[S, o, m, e], [M, o, d, u, l, e], [P, a, c, k, a, g, e]], [i, d, e, n, t]]]</attributes>
</start>
Parse succeeded
--------------------------------------------------------------
Parsing 'ident'
<start>
  <try>ident</try>
  <module_path_>
    <try>ident</try>
    <module_path_item>
      <try>ident</try>
      <fail/>
    </module_path_item>
    <fail/>
  </module_path_>
  <value_name_>
    <try>ident</try>
    <lowercase_ident>
      <try>ident</try>
      <success></success>
      <attributes>[[i, d, e, n, t]]</attributes>
    </lowercase_ident>
    <success></success>
    <attributes>[[i, d, e, n, t]]</attributes>
  </value_name_>
  <success></success>
  <attributes>[[[], [i, d, e, n, t]]]</attributes>
</start>
Parse succeeded
--------------------------------------------------------------
Parsing 'A.B.C_.mod'
<start>
  <try>A.B.C_.mod</try>
  <module_path_>
    <try>A.B.C_.mod</try>
    <module_path_item>
      <try>A.B.C_.mod</try>
      <success>.B.C_.mod</success>
      <attributes>[[A]]</attributes>
    </module_path_item>
    <module_path_item>
      <try>B.C_.mod</try>
      <success>.C_.mod</success>
      <attributes>[[B]]</attributes>
    </module_path_item>
    <module_path_item>
      <try>C_.mod</try>
      <success>.mod</success>
      <attributes>[[C, _]]</attributes>
    </module_path_item>
    <module_path_item>
      <try>mod</try>
      <fail/>
    </module_path_item>
    <success>.mod</success>
    <attributes>[[[A], [B], [C, _]]]</attributes>
  </module_path_>
  <value_name_>
    <try>mod</try>
    <lowercase_ident>
      <try>mod</try>
      <success></success>
      <attributes>[[m, o, d]]</attributes>
    </lowercase_ident>
    <success></success>
    <attributes>[[m, o, d]]</attributes>
  </value_name_>
  <success></success>
  <attributes>[[[[A], [B], [C, _]], [m, o, d]]]</attributes>
</start>
Parse succeeded
--------------------------------------------------------------
Parsing 'A.B.C_.(mod)'
<start>
  <try>A.B.C_.(mod)</try>
  <module_path_>
    <try>A.B.C_.(mod)</try>
    <module_path_item>
      <try>A.B.C_.(mod)</try>
      <success>.B.C_.(mod)</success>
      <attributes>[[A]]</attributes>
    </module_path_item>
    <module_path_item>
      <try>B.C_.(mod)</try>
      <success>.C_.(mod)</success>
      <attributes>[[B]]</attributes>
    </module_path_item>
    <module_path_item>
      <try>C_.(mod)</try>
      <success>.(mod)</success>
      <attributes>[[C, _]]</attributes>
    </module_path_item>
    <module_path_item>
      <try>(mod)</try>
      <fail/>
    </module_path_item>
    <success>.(mod)</success>
    <attributes>[[[A], [B], [C, _]]]</attributes>
  </module_path_>
  <value_name_>
    <try>(mod)</try>
    <lowercase_ident>
      <try>(mod)</try>
      <fail/>
    </lowercase_ident>
    <operator_name>
      <try>mod)</try>
      <success>)</success>
      <attributes>[[m, o, d]]</attributes>
    </operator_name>
    <success></success>
    <attributes>[[m, o, d]]</attributes>
  </value_name_>
  <success></success>
  <attributes>[[[[A], [B], [C, _]], [m, o, d]]]</attributes>
</start>
Parse succeeded
--------------------------------------------------------------
Parsing 'A.B.C_.(!=)'
<start>
  <try>A.B.C_.(!=)</try>
  <module_path_>
    <try>A.B.C_.(!=)</try>
    <module_path_item>
      <try>A.B.C_.(!=)</try>
      <success>.B.C_.(!=)</success>
      <attributes>[[A]]</attributes>
    </module_path_item>
    <module_path_item>
      <try>B.C_.(!=)</try>
      <success>.C_.(!=)</success>
      <attributes>[[B]]</attributes>
    </module_path_item>
    <module_path_item>
      <try>C_.(!=)</try>
      <success>.(!=)</success>
      <attributes>[[C, _]]</attributes>
    </module_path_item>
    <module_path_item>
      <try>(!=)</try>
      <fail/>
    </module_path_item>
    <success>.(!=)</success>
    <attributes>[[[A], [B], [C, _]]]</attributes>
  </module_path_>
  <value_name_>
    <try>(!=)</try>
    <lowercase_ident>
      <try>(!=)</try>
      <fail/>
    </lowercase_ident>
    <operator_name>
      <try>!=)</try>
      <success>)</success>
      <attributes>[[!, =]]</attributes>
    </operator_name>
    <success></success>
    <attributes>[[!, =]]</attributes>
  </value_name_>
  <success></success>
  <attributes>[[[[A], [B], [C, _]], [!, =]]]</attributes>
</start>
Parse succeeded
--------------------------------------------------------------
Parsing '(!)'
<start>
  <try>(!)</try>
  <module_path_>
    <try>(!)</try>
    <module_path_item>
      <try>(!)</try>
      <fail/>
    </module_path_item>
    <fail/>
  </module_path_>
  <value_name_>
    <try>(!)</try>
    <lowercase_ident>
      <try>(!)</try>
      <fail/>
    </lowercase_ident>
    <operator_name>
      <try>!)</try>
      <success>)</success>
      <attributes>[[!]]</attributes>
    </operator_name>
    <success></success>
    <attributes>[[!]]</attributes>
  </value_name_>
  <success></success>
  <attributes>[[[], [!]]]</attributes>
</start>
Parse succeeded
