X3 parse rule doesn't compile

Question

I'm learning Boost Spirit by writing a parser that parses two variants of hex number used by NASM:

  1. Hex number with either a prefix of 0x/0h or a suffix of h/x.
  2. Hex number with a prefix of $, which must be followed by a decimal digit.

Here is what I have come up with so far (also available as a Coliru session):

//#define BOOST_SPIRIT_X3_DEBUG
#include <iostream>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <boost/spirit/include/support_extended_variant.hpp>

namespace x3 = boost::spirit::x3;

namespace ast {
    struct hex_data : std::string {};
    struct pascal_hex_data : std::string {};

    struct declared_data : boost::spirit::extended_variant<hex_data, pascal_hex_data>
    {
        declared_data () : base_type ()                              { std::cout << "ctor default\n";               } 
        declared_data (hex_data const& rhs) : base_type (rhs)        { std::cout << "ctor hex: " << rhs << "\n";    } 
        declared_data (pascal_hex_data const& rhs) : base_type (rhs) { std::cout << "ctor pascal: " << rhs << "\n"; } 
    };

} // namespace ast

typedef x3::rule<struct hex_digits_class,     std::string>          hex_digit_type;
typedef x3::rule<struct hex_data_class,       ast::hex_data>        hex_data_type;
typedef x3::rule<struct pascalhex_data_class, ast::pascal_hex_data> pascalhex_data_type;
typedef x3::rule<struct declared_data_class,  ast::declared_data>   declared_data_type;

const hex_data_type       hex_data       = "hex_data";
const hex_digit_type      hex_digit      = "hex_digit";
const pascalhex_data_type pascalhex_data = "pascal_hex_data";
const declared_data_type  declared_data  = "declared_data";

auto const hex_digit_def
  = x3::skip(x3::char_('_'))
      [
        x3::no_case
        [
          x3::char_ ('0', '9') | x3::char_ ("a", "f")
        ]
      ]
  ;

auto const hex_data_def 
  = x3::no_case[x3::lit ("0h") | "0x"] >> +hex_digit_def
  | +hex_digit_def >> x3::no_case[x3::lit ("h") | "x"]
  ;

auto const pascalhex_data_def 
  = x3::lit ("$") >> x3::char_ ('0', '9') >> +hex_digit_def;

auto const declared_data_def 
  = hex_data_def
  | pascalhex_data_def
  ;

BOOST_SPIRIT_DEFINE (hex_digit, hex_data, pascalhex_data, declared_data)

struct Visitor
{
    using result_type = std::string;
    std::string operator()(ast::hex_data const & v) const        { return "hex_data";        } 
    std::string operator()(ast::pascal_hex_data const & v) const { return "pascal_hex_data"; } 
};

int main()
{
  std::string input = "$9";
  ast::declared_data parsed;

  bool r =
    x3::parse (input.begin (), input.end (),
               declared_data_def,
               parsed);

  std::cout << "r = " << r << "\n";
  Visitor v;
  std::cout << "result = " << boost::apply_visitor(v, parsed) << "\n";
}

However, the rule pascalhex_data_def fails to compile, with an error message suggesting that Spirit deduces the rule's attribute to be a fusion tuple of a char and a vector of variant, even though the rule is declared with an AST attribute derived from std::string:

typedef x3::rule<struct pascalhex_data_class, ast::pascal_hex_data> pascalhex_data_type;

Can anyone point out why the attribute deduced by Boost is not the one specified? Is there any way to force the rule to produce a string rather than the tuple Boost is trying to return?

Recommended answer

Your code seems extremely complicated for what it achieves. However, after looking at it for considerable time, I noticed you are declaring rules (which coerce their attribute types), but not using them at the crucial time:

auto const declared_data_def = hex_data_def | pascalhex_data_def;

This means you build the expression tree directly from the expression-template (_def) initializers, bypassing the rules and the attribute types they coerce. Use the rules instead:

auto const declared_data_def = hex_data | pascalhex_data;

That compiles. It still leaves quite a few issues:

  • you can/should do without the variant constructors:

struct declared_data : boost::spirit::extended_variant<hex_data, pascal_hex_data> {
    using extended_variant::extended_variant;
};

  • You can write x3::char_ ('0', '9') as x3::char_("0-9"), so instead of

    x3::no_case
    [
        x3::char_ ('0', '9') | x3::char_ ("a", "f")
    ]

    you can write

    x3::no_case [ x3::char_ ("0-9a-f") ]

    or even

    x3::char_ ("0-9a-fA-F")

    or, maybe just:

    x3::xdigit
    

  • hex_digit_type declares a std::string attribute, but the rule parses only a single character. Instead of using +hex_digit_def, make the rule itself parse all the digits (call it hex_digits) and write:

    auto const hex_digits_def = x3::skip(x3::char_('_')) [ +x3::xdigit ];
    

  • your definition

    "$" >> x3::char_("0-9") >> hex_digits
    

    consumes the first digit of the hex number. That leads to errors (e.g. for $9, hex_digits is left with the empty string to parse). Instead you probably want to check with operator&:

    '$' >> &x3::char_("0-9") >> hex_digits
    

    or, indeed:

    '$' >> &x3::digit >> hex_digits
    

  • none of the rules are actually recursive, so none of them require any separation of declaration and definition. This reduces the code by a huge margin

    I suspect you want to interpret the hex data as numbers, not strings. You could/should probably simplify the AST accordingly. Step 1: drop the distinction between things parsed from one format or the other:

    namespace ast {
        using hex_literal = std::string;
    }
    

    Now the whole program simplifies to the following (Live On Coliru):

    #include <iostream>
    #include <boost/spirit/home/x3.hpp>
    
    namespace ast {
        using hex_literal = std::string;
    }
    
    namespace parser {
        namespace x3 = boost::spirit::x3;
    
        auto const hex_digits = x3::rule<struct hex_digits_class, ast::hex_literal> {"hex_digits"} 
                              = x3::skip(x3::char_('_')) [ +x3::xdigit ];
    
        auto const hex_qualifier = x3::omit [ x3::char_("hxHX") ];
    
        auto const hex_literal = 
            ('$' >> &x3::xdigit | '0' >> hex_qualifier) >> hex_digits
            | hex_digits >> hex_qualifier;
    }
    
    int main()
    {
        for (std::string const input : { 
                "$9",   "0x1b",   "0h1c",   "1dh",   "1ex",
                "$9_f", "0x1_fb", "0h1_fc", "1_fdh", "1_fex"
        }) {
            ast::hex_literal parsed;
    
            bool r = parse(input.begin(), input.end(), parser::hex_literal, parsed);
            std::cout << "r = " << std::boolalpha << r << ", result = " << parsed << "\n";
        }
    }
    

    Prints:

    r = true, result = 9
    r = true, result = 1b
    r = true, result = 1c
    r = true, result = 1d
    r = true, result = 1e
    r = true, result = 9f
    r = true, result = 1fb
    r = true, result = 1fc
    r = true, result = 1fd
    r = true, result = 1fe
    

    Step 2 (breaking the underscore parsing)

    Now, it seems obvious that really, you want to know the numeric value:

    Live On Coliru

    #include <iostream>
    #include <boost/spirit/home/x3.hpp>
    
    namespace ast {
        using hex_literal = uintmax_t;
    }
    
    namespace parser {
        namespace x3 = boost::spirit::x3;
    
        auto const hex_qualifier = x3::omit [ x3::char_("hxHX") ];
    
        auto const hex_literal 
            = ('$' >> &x3::xdigit | '0' >> hex_qualifier) >> x3::hex
            | x3::hex >> hex_qualifier
            ;
    }
    
    int main()
    {
        for (std::string const input : { 
                "$9",   "0x1b",   "0h1c",   "1dh",   "1ex",
                "$9_f", "0x1_fb", "0h1_fc", "1_fdh", "1_fex"
        }) {
            ast::hex_literal parsed;
    
            auto f = input.begin(), l = input.end();
            bool r = parse(f, l, parser::hex_literal, parsed) && f==l;
    
            std::cout << std::boolalpha
                 << "r = "            << r
                 << ",\tresult = "    << parsed
                 << ",\tremaining: '" << std::string(f,l) << "'\n";
        }
    }
    

    Prints:

    r = true,   result = 9, remaining: ''
    r = true,   result = 27,    remaining: ''
    r = true,   result = 28,    remaining: ''
    r = true,   result = 29,    remaining: ''
    r = true,   result = 30,    remaining: ''
    r = false,  result = 9, remaining: '_f'
    r = false,  result = 1, remaining: '_fb'
    r = false,  result = 1, remaining: '_fc'
    r = false,  result = 1, remaining: '1_fdh'
    r = false,  result = 1, remaining: '1_fex'
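The last five inputs fail because x3::hex stops at the first character that is not a hex digit, so the underscore is never consumed. Conceptually, what is missing is "drop the separators, then convert". A minimal stand-alone sketch of that idea in plain C++ (no Spirit involved; the function name is hypothetical and does no validation of prefixes or suffixes):

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical helper: remove '_' separators, then convert the remaining
// characters as a base-16 number. std::stoull throws std::invalid_argument
// if no digits remain and std::out_of_range if the value does not fit.
inline std::uintmax_t hex_value_ignoring_underscores(std::string text) {
    text.erase(std::remove(text.begin(), text.end(), '_'), text.end());
    return std::stoull(text, nullptr, 16);
}
```

Note that std::stoull on its own is more lenient than the grammar (it accepts leading whitespace, a sign and an optional 0x prefix), so it is only suitable as the conversion step after a parser has already validated the digits.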
    

    Step 3: Make it work with underscores again

    This is where I'd start considering a custom parser. This is because it will start involving a semantic action¹ as well as multiple attribute coercions, and frankly it's most convenient to package them up so you can just write imperative C++14 like anyone else:

    Live On Coliru

    #include <cstdint>
    #include <iostream>
    #include <limits>
    #include <string>
    #include <boost/spirit/home/x3.hpp>
    
    namespace ast {
        using hex_literal = uintmax_t;
    }
    
    namespace parser {
        namespace x3 = boost::spirit::x3;
    
        struct hex_literal_type : x3::parser_base {
            using attribute_type = ast::hex_literal;
    
            template <typename It, typename Ctx, typename RCtx>
            static bool parse(It& f, It l, Ctx& ctx, RCtx&, attribute_type& attr) {
                std::string digits;
    
                skip_over(f, l, ctx); // pre-skip using surrounding skipper
    
                auto constexpr max_digits = std::numeric_limits<attribute_type>::digits / 8;
                auto digits_ = x3::skip(x3::as_parser('_')) [x3::repeat(1, max_digits) [ x3::xdigit ] ];
    
                auto qualifier = x3::omit [ x3::char_("hxHX") ];
                auto forms
                    = ('$' >> &x3::digit | '0' >> qualifier) >> digits_
                    | digits_ >> qualifier
                    ;
    
                if (x3::parse(f, l, forms, digits)) {
                    attr = std::stoull(digits, nullptr, 16);
                    return true;
                }
                return false;
            }
        };
    
        hex_literal_type static const hex_literal;
    }
    
    int main() {
        for (std::string const input : { 
                "$9",   "0x1b",   "0h1c",   "1dh",   "1ex",
                "$9_f", "0x1_fb", "0h1_fc", "1_fdh", "1_fex",
                // edge cases
                "ffffffffH", // fits
                "1ffffffffH", // too big
                "$00_00___01___________0__________0", // fine
                "0x", // fine, same as "0h"
                "$",
                // upper case
                "$9",   "0X1B",   "0H1C",   "1DH",   "1EX",
                "$9_F", "0X1_FB", "0H1_FC", "1_FDH", "1_FEX",
        }) {
            ast::hex_literal parsed = 0;
    
            auto f = input.begin(), l = input.end();
            bool r = parse(f, l, parser::hex_literal, parsed) && f==l;
    
            std::cout << std::boolalpha
                 << "r = "            << r
                 << ",\tresult = "    << parsed
                 << ",\tremaining: '" << std::string(f,l) << "'\n";
        }
    }
    

    Note how I included max_digits to avoid runaway parsing (say, when the input has 10 gigabytes of hex digits). You might want to improve this by pre-skipping insignificant 0 digits.

    The output is now:

    r = true,   result = 9, remaining: ''
    r = true,   result = 27,    remaining: ''
    r = true,   result = 28,    remaining: ''
    r = true,   result = 29,    remaining: ''
    r = true,   result = 30,    remaining: ''
    r = true,   result = 159,   remaining: ''
    r = true,   result = 507,   remaining: ''
    r = true,   result = 508,   remaining: ''
    r = true,   result = 509,   remaining: ''
    r = true,   result = 510,   remaining: ''
    r = true,   result = 4294967295,    remaining: ''
    r = false,  result = 0, remaining: '1ffffffffH'
    r = true,   result = 256,   remaining: ''
    r = true,   result = 0, remaining: ''
    r = false,  result = 0, remaining: '$'
    r = true,   result = 9, remaining: ''
    r = true,   result = 27,    remaining: ''
    r = true,   result = 28,    remaining: ''
    r = true,   result = 29,    remaining: ''
    r = true,   result = 30,    remaining: ''
    r = true,   result = 159,   remaining: ''
    r = true,   result = 507,   remaining: ''
    r = true,   result = 508,   remaining: ''
    r = true,   result = 509,   remaining: ''
    r = true,   result = 510,   remaining: ''
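A side note on the max_digits limit used above: std::numeric_limits<T>::digits / 8 works out to 8 when uintmax_t is 64 bits wide, which is consistent with the nine-digit input 1ffffffffH being rejected. Since each hex digit encodes 4 bits, dividing by 4 would be the largest count that is still guaranteed to fit. The sketch below illustrates that arithmetic together with one possible way of dropping insignificant leading zeros before counting; both helper names are hypothetical and not part of the original answer.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Number of hex digits that always fit in unsigned type T:
// every hex digit carries exactly 4 bits.
template <typename T>
constexpr int max_hex_digits() {
    return std::numeric_limits<T>::digits / 4;
}

// Drop leading '0' characters, keeping a single "0" if the input
// consists only of zeros. An empty input stays empty.
inline std::string strip_leading_zeros(std::string digits) {
    auto const pos = digits.find_first_not_of('0');
    if (pos == std::string::npos)
        return digits.empty() ? std::string{} : std::string{"0"};
    digits.erase(0, pos);
    return digits;
}
```

Applying strip_leading_zeros to the collected digits before checking their count against max_hex_digits would let long zero-padded literals through without raising the limit on significant digits.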
    

    Step 4: Icing on the cake

    In case you wanted to retain the input format for round-tripping, you could trivially add that to the AST now:

    Live On Coliru

    #include <cstdint>
    #include <iostream>
    #include <limits>
    #include <string>
    #include <boost/spirit/home/x3.hpp>
    
    namespace ast {
        struct hex_literal {
            uintmax_t value;
            std::string source;
        };
    }
    
    namespace parser {
        namespace x3 = boost::spirit::x3;
    
        struct hex_literal_type : x3::parser_base {
            using attribute_type = ast::hex_literal;
    
            template <typename It, typename Ctx, typename RCtx>
            static bool parse(It& f, It l, Ctx& ctx, RCtx&, attribute_type& attr) {
                std::string digits;
    
                skip_over(f, l, ctx); // pre-skip using surrounding skipper
                It b = f; // save start
    
                auto constexpr max_digits = std::numeric_limits<decltype(attr.value)>::digits / 8;
                auto digits_ = x3::skip(x3::as_parser('_')) [x3::repeat(1, max_digits) [ x3::xdigit ] ];
    
                auto qualifier = x3::omit [ x3::char_("hxHX") ];
                auto forms
                    = ('$' >> &x3::digit | '0' >> qualifier) >> digits_
                    | digits_ >> qualifier
                    ;
    
                if (x3::parse(f, l, forms, digits)) {
                    attr.value = std::stoull(digits, nullptr, 16);
                    attr.source.assign(b, f); // matched text only: [start, current position)
                    return true;
                }
                return false;
            }
        };
    
        hex_literal_type static const hex_literal;
    }
    
    int main()
    {
        for (std::string const input : { 
                "$9",   "0x1b",   "0h1c",   "1dh",   "1ex",
                "$9_f", "0x1_fb", "0h1_fc", "1_fdh", "1_fex",
                // edge cases
                "ffffffffH", // fits
                "1ffffffffH", // too big
                "$00_00___01___________0__________0", // fine
                "0x", // fine, same as "0h"
                "$",
                // upper case
                "$9",   "0X1B",   "0H1C",   "1DH",   "1EX",
                "$9_F", "0X1_FB", "0H1_FC", "1_FDH", "1_FEX",
        }) {
            ast::hex_literal parsed = {};
    
            auto f = input.begin(), l = input.end();
            bool r = parse(f, l, parser::hex_literal, parsed) && f==l;
    
            if (r) {
                std::cout << "result = " << parsed.value
                          << ",\tsource = '" << parsed.source << "'\n";
            }
            else {
                std::cout << "FAILED"
                          << ",\tremaining: '" << std::string(f,l) << "'\n";
            }
        }
    }
    

    Prints:

    result = 9, source = '$9'
    result = 27,    source = '0x1b'
    result = 28,    source = '0h1c'
    result = 29,    source = '1dh'
    result = 30,    source = '1ex'
    result = 159,   source = '$9_f'
    result = 507,   source = '0x1_fb'
    result = 508,   source = '0h1_fc'
    result = 509,   source = '1_fdh'
    result = 510,   source = '1_fex'
    result = 4294967295,    source = 'ffffffffH'
    FAILED, remaining: '1ffffffffH'
    result = 256,   source = '$00_00___01___________0__________0'
    result = 0, source = '0x'
    FAILED, remaining: '$'
    result = 9, source = '$9'
    result = 27,    source = '0X1B'
    result = 28,    source = '0H1C'
    result = 29,    source = '1DH'
    result = 30,    source = '1EX'
    result = 159,   source = '$9_F'
    result = 507,   source = '0X1_FB'
    result = 508,   source = '0H1_FC'
    result = 509,   source = '1_FDH'
    result = 510,   source = '1_FEX'
    


    ¹ Boost Spirit: "Semantic actions are evil"?
