X3 parse rule doesn't compile
Problem description
I'm learning Boost Spirit by writing a parser that parses two variants of the hex numbers used by NASM:
- Hex numbers with either a prefix of 0x/0h or a suffix of h/x.
- Hex numbers with a prefix of $, which must be followed by a decimal digit.
Here is what I have come up with so far (Coliru session):
//#define BOOST_SPIRIT_X3_DEBUG
#include <iostream>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <boost/spirit/include/support_extended_variant.hpp>
namespace x3 = boost::spirit::x3;
namespace ast {
struct hex_data : std::string {};
struct pascal_hex_data : std::string {};
struct declared_data : boost::spirit::extended_variant<hex_data, pascal_hex_data>
{
declared_data () : base_type () { std::cout << "ctor default\n"; }
declared_data (hex_data const& rhs) : base_type (rhs) { std::cout << "ctor hex: " << rhs << "\n"; }
declared_data (pascal_hex_data const& rhs) : base_type (rhs) { std::cout << "ctor pascal: " << rhs << "\n"; }
};
} // namespace ast
typedef x3::rule<struct hex_digits_class, std::string> hex_digit_type;
typedef x3::rule<struct hex_data_class, ast::hex_data> hex_data_type;
typedef x3::rule<struct pascalhex_data_class, ast::pascal_hex_data> pascalhex_data_type;
typedef x3::rule<struct declared_data_class, ast::declared_data> declared_data_type;
const hex_data_type hex_data = "hex_data";
const hex_digit_type hex_digit = "hex_digit";
const pascalhex_data_type pascalhex_data = "pascal_hex_data";
const declared_data_type declared_data = "declared_data";
auto const hex_digit_def
= x3::skip(x3::char_('_'))
[
x3::no_case
[
x3::char_ ('0', '9') | x3::char_ ("a", "f")
]
]
;
auto const hex_data_def
= x3::no_case[x3::lit ("0h") | "0x"] >> +hex_digit_def
| +hex_digit_def >> x3::no_case[x3::lit ("h") | "x"]
;
auto const pascalhex_data_def
= x3::lit ("$") >> x3::char_ ('0', '9') >> +hex_digit_def;
auto const declared_data_def
= hex_data_def
| pascalhex_data_def
;
BOOST_SPIRIT_DEFINE (hex_digit, hex_data, pascalhex_data, declared_data)
struct Visitor
{
using result_type = std::string;
std::string operator()(ast::hex_data const & v) const { return "hex_data"; }
std::string operator()(ast::pascal_hex_data const & v) const { return "pascal_hex_data"; }
};
int main()
{
std::string input = "$9";
ast::declared_data parsed;
bool r =
x3::parse (input.begin (), input.end (),
declared_data_def,
parsed);
std::cout << "r = " << r << "\n";
Visitor v;
std::cout << "result = " << boost::apply_visitor(v, parsed) << "\n";
}
However, the rule pascalhex_data_def fails to compile, with an error message that looks like Spirit is deducing the attribute of the rule to be a fusion tuple of char and a vector of variant, even though the rule is specified to have an attribute of an AST type derived from string:
typedef x3::rule<struct pascalhex_data_class, ast::pascal_hex_data> pascalhex_data_type;
Can anyone point out why the attribute deduced by Boost is not what's specified? Is there any way to force the rule to generate a string rather than the tuple Boost is trying to return?
Recommended answer
Your code seems extremely complicated for what it achieves. However, after looking at it for a considerable time, I noticed that you are declaring rules (which coerce their attribute types), but not using them at the crucial time:
auto const declared_data_def = hex_data_def | pascalhex_data_def;
This means you build the expression tree directly from the expression template (_def) initializers instead of from the rules, so the declared attribute types are never applied. Refer to the rules themselves instead:
auto const declared_data_def = hex_data | pascalhex_data;
That compiles. It still leaves quite a few issues:
- You can/should do without the variant constructors:
struct declared_data : boost::spirit::extended_variant<hex_data, pascal_hex_data> {
using extended_variant::extended_variant;
};
- You can write x3::char_ ('0', '9') as x3::char_("0-9"), so you can replace
x3::no_case
[
x3::char_ ('0', '9') | x3::char_ ("a", "f")
]
with
x3::no_case [ x3::char_ ("0-9a-f") ]
or even
x3::char_ ("0-9a-fA-F")
or, maybe just:
x3::xdigit
- hex_digits_type (hex_digit_type in your code) declares a std::string attribute, but parses only a single character. Instead of using +hex_digits_def, just use hex_digits and write:
auto const hex_digits_def = x3::skip(x3::char_('_')) [ +x3::xdigit ];
- Your definition
"$" >> x3::char_("0-9") >> hex_digits
consumes the first digit of the hex number. That leads to errors (e.g. parsing the empty string for $9). Instead, you probably want to check for the digit without consuming it, using operator&:
'$' >> &x3::char_("0-9") >> hex_digits
or, indeed:
'$' >> &x3::digit >> hex_digits
- None of the rules are actually recursive, so none of them require any separation of declaration and definition. This reduces the code by a huge margin.
- I suspect you want to interpret the hex data as numbers, not strings. You could/should probably simplify the AST accordingly. Step 1: drop the distinction between things parsed from one format or the other:
namespace ast {
using hex_literal = std::string;
}
Now the whole program simplifies to (Live On Coliru):
#include <iostream>
#include <boost/spirit/home/x3.hpp>
namespace ast {
using hex_literal = std::string;
}
namespace parser {
namespace x3 = boost::spirit::x3;
auto const hex_digits = x3::rule<struct hex_digits_class, ast::hex_literal> {"hex_digits"}
= x3::skip(x3::char_('_')) [ +x3::xdigit ];
auto const hex_qualifier = x3::omit [ x3::char_("hxHX") ];
auto const hex_literal =
('$' >> &x3::xdigit | '0' >> hex_qualifier) >> hex_digits
| hex_digits >> hex_qualifier;
}
int main()
{
for (std::string const input : {
"$9", "0x1b", "0h1c", "1dh", "1ex",
"$9_f", "0x1_fb", "0h1_fc", "1_fdh", "1_fex"
}) {
ast::hex_literal parsed;
bool r = parse(input.begin(), input.end(), parser::hex_literal, parsed);
std::cout << "r = " << std::boolalpha << r << ", result = " << parsed << "\n";
}
}
Prints:
r = true, result = 9
r = true, result = 1b
r = true, result = 1c
r = true, result = 1d
r = true, result = 1e
r = true, result = 9f
r = true, result = 1fb
r = true, result = 1fc
r = true, result = 1fd
r = true, result = 1fe
Step 2 (breaking the underscore parsing)
Now, it seems obvious that what you really want to know is the numeric value:
#include <cstdint>
#include <iostream>
#include <boost/spirit/home/x3.hpp>
namespace ast {
using hex_literal = uintmax_t;
}
namespace parser {
namespace x3 = boost::spirit::x3;
auto const hex_qualifier = x3::omit [ x3::char_("hxHX") ];
auto const hex_literal
= ('$' >> &x3::xdigit | '0' >> hex_qualifier) >> x3::hex
| x3::hex >> hex_qualifier
;
}
int main()
{
for (std::string const input : {
"$9", "0x1b", "0h1c", "1dh", "1ex",
"$9_f", "0x1_fb", "0h1_fc", "1_fdh", "1_fex"
}) {
ast::hex_literal parsed;
auto f = input.begin(), l = input.end();
bool r = parse(f, l, parser::hex_literal, parsed) && f==l;
std::cout << std::boolalpha
<< "r = " << r
<< ",\tresult = " << parsed
<< ",\tremaining: '" << std::string(f,l) << "'\n";
}
}
Prints:
r = true, result = 9, remaining: ''
r = true, result = 27, remaining: ''
r = true, result = 28, remaining: ''
r = true, result = 29, remaining: ''
r = true, result = 30, remaining: ''
r = false, result = 9, remaining: '_f'
r = false, result = 1, remaining: '_fb'
r = false, result = 1, remaining: '_fc'
r = false, result = 1, remaining: '1_fdh'
r = false, result = 1, remaining: '1_fex'
Step 3: Make it work with underscores again
This is where I'd start considering a custom parser. This is because it will start involving a semantic action¹ as well as multiple attribute coercions, and frankly it's most convenient to package them up so you can just write imperative C++14 like anyone else:
#include <cstdint>
#include <iostream>
#include <limits>
#include <string>
#include <boost/spirit/home/x3.hpp>
namespace ast {
using hex_literal = uintmax_t;
}
namespace parser {
namespace x3 = boost::spirit::x3;
struct hex_literal_type : x3::parser_base {
using attribute_type = ast::hex_literal;
template <typename It, typename Ctx, typename RCtx>
static bool parse(It& f, It l, Ctx& ctx, RCtx&, attribute_type& attr) {
std::string digits;
skip_over(f, l, ctx); // pre-skip using surrounding skipper
auto constexpr max_digits = std::numeric_limits<attribute_type>::digits / 8;
auto digits_ = x3::skip(x3::as_parser('_')) [x3::repeat(1, max_digits) [ x3::xdigit ] ];
auto qualifier = x3::omit [ x3::char_("hxHX") ];
auto forms
= ('$' >> &x3::digit | '0' >> qualifier) >> digits_
| digits_ >> qualifier
;
if (x3::parse(f, l, forms, digits)) {
attr = std::stoull(digits, nullptr, 16);
return true;
}
return false;
}
};
hex_literal_type static const hex_literal;
}
int main() {
for (std::string const input : {
"$9", "0x1b", "0h1c", "1dh", "1ex",
"$9_f", "0x1_fb", "0h1_fc", "1_fdh", "1_fex",
// edge cases
"ffffffffH", // fits
"1ffffffffH", // too big
"$00_00___01___________0__________0", // fine
"0x", // fine, same as "0h"
"$",
// upper case
"$9", "0X1B", "0H1C", "1DH", "1EX",
"$9_F", "0X1_FB", "0H1_FC", "1_FDH", "1_FEX",
}) {
ast::hex_literal parsed = 0;
auto f = input.begin(), l = input.end();
bool r = parse(f, l, parser::hex_literal, parsed) && f==l;
std::cout << std::boolalpha
<< "r = " << r
<< ",\tresult = " << parsed
<< ",\tremaining: '" << std::string(f,l) << "'\n";
}
}
Note how I included max_digits to avoid runaway parsing (say, when the input has 10 gigabytes of hex digits). You might want to improve this by pre-skipping insignificant 0 digits.
Now the output is:
r = true, result = 9, remaining: ''
r = true, result = 27, remaining: ''
r = true, result = 28, remaining: ''
r = true, result = 29, remaining: ''
r = true, result = 30, remaining: ''
r = true, result = 159, remaining: ''
r = true, result = 507, remaining: ''
r = true, result = 508, remaining: ''
r = true, result = 509, remaining: ''
r = true, result = 510, remaining: ''
r = true, result = 4294967295, remaining: ''
r = false, result = 0, remaining: '1ffffffffH'
r = true, result = 256, remaining: ''
r = true, result = 0, remaining: ''
r = false, result = 0, remaining: '$'
r = true, result = 9, remaining: ''
r = true, result = 27, remaining: ''
r = true, result = 28, remaining: ''
r = true, result = 29, remaining: ''
r = true, result = 30, remaining: ''
r = true, result = 159, remaining: ''
r = true, result = 507, remaining: ''
r = true, result = 508, remaining: ''
r = true, result = 509, remaining: ''
r = true, result = 510, remaining: ''
Step 4: Icing on the cake
In case you want to retain the input format for round-tripping, you can now trivially add that to the AST:
#include <cstdint>
#include <iostream>
#include <limits>
#include <string>
#include <boost/spirit/home/x3.hpp>
namespace ast {
struct hex_literal {
uintmax_t value;
std::string source;
};
}
namespace parser {
namespace x3 = boost::spirit::x3;
struct hex_literal_type : x3::parser_base {
using attribute_type = ast::hex_literal;
template <typename It, typename Ctx, typename RCtx>
static bool parse(It& f, It l, Ctx& ctx, RCtx&, attribute_type& attr) {
std::string digits;
skip_over(f, l, ctx); // pre-skip using surrounding skipper
It b = f; // save start
auto constexpr max_digits = std::numeric_limits<decltype(attr.value)>::digits / 8;
auto digits_ = x3::skip(x3::as_parser('_')) [x3::repeat(1, max_digits) [ x3::xdigit ] ];
auto qualifier = x3::omit [ x3::char_("hxHX") ];
auto forms
= ('$' >> &x3::digit | '0' >> qualifier) >> digits_
| digits_ >> qualifier
;
if (x3::parse(f, l, forms, digits)) {
attr.value = std::stoull(digits, nullptr, 16);
attr.source.assign(b, f); // f now points just past the matched literal, so this excludes any trailing input
return true;
}
return false;
}
};
hex_literal_type static const hex_literal;
}
int main()
{
for (std::string const input : {
"$9", "0x1b", "0h1c", "1dh", "1ex",
"$9_f", "0x1_fb", "0h1_fc", "1_fdh", "1_fex",
// edge cases
"ffffffffH", // fits
"1ffffffffH", // too big
"$00_00___01___________0__________0", // fine
"0x", // fine, same as "0h"
"$",
// upper case
"$9", "0X1B", "0H1C", "1DH", "1EX",
"$9_F", "0X1_FB", "0H1_FC", "1_FDH", "1_FEX",
}) {
ast::hex_literal parsed = {};
auto f = input.begin(), l = input.end();
bool r = parse(f, l, parser::hex_literal, parsed) && f==l;
if (r) {
std::cout << "result = " << parsed.value
<< ",\tsource = '" << parsed.source << "'\n";
}
else {
std::cout << "FAILED"
<< ",\tremaining: '" << std::string(f,l) << "'\n";
}
}
}
Prints:
result = 9, source = '$9'
result = 27, source = '0x1b'
result = 28, source = '0h1c'
result = 29, source = '1dh'
result = 30, source = '1ex'
result = 159, source = '$9_f'
result = 507, source = '0x1_fb'
result = 508, source = '0h1_fc'
result = 509, source = '1_fdh'
result = 510, source = '1_fex'
result = 4294967295, source = 'ffffffffH'
FAILED, remaining: '1ffffffffH'
result = 256, source = '$00_00___01___________0__________0'
result = 0, source = '0x'
FAILED, remaining: '$'
result = 9, source = '$9'
result = 27, source = '0X1B'
result = 28, source = '0H1C'
result = 29, source = '1DH'
result = 30, source = '1EX'
result = 159, source = '$9_F'
result = 507, source = '0X1_FB'
result = 508, source = '0H1_FC'
result = 509, source = '1_FDH'
result = 510, source = '1_FEX'