Parsing heterogeneous data using Boost::Spirit
Question
I'm trying to figure out how to approach the following problem.
I have a structure in the following format:
struct Data
{
    time_t timestamp;
    string id;
    boost::optional<int> data1;
    boost::optional<string> data2;
    // etc...
};
This should be parsed out of a single-line string in the following format:
human_readable_timestamp;id;key1=value1 key2=value2.....
Of course the ordering of the keys does not have to match the order of elements in the structure.
Is Boost::Spirit suitable for this type of data? How do I approach this? I have gone through the examples, but I can't manage to get from the examples to code that fits my requirements.
Recommended answer
You could use the permutation parser. I've made a very similar example here:
- Reading JSON file with C++ and BOOST
If you have repeating keys, then it makes more sense to use a Kleene star (*), perhaps
- with semantic actions to assign the attributes, or
- using attribute customization points to assign the result
PS. Also look at the keyword parser from the Spirit Repository (see "Boost Qi Composing rules using Functions", http://stackoverflow.com/questions/27812715/boost-qi-composing-rules-using-functions/27816065#27816065).
If you don't wish to use semantic actions (Boost Spirit: "Semantic actions are evil"?) you can slightly tweak the struct so that it matches the auto-synthesized attribute types when using the permutation for the data elements:
struct Data
{
    boost::posix_time::ptime timestamp;
    std::string id;
    struct Fields {
        boost::optional<int> data1;
        boost::optional<std::string> data2;
    } fields;
};
Now the parser can be just:
timestamp = stream;
text = lexeme [ '"' >> *~char_('"') >> '"' ];
data1 = "key1" >> lit('=') >> int_;
data2 = "key2" >> lit('=') >> text;
id = lexeme [ *~char_(';') ];
start = timestamp >> ';' >> id >> ';' >> (data1 ^ data2);
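Update
In response to the comments, I made this more resilient. I ended up moving away from the permutation parser and going with the first approach listed above (the Kleene star with semantic actions).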
id = lexeme [ *~char_(';') ];
auto data1 = bind(&Data::Fields::data1, _val);
auto data2 = bind(&Data::Fields::data2, _val);
other = lexeme [ +(graph-'=') ] >> '=' >> (real_|int_|text);
fields = *(
      ("key1" >> lit('=') >> int_) [ data1 = _1 ]
    | ("key2" >> lit('=') >> text) [ data2 = _1 ]
    | other
    );
start = timestamp >> ';' >> id >> -(';' >> fields);
This changes the following aspects:
In order to be able to skip "other" fields, I needed to come up with a reasonable grammar for "other" fields:
other = lexeme [ +(graph-'=') ] >> '=' >> (real_|int_|text);
(This allows a key consisting of anything non-whitespace except =, followed by the =, followed by either something numeric (eager), or text.)
I've extended the notion of text to support popular quoting/escaping schemes:
text = lexeme [
      '"' >> *('\\' >> char_ | ~char_('"')) >> '"'
    | "'" >> *('\\' >> char_ | ~char_("'")) >> "'"
    | *graph
    ];
It allows repeating the same key (in which case it retains the last valid value seen).
If you wanted to disallow invalid values, replace >> int_ or >> text by > int_ or > text (the expectation parser, see http://www.boost.org/doc/libs/1_57_0/libs/spirit/doc/html/spirit/qi/reference/operator/expect.html).
I've extended the test cases with some challenging cases:
2015-Jan-26 00:00:00;id
2015-Jan-26 14:59:24;id;key2="value"
2015-Jan-26 14:59:24;id;key2="value" key1=42
2015-Jan-26 14:59:24;id;key2="value" key1=42 something=awful __=4.74e-10 blarg;{blo;bloop='whatever \'ignor\'ed' key2="new} \"value\""
2015-Jan-26 14:59:24.123;id;key1=42 key2="value"
And it now prints:
----------------------------------------
Parsing '2015-Jan-26 00:00:00;id'
Parsing success
2015-Jan-26 00:00:00 id
data1: --
data2: --
----------------------------------------
Parsing '2015-Jan-26 14:59:24;id;key2="value"'
Parsing success
2015-Jan-26 14:59:24 id
data1: --
data2: value
----------------------------------------
Parsing '2015-Jan-26 14:59:24;id;key2="value" key1=42'
Parsing success
2015-Jan-26 14:59:24 id
data1: 42
data2: value
----------------------------------------
Parsing '2015-Jan-26 14:59:24;id;key2="value" key1=42 something=awful __=4.74e-10 blarg;{blo;bloop='whatever \'ignor\'ed' key2="new} \"value\""'
Parsing success
2015-Jan-26 14:59:24 id
data1: 42
data2: new} "value"
----------------------------------------
Parsing '2015-Jan-26 14:59:24.123;id;key1=42 key2="value" '
Parsing success
2015-Jan-26 14:59:24.123000 id
data1: 42
data2: value
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/optional/optional_io.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <boost/date_time/posix_time/posix_time_io.hpp>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
struct Data
{
    boost::posix_time::ptime timestamp;
    std::string id;
    struct Fields {
        boost::optional<int> data1;
        boost::optional<std::string> data2;
    } fields;
};
BOOST_FUSION_ADAPT_STRUCT(Data::Fields,
    (boost::optional<int>, data1)
    (boost::optional<std::string>, data2)
)
BOOST_FUSION_ADAPT_STRUCT(Data,
    (boost::posix_time::ptime, timestamp)
    (std::string, id)
    (Data::Fields, fields)
)
template <typename It, typename Skipper = qi::space_type>
struct grammar : qi::grammar<It, Data(), Skipper> {
    grammar() : grammar::base_type(start) {
        using namespace qi;
        // the timestamp is read with ptime's stream extraction operator
        timestamp = stream;
        // strict policies: plain integers are not accepted as reals
        real_parser<double, strict_real_policies<double> > real_;
        text = lexeme [
              '"' >> *('\\' >> char_ | ~char_('"')) >> '"'
            | "'" >> *('\\' >> char_ | ~char_("'")) >> "'"
            | *graph
            ];
        id = lexeme [ *~char_(';') ];
        // lazy accessors for the members of the rule's attribute
        auto data1 = bind(&Data::Fields::data1, _val);
        auto data2 = bind(&Data::Fields::data2, _val);
        // any other key=value pair is parsed and ignored
        other = lexeme [ +(graph-'=') ] >> '=' >> (real_|int_|text);
        fields = *(
              ("key1" >> lit('=') >> int_) [ data1 = _1 ]
            | ("key2" >> lit('=') >> text) [ data2 = _1 ]
            | other
            );
        start = timestamp >> ';' >> id >> -(';' >> fields);
        BOOST_SPIRIT_DEBUG_NODES((timestamp)(id)(start)(text)(other)(fields))
    }
  private:
    qi::rule<It, Skipper> other;
    qi::rule<It, std::string(), Skipper> text, id;
    qi::rule<It, boost::posix_time::ptime(), Skipper> timestamp;
    qi::rule<It, Data::Fields(), Skipper> fields;
    qi::rule<It, Data(), Skipper> start;
};
int main() {
    using It = std::string::const_iterator;
    for (std::string const input : {
            "2015-Jan-26 00:00:00;id",
            "2015-Jan-26 14:59:24;id;key2=\"value\"",
            "2015-Jan-26 14:59:24;id;key2=\"value\" key1=42",
            "2015-Jan-26 14:59:24;id;key2=\"value\" key1=42 something=awful __=4.74e-10 blarg;{blo;bloop='whatever \\'ignor\\'ed' key2=\"new} \\\"value\\\"\"",
            "2015-Jan-26 14:59:24.123;id;key1=42 key2=\"value\" ",
        })
    {
        std::cout << "----------------------------------------\nParsing '" << input << "'\n";
        It f(input.begin()), l(input.end());
        Data parsed;
        bool ok = qi::phrase_parse(f,l,grammar<It>(),qi::space,parsed);
        if (ok) {
            std::cout << "Parsing success\n";
            std::cout << parsed.timestamp << "\t" << parsed.id << "\n";
            std::cout << "data1: " << parsed.fields.data1 << "\n";
            std::cout << "data2: " << parsed.fields.data2 << "\n";
        } else {
            std::cout << "Parsing failed\n";
        }
        if (f!=l)
            std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
    }
}