Is a Boost skip parser the right approach?
Problem description
After some delay, I'm again trying to parse an ASCII text file surrounded by some binary characters (see my earlier question, "Parsing text file with binary envelope using boost Spririt"). However, I'm now unsure whether a skip parser is the right approach.
The grammar of the file (it's a JEDEC file) is quite simple: each data field starts with a single letter and ends with an asterisk. A data field can contain spaces and carriage returns, and spaces and carriage returns may also follow the asterisk before the next field identifier.
This is what I used to start building a parser for such a file. Unfortunately, I don't get any output from it. Does anyone have an idea what might be wrong? The initial attempt, a sample file and the output I want to achieve are listed below.
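To make the "binary envelope" concrete: the field text sits between the STX (0x02) and ETX (0x03) bytes, and a four-digit hex transmission checksum follows ETX. The sketch below is a minimal std-only (C++17) illustration that assumes the whole file is already in memory. `Envelope` and `split_envelope` are hypothetical names, not part of Spirit.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: the two parts of a JEDEC transmission around ETX.
struct Envelope {
    std::string_view body;     // text between STX and ETX (the data fields)
    std::string_view trailer;  // text after ETX (the transmission checksum)
};

// Returns std::nullopt if STX, or an ETX after it, is missing.
std::optional<Envelope> split_envelope(std::string_view input) {
    constexpr char STX = '\x02';
    constexpr char ETX = '\x03';
    const auto stx = input.find(STX);
    if (stx == std::string_view::npos) return std::nullopt;
    const auto etx = input.find(ETX, stx + 1);
    if (etx == std::string_view::npos) return std::nullopt;
    return Envelope{input.substr(stx + 1, etx - stx - 1), input.substr(etx + 1)};
}
```

The returned views point into the caller's buffer, so that buffer must outlive them.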
Answer
I would suggest using a skipper at the top-level rule only, and using it to skip the insignificant whitespace. You don't use a skipper for the asterisks, because you do not want to ignore them: if they're ignored, your rules cannot act upon them. Furthermore, the inner rules should not use the space skipper, for the simple reason that whitespace and line feeds are valid field data in JEDEC. The resulting rules, and their declarations with the respective skippers, are listed below.
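The example below shows why the inner rules must not use the space skipper. It compares two hand-written readers (std-only C++17, hypothetical names, not Spirit code). The first drops whitespace the way a rule declared with `ascii::space_type` would. The second keeps the value verbatim, like the answer's skipper-less `value` rule. On a value such as the one in the `L` field, only the second reader preserves the separators.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Both readers collect characters up to the terminating '*' and consume it.
// If no '*' is found, both consume the rest of the input.

// Like a rule declared with a space skipper: whitespace is silently dropped.
std::string read_value_with_skipper(std::string_view& in) {
    std::string out;
    while (!in.empty() && in.front() != '*') {
        if (!std::isspace(static_cast<unsigned char>(in.front()))) out += in.front();
        in.remove_prefix(1);
    }
    if (!in.empty()) in.remove_prefix(1);  // consume '*'
    return out;
}

// Like a skipper-less rule (a lexeme): whitespace is part of the data.
std::string read_value_as_lexeme(std::string_view& in) {
    const auto star = in.find('*');
    std::string out(in.substr(0, star));
    in.remove_prefix(star == std::string_view::npos ? in.size() : star + 1);
    return out;
}
```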
Take-away: split your grammar up into rules. You'll be happier for it.
Your sample needlessly mixes the responsibilities of parsing and "printing".
I'd suggest not using semantic actions here (see Boost Spirit: "Semantic actions are evil"?). Instead, declare appropriate attribute types and use them in your rule declarations. With that, nothing else needs to change in your grammar, and you can print the desired output with a plain operator<< (see "Processing the results" below).
Here's a demo program that ties it all together using the sample input from your question, followed by its output.
Take-away: See your dentist twice a year.
Initial attempt (from the question):
phrase_parse(first, last,
// First char in File
char_('\x02') >>
// Data field
*((print[cout << _1] | graph[cout << _1]) - char_('*')) >>
// End of data followed by 4 digit hexnumber. How to limit?
char_('\x03') >> *xdigit,
// Skip asterisks
char_('*') );
Sample file (<STX> and <ETX> stand for the binary bytes 0x02 and 0x03):
<STX>
JEDEC file generated by John Doe*
DM SIGNETICS(PHILIPS)*
DD GAL16R8*
QP20*
QV0*
G0*F0*
L00000 1110101111100110111101101110111100111111*
CDEAD*
<ETX>BEEF
Desired output:
Start: JEDEC file generated by John Doe
D: M SIGNETICS(PHILIPS)
D: D GAL16R8
Q: P20
Q: V0
G: 0
F: 0
L: 00000 1110101111100110111101101110111100111111
C: DEAD
End: BEEF
Rules suggested in the answer, followed by their declarations with the respective skippers:
value = *(ascii::char_("\x20-\x7e\r\n") - '*') >> '*';
field = ascii::graph >> value;
start = STX >> value >> *field >> ETX >> xmit_checksum;
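The character set in the `value` rule is printable ASCII from 0x20 to 0x7e, plus CR and LF, minus the terminating asterisk. For illustration, here is a std-only (C++17) restatement as a predicate. `is_value_char` is a hypothetical name.

```cpp
#include <cassert>

// Illustrative restatement of the character class in
//     value = *(ascii::char_("\x20-\x7e\r\n") - '*') >> '*';
// printable ASCII 0x20..0x7e, plus CR and LF, excluding the '*' terminator.
constexpr bool is_value_char(char c) {
    const bool printable = c >= '\x20' && c <= '\x7e';
    const bool line_break = c == '\r' || c == '\n';
    return (printable || line_break) && c != '*';
}

// Compile-time spot checks.
static_assert(is_value_char('L') && is_value_char(' ') && is_value_char('\n'));
static_assert(!is_value_char('*') && !is_value_char('\x02') && !is_value_char('\x03'));
```

As written, the set does not include TAB (0x09), so a field containing a tab would not match `value`.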
qi::uint_parser<uint16_t, 16, 4, 4> xmit_checksum;
qi::rule<It, ascii::space_type> start;
qi::rule<It> field, value; // no skippers - they are lexemes
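`qi::uint_parser<uint16_t, 16, 4, 4>` is an unsigned parser with radix 16 and a minimum and maximum of four digits. That is how the answer handles the "How to limit?" comment in the question's attempt. Below is a rough std-only (C++17) analogue using `std::from_chars`. `parse_checksum` is a hypothetical name, not Spirit API.

```cpp
#include <cassert>
#include <cctype>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Rough analogue of: qi::uint_parser<uint16_t, 16, 4, 4> xmit_checksum;
// Radix 16, at least 4 and at most 4 digits. On success exactly four
// characters are consumed from `in`; on failure `in` is left untouched.
std::optional<std::uint16_t> parse_checksum(std::string_view& in) {
    if (in.size() < 4) return std::nullopt;
    for (std::size_t i = 0; i < 4; ++i)
        if (!std::isxdigit(static_cast<unsigned char>(in[i]))) return std::nullopt;
    std::uint16_t value = 0;
    const auto res = std::from_chars(in.data(), in.data() + 4, value, 16);
    if (res.ec != std::errc{}) return std::nullopt;
    in.remove_prefix(4);
    return value;
}
```

Like `MaxDigits = 4`, this stops after four digits: `"12345"` yields `0x1234` and leaves `"5"` unconsumed.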
Processing the results
Attribute types:
struct JEDEC {
std::string caption;
struct field {
char id;
std::string value;
};
std::vector<field> fields;
uint16_t checksum;
};
Rule declarations using these attribute types:
qi::rule<It, ast::JEDEC(), ascii::space_type> start;
qi::rule<It, ast::JEDEC::field()> field;
qi::rule<It, std::string()> value;
qi::uint_parser<uint16_t, 16, 4, 4> xmit_checksum;
Printing the desired output:
inline static std::ostream& operator<<(std::ostream& os, JEDEC const& jedec) {
os << "Start: " << jedec.caption << "\n";
for(auto& f : jedec.fields)
os << f.id << ": " << f.value << "\n";
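auto saved_flags = os.flags();   // rdstate() would only save the error bits
auto saved_fill = os.fill();
os << "End: " << std::hex << std::setw(4) << std::setfill('0') << std::uppercase << jedec.checksum;
os.flags(saved_flags);           // undo std::hex / std::uppercase
os.fill(saved_fill);             // undo std::setfill('0')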
return os;
}
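Restoring a stream's formatting after `std::hex`/`std::setfill` is easy to get wrong. `rdstate()`/`setstate()` only cover the error bits (failbit, badbit, ...), while the format lives in `flags()` and `fill()`. One way to make the restore automatic is a small RAII guard. The sketch below is a hypothetical helper in C++17, not part of the answer.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <ios>
#include <ostream>
#include <sstream>

// Saves an ostream's formatting flags and fill character, and restores
// them when the guard goes out of scope.
class FormatGuard {
public:
    explicit FormatGuard(std::ostream& os)
        : os_(os), flags_(os.flags()), fill_(os.fill()) {}
    ~FormatGuard() {
        os_.flags(flags_);
        os_.fill(fill_);
    }
    FormatGuard(const FormatGuard&) = delete;
    FormatGuard& operator=(const FormatGuard&) = delete;

private:
    std::ostream& os_;
    std::ios_base::fmtflags flags_;
    char fill_;
};

// Prints a 16-bit checksum as four upper-case hex digits, e.g. "BEEF".
void print_checksum(std::ostream& os, std::uint16_t checksum) {
    FormatGuard guard(os);
    os << std::hex << std::uppercase << std::setw(4) << std::setfill('0') << checksum;
}
```

After `print_checksum` returns, later numbers on the same stream print in decimal again.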
LIVE DEMO
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
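#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp> // boost::spirit::istream_iterator
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>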
namespace qi = boost::spirit::qi;
namespace ascii = qi::ascii;
namespace ast {
struct JEDEC {
std::string caption;
struct field {
char id;
std::string value;
};
std::vector<field> fields;
uint16_t checksum;
};
inline static std::ostream& operator<<(std::ostream& os, JEDEC const& jedec) {
os << "Start: " << jedec.caption << "\n";
for(auto& f : jedec.fields)
os << f.id << ": " << f.value << "\n";
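auto saved_flags = os.flags();   // rdstate() would only save the error bits
auto saved_fill = os.fill();
os << "End: " << std::hex << std::setw(4) << std::setfill('0') << std::uppercase << jedec.checksum;
os.flags(saved_flags);           // undo std::hex / std::uppercase
os.fill(saved_fill);             // undo std::setfill('0')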
return os;
}
}
BOOST_FUSION_ADAPT_STRUCT(ast::JEDEC::field,
(char, id)(std::string, value))
BOOST_FUSION_ADAPT_STRUCT(ast::JEDEC,
(std::string, caption)
(std::vector<ast::JEDEC::field>, fields)
(uint16_t, checksum))
template <typename It>
struct JedecGrammar : qi::grammar<It, ast::JEDEC(), ascii::space_type>
{
JedecGrammar() : JedecGrammar::base_type(start) {
const char STX = '\x02';
const char ETX = '\x03';
value = *(ascii::char_("\x20-\x7e\r\n") - '*') >> '*';
field = ascii::graph >> value;
start = STX >> value >> *field >> ETX >> xmit_checksum;
BOOST_SPIRIT_DEBUG_NODES((start)(field)(value))
}
private:
qi::rule<It, ast::JEDEC(), ascii::space_type> start;
qi::rule<It, ast::JEDEC::field()> field;
qi::rule<It, std::string()> value;
qi::uint_parser<uint16_t, 16, 4, 4> xmit_checksum;
};
int main() {
typedef boost::spirit::istream_iterator It;
It first(std::cin>>std::noskipws), last;
JedecGrammar<It> g;
ast::JEDEC jedec;
bool ok = phrase_parse(first, last, g, ascii::space, jedec);
if (ok)
{
std::cout << "Parse success\n";
std::cout << jedec;
}
else
std::cout << "Parse failed\n";
if (first != last)
std::cout << "Remaining input unparsed: '" << std::string(first, last) << "'\n";
}
Output of the demo program:
Start: JEDEC file generated by John Doe
D: M SIGNETICS(PHILIPS)
D: D GAL16R8
Q: P20
Q: V0
G: 0
F: 0
L: 00000 1110101111100110111101101110111100111111
C: DEAD
End: BEEF
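If you don't have Boost at hand, the sketch below implements the same grammar in std-only C++17. It is a hand-written recursive-descent parser, not Spirit, and all names in `namespace sketch` are hypothetical. It follows the answer's structure: whitespace is skipped only between top-level elements, values are read verbatim up to `*`, and the result lands in an attribute struct that is printed separately. Given the sample input with real 0x02/0x03 bytes in place of <STX>/<ETX>, it should produce the output shown above.

```cpp
#include <cassert>
#include <cctype>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <optional>
#include <sstream>
#include <string>
#include <string_view>
#include <system_error>
#include <vector>

// Mirrors the answer's grammar:
//   start = STX >> value >> *field >> ETX >> xmit_checksum;  // whitespace skipped between elements
//   field = graph >> value;                                  // no skipping inside
//   value = *(char - '*') >> '*';                            // char set not restricted here
namespace sketch {

struct Jedec {
    struct Field {
        char id;
        std::string value;
    };
    std::string caption;
    std::vector<Field> fields;
    std::uint16_t checksum = 0;
};

// Top-level "skipper": drop whitespace between elements.
void skip_space(std::string_view& in) {
    while (!in.empty() && std::isspace(static_cast<unsigned char>(in.front())))
        in.remove_prefix(1);
}

// "Lexeme": copy everything up to '*' verbatim, then consume the '*'.
std::optional<std::string> value(std::string_view& in) {
    const auto star = in.find('*');
    if (star == std::string_view::npos) return std::nullopt;
    std::string v(in.substr(0, star));
    in.remove_prefix(star + 1);
    return v;
}

std::optional<Jedec> parse(std::string_view in) {
    Jedec j;
    skip_space(in);
    if (in.empty() || in.front() != '\x02') return std::nullopt;  // STX
    in.remove_prefix(1);
    skip_space(in);
    auto caption = value(in);
    if (!caption) return std::nullopt;
    j.caption = *caption;

    for (;;) {  // *field: an id character (graph) followed by a value
        skip_space(in);
        if (in.empty() || !std::isgraph(static_cast<unsigned char>(in.front()))) break;
        std::string_view rest = in.substr(1);
        auto v = value(rest);
        if (!v) break;  // no terminating '*': leave the input for ETX
        j.fields.push_back({in.front(), *v});
        in = rest;
    }

    if (in.empty() || in.front() != '\x03') return std::nullopt;  // ETX
    in.remove_prefix(1);
    skip_space(in);
    if (in.size() < 4) return std::nullopt;  // xmit_checksum: exactly 4 hex digits
    for (std::size_t i = 0; i < 4; ++i)
        if (!std::isxdigit(static_cast<unsigned char>(in[i]))) return std::nullopt;
    const auto res = std::from_chars(in.data(), in.data() + 4, j.checksum, 16);
    if (res.ec != std::errc{}) return std::nullopt;
    return j;
}

// Printing is kept separate from parsing.
std::string render(const Jedec& j) {
    std::ostringstream os;
    os << "Start: " << j.caption << '\n';
    for (const auto& f : j.fields) os << f.id << ": " << f.value << '\n';
    os << "End: " << std::hex << std::uppercase << std::setw(4) << std::setfill('0')
       << j.checksum;
    return os.str();
}

}  // namespace sketch
```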