Is Boost skip parser the right approach?


Problem description

After some delay I'm now again trying to parse some ASCII text file surrounded by some binary characters.

Parsing text file with binary envelope using Boost Spirit

However, I'm now unsure whether a skip parser is the right approach.

The grammar of the file (it's a JEDEC file) is quite simple:

Each data field in the file starts with a single letter and ends with an asterisk. A data field can contain spaces and carriage returns. After the asterisk, spaces and carriage returns may also follow before the next field identifier.

This is what I used to start building a parser for such a file:

phrase_parse(first, last, 
             // First char in File
             char_('\x02') >>

             // Data field
             *((print[cout << _1] | graph[cout << _1]) - char_('*')) >>

             // End of data followed by 4 digit hexnumber. How to limit?
             char_('\x03') >> *xdigit,

             // Skip asterisks
             char_('*') );

Unfortunately I don't get any output from this one. Does someone have an idea what might be wrong?

Sample file:

<STX>
JEDEC file generated by John Doe*
DM SIGNETICS(PHILIPS)*
DD GAL16R8*
QP20*
QV0*
G0*F0*
L00000 1110101111100110111101101110111100111111*
CDEAD*
<ETX>BEEF

and this is what I want to achieve:

Start: JEDEC file generated by John Doe
D: M SIGNETICS(PHILIPS)
D: D GAL16R8
Q: P20
Q: V0
G: 0
F: 0
L: 00000 1110101111100110111101101110111100111111
C: DEAD
End: BEEF

Solution

I would suggest using a skipper at the top-level rule only, and using it to skip the insignificant whitespace.

You don't use a skipper for the asterisks because you do not want to ignore them. If they're ignored, your rules cannot act upon them.

Furthermore the inner rules should not use the space skipper for the simple reason that whitespace and linefeeds are valid field data in JEDEC.

So, the upshot of all this would be:

value = *(ascii::char_("\x20-\x7e\r\n") - '*') >> '*';
field = ascii::graph >> value;
start = STX >> value >> *field >> ETX >> xmit_checksum; 

Where the rules would be declared with the respective skippers:

qi::uint_parser<uint16_t, 16, 4, 4>           xmit_checksum;
qi::rule<It, ascii::space_type> start;
qi::rule<It>             field, value; // no skippers - they are lexemes

Take-away: Split your grammar up into rules. Be happier for it.

Processing the results

Your sample needlessly mixes responsibilities for parsing and "printing". I'd suggest not using semantic actions here (Boost Spirit: "Semantic actions are evil"?).

Instead, declare appropriate attribute types:

struct JEDEC {
    std::string caption;
    struct field { 
        char id;
        std::string value;
    };
    std::vector<field> fields;
    uint16_t checksum;
};

And declare them in your rules:

qi::rule<It, ast::JEDEC(), ascii::space_type> start;
qi::rule<It, ast::JEDEC::field()>             field;
qi::rule<It, std::string()>                   value;
qi::uint_parser<uint16_t, 16, 4, 4>           xmit_checksum;

Now, nothing needs to be changed in your grammar, and you can print the desired output with:

inline static std::ostream& operator<<(std::ostream& os, JEDEC const& jedec) {
    os << "Start: " << jedec.caption << "\n";
    for(auto& f : jedec.fields)
        os << f.id << ": " << f.value << "\n";

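    auto saved = os.flags();  // formatting flags (std::hex etc.), not the error state
    os << "End: " << std::hex << std::setw(4) << std::setfill('0') << std::uppercase << jedec.checksum;
    os.flags(saved);          // restore them so later output is decimal again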

    return os;
}

LIVE DEMO

Here's a demo program that ties it together using the sample input from your question:

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;
namespace ascii = qi::ascii;

namespace ast {
    struct JEDEC {
        std::string caption;
        struct field { 
            char id;
            std::string value;
        };
        std::vector<field> fields;
        uint16_t checksum;
    };

    inline static std::ostream& operator<<(std::ostream& os, JEDEC const& jedec) {
        os << "Start: " << jedec.caption << "\n";
        for(auto& f : jedec.fields)
            os << f.id << ": " << f.value << "\n";

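        auto saved = os.flags();  // formatting flags (std::hex etc.), not the error state
        os << "End: " << std::hex << std::setw(4) << std::setfill('0') << std::uppercase << jedec.checksum;
        os.flags(saved);          // restore them so later output is decimal again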

        return os;
    }
}

BOOST_FUSION_ADAPT_STRUCT(ast::JEDEC::field,
        (char, id)(std::string, value))
BOOST_FUSION_ADAPT_STRUCT(ast::JEDEC,
        (std::string, caption)
        (std::vector<ast::JEDEC::field>, fields)
        (uint16_t, checksum))

template <typename It> 
struct JedecGrammar : qi::grammar<It, ast::JEDEC(), ascii::space_type>
{
    JedecGrammar() : JedecGrammar::base_type(start) {
        const char STX = '\x02';
        const char ETX = '\x03';

        value = *(ascii::char_("\x20-\x7e\r\n") - '*') >> '*';
        field = ascii::graph >> value;
        start = STX >> value >> *field >> ETX >> xmit_checksum; 

        BOOST_SPIRIT_DEBUG_NODES((start)(field)(value))
    }
  private:
    qi::rule<It, ast::JEDEC(), ascii::space_type> start;
    qi::rule<It, ast::JEDEC::field()>             field;
    qi::rule<It, std::string()>                   value;
    qi::uint_parser<uint16_t, 16, 4, 4>           xmit_checksum;
};

int main() {
    typedef boost::spirit::istream_iterator It;
    It first(std::cin>>std::noskipws), last;

    JedecGrammar<It> g;

    ast::JEDEC jedec;
    bool ok = phrase_parse(first, last, g, ascii::space, jedec);

    if (ok)
    {
        std::cout << "Parse success\n";
        std::cout << jedec;
    }
    else
        std::cout << "Parse failed\n";

    if (first != last)
        std::cout << "Remaining input unparsed: '" << std::string(first, last) << "'\n";
}

Output:

Start: JEDEC file generated by John Doe
D: M SIGNETICS(PHILIPS)
D: D GAL16R8
Q: P20
Q: V0
G: 0
F: 0
L: 00000 1110101111100110111101101110111100111111
C: DEAD
End: BEEF

Take-away: See your dentist twice a year.
