Parsing heterogeneous data using Boost::Spirit


Question

I'm trying to figure out how to approach the following problem.

I have a structure of the following format:

struct Data
{
     time_t timestamp;
     string id;
     boost::optional<int> data1;
     boost::optional<string> data2;
     // etc...
};

This should be parsed out of a single line string in the following format:

human_readable_timestamp;id;key1=value1 key2=value2.....

Of course the ordering of the keys does not have to match the order of elements in the structure.

Is Boost::Spirit suitable for this type of data? How do I approach this? I have gone through the examples, but I can't manage to get from the examples to code that fits my requirements.

Recommended answer

You could use the permutation parser. I've made a very similar example here:

  • Reading JSON file with C++ and BOOST

If you have repeating keys, then it makes more sense to use a Kleene*, perhaps


  1. with semantic actions to assign the attributes /or/
  2. using attribute customization points to assign the result

PS. Also look at the keyword parser from the Spirit Repository (see "Boost Qi Composing rules using Functions").

If you don't wish to use semantic actions (Boost Spirit: "Semantic actions are evil"?) you can slightly tweak the struct so that it matches the auto-synthesized attribute types when using the permutation for data elements:

struct Data
{
    boost::posix_time::ptime timestamp;
    std::string id;
    struct Fields {
        boost::optional<int> data1;
        boost::optional<std::string> data2;
    } fields;
};

Now the parser can be just:

    timestamp = stream;

    text  = lexeme [ '"' >> *~char_('"') >> '"' ];
    data1 = "key1" >> lit('=') >> int_;
    data2 = "key2" >> lit('=') >> text;
    id    = lexeme [ *~char_(';') ];

    start = timestamp >> ';' >> id >> ';' >> (data1 ^ data2);

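UPDATE

To the comments: to make it resilient, I ended up moving away from the permutation parser and going with the first numbered approach (the Kleene star with semantic actions).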

    id     = lexeme [ *~char_(';') ];

    auto data1 = bind(&Data::Fields::data1, _val);
    auto data2 = bind(&Data::Fields::data2, _val);

    other  = lexeme [ +(graph-'=') ] >> '=' >> (real_|int_|text);

    fields = *(
                ("key1" >> lit('=') >> int_) [ data1 = _1 ]
              | ("key2" >> lit('=') >> text) [ data2 = _1 ]
              | other
              );

    start  = timestamp >> ';' >> id >> -(';' >> fields);

This changes the following aspects:


  • in order to be able to skip "other" fields, I needed to come up with a reasonable grammar for "other" fields:

other  = lexeme [ +(graph-'=') ] >> '=' >> (real_|int_|text);

(allows a key consisting of anything non-whitespace except =, followed by the =, followed by either something numeric (eager), or text).

I've extended the notion of text to support popular quoting/escaping schemes:

text   = lexeme [ 
            '"' >> *('\\' >> char_ | ~char_('"')) >> '"'
          | "'" >> *('\\' >> char_ | ~char_("'")) >> "'"
          | *graph 
       ];


  • it allows repeating the same key (in which case it retains the last valid value seen).

    If you wanted to disallow invalid values, replace >> int_ or >> text by > int_ or > text (the expectation parser).

    I've extended the test cases with some challenging cases:

        2015-Jan-26 00:00:00;id
        2015-Jan-26 14:59:24;id;key2="value"
        2015-Jan-26 14:59:24;id;key2="value" key1=42
        2015-Jan-26 14:59:24;id;key2="value" key1=42 something=awful __=4.74e-10 blarg;{blo;bloop='whatever \'ignor\'ed' key2="new} \"value\""
        2015-Jan-26 14:59:24.123;id;key1=42 key2="value" 
    

    and it now prints:

    ----------------------------------------
    Parsing '2015-Jan-26 00:00:00;id'
    Parsing success
    2015-Jan-26 00:00:00    id
    data1: --
    data2: --
    ----------------------------------------
    Parsing '2015-Jan-26 14:59:24;id;key2="value"'
    Parsing success
    2015-Jan-26 14:59:24    id
    data1: --
    data2:  value
    ----------------------------------------
    Parsing '2015-Jan-26 14:59:24;id;key2="value" key1=42'
    Parsing success
    2015-Jan-26 14:59:24    id
    data1:  42
    data2:  value
    ----------------------------------------
    Parsing '2015-Jan-26 14:59:24;id;key2="value" key1=42 something=awful __=4.74e-10 blarg;{blo;bloop='whatever \'ignor\'ed' key2="new} \"value\""'
    Parsing success
    2015-Jan-26 14:59:24    id
    data1:  42
    data2:  new} "value"
    ----------------------------------------
    Parsing '2015-Jan-26 14:59:24.123;id;key1=42 key2="value" '
    Parsing success
    2015-Jan-26 14:59:24.123000 id
    data1:  42
    data2:  value
    

    Live On Coliru

    //#define BOOST_SPIRIT_DEBUG
    #include <boost/optional/optional_io.hpp>
    #include <boost/date_time/posix_time/posix_time.hpp>
    #include <boost/date_time/posix_time/posix_time_io.hpp>
    #include <boost/fusion/adapted/struct.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <iostream>
    #include <string>
    
    namespace qi = boost::spirit::qi;
    namespace phx = boost::phoenix;
    
    struct Data
    {
        boost::posix_time::ptime timestamp;
        std::string id;
        struct Fields {
            boost::optional<int> data1;
            boost::optional<std::string> data2;
        } fields;
    };
    
    BOOST_FUSION_ADAPT_STRUCT(Data::Fields,
            (boost::optional<int>, data1)
            (boost::optional<std::string>, data2)
        )
    
    BOOST_FUSION_ADAPT_STRUCT(Data,
            (boost::posix_time::ptime, timestamp)
            (std::string, id)
            (Data::Fields, fields)
        )
    
    template <typename It, typename Skipper = qi::space_type>
    struct grammar : qi::grammar<It, Data(), Skipper> {
        grammar() : grammar::base_type(start) {
            using namespace qi;
            timestamp = stream;
    
            real_parser<double, strict_real_policies<double> > real_;
    
            text   = lexeme [ 
                        '"' >> *('\\' >> char_ | ~char_('"')) >> '"'
                      | "'" >> *('\\' >> char_ | ~char_("'")) >> "'"
                      | *graph 
                   ];
    
            id     = lexeme [ *~char_(';') ];
    
            auto data1 = bind(&Data::Fields::data1, _val);
            auto data2 = bind(&Data::Fields::data2, _val);
    
            other  = lexeme [ +(graph-'=') ] >> '=' >> (real_|int_|text);
    
            fields = *(
                        ("key1" >> lit('=') >> int_) [ data1 = _1 ]
                      | ("key2" >> lit('=') >> text) [ data2 = _1 ]
                      | other
                      );
    
            start  = timestamp >> ';' >> id >> -(';' >> fields);
    
            BOOST_SPIRIT_DEBUG_NODES((timestamp)(id)(start)(text)(other)(fields))
        }
      private:
        qi::rule<It,                                 Skipper> other;
        qi::rule<It, std::string(),                  Skipper> text, id;
        qi::rule<It, boost::posix_time::ptime(),     Skipper> timestamp;
        qi::rule<It, Data::Fields(),                 Skipper> fields;
        qi::rule<It, Data(),                         Skipper> start;
    };
    
    int main() {
        using It = std::string::const_iterator;
        for (std::string const input : {
                "2015-Jan-26 00:00:00;id",
                "2015-Jan-26 14:59:24;id;key2=\"value\"",
                "2015-Jan-26 14:59:24;id;key2=\"value\" key1=42",
                "2015-Jan-26 14:59:24;id;key2=\"value\" key1=42 something=awful __=4.74e-10 blarg;{blo;bloop='whatever \\'ignor\\'ed' key2=\"new} \\\"value\\\"\"",
                "2015-Jan-26 14:59:24.123;id;key1=42 key2=\"value\" ",
                })
        {
            std::cout << "----------------------------------------\nParsing '" << input << "'\n";
            It f(input.begin()), l(input.end());
            Data parsed;
            bool ok = qi::phrase_parse(f,l,grammar<It>(),qi::space,parsed);
    
            if (ok) {
                std::cout << "Parsing success\n";
                std::cout << parsed.timestamp << "\t" << parsed.id << "\n";
                std::cout << "data1: " << parsed.fields.data1 << "\n";
                std::cout << "data2: " << parsed.fields.data2 << "\n";
            } else {
                std::cout << "Parsing failed\n";
            }
    
            if (f!=l)
                std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
        }
    }
    
