Parse quoted strings with boost::spirit


Question

I would like to parse a sentence where some strings may be unquoted, 'quoted' or "quoted". The code below almost works - but it fails to match closing quotes. I'm guessing this is because of the qq reference. A modification is commented in the code; the modification results in "quoted' or 'quoted" also parsing, and helps show that the original problem is with the closing quote. The code also describes the exact grammar.

To be completely clear: unquoted strings parse. A quoted string like 'hello' will parse the open quote ', then all the characters hello, but then fail to parse the final quote '.

I made another attempt, similar to the begin/end tag matching in the boost tutorials (http://www.boost.org/doc/libs/1_49_0/libs/spirit/doc/html/spirit/qi/tutorials/mini_xml___asts_.html), but without success.

template <typename Iterator>
struct test_parser : qi::grammar<Iterator, dectest::Test(), ascii::space_type>
{
    test_parser()
        :
    test_parser::base_type(test, "test")
    {
        using qi::fail;
        using qi::on_error;
        using qi::lit;
        using qi::lexeme;
        using ascii::char_;
        using qi::repeat;
        using namespace qi::labels;
        using boost::phoenix::construct;
        using boost::phoenix::at_c;
        using boost::phoenix::push_back;
        using boost::phoenix::val;
        using boost::phoenix::ref;
        using qi::space;

        char qq;          

        arrow = lit("->");

        open_quote = (char_('\'') | char_('"')) [ref(qq) = _1];  // Remember what the opening quote was
        close_quote = lit(val(qq));  // Close must match the open
        // close_quote = (char_('\'') | char_('"')); // Enable this line to get code 'almost' working

        quoted_string = 
            open_quote
            >> +ascii::alnum        
            >> close_quote; 

        unquoted_string %= +ascii::alnum;
        any_string %= (quoted_string | unquoted_string);

        test = 
            unquoted_string             [at_c<0>(_val) = _1] 
            > unquoted_string           [at_c<1>(_val) = _1]   
            > repeat(1,3)[any_string]   [at_c<2>(_val) = _1]
            > arrow
            > any_string                [at_c<3>(_val) = _1] 
            ;

        // .. <snip>set rule names
        on_error<fail>(/* <snip> */);
        // debug rules
    }

    qi::rule<Iterator> arrow;
    qi::rule<Iterator> open_quote;
    qi::rule<Iterator> close_quote;

    qi::rule<Iterator, std::string()> quoted_string;
    qi::rule<Iterator, std::string()> unquoted_string;
    qi::rule<Iterator, std::string()> any_string;     // A quoted or unquoted string

    qi::rule<Iterator, dectest::Test(), ascii::space_type> test;

};


// main()
// This example should fail at the very end
// (i.e. it should not parse "str3' because of the mismatched quote).
// However, it fails to parse the closing quote of str1.
typedef boost::tuple<string, string, vector<string>, string> DataT;
DataT data;
std::string str("addx001 add 'str1'   \"str2\"       ->  \"str3'");
std::string::const_iterator iter = str.begin();
const std::string::const_iterator end = str.end();
bool r = phrase_parse(iter, end, grammar, boost::spirit::ascii::space, data);

For bonus credit: a solution that avoids a local data member (such as char qq in the above example) would be preferred, but from a practical point of view I'll use anything that works!

Recommended answer

The reference to qq becomes dangling after leaving the constructor, so that is indeed a problem.

qi::locals (http://www.boost.org/doc/libs/1_48_0/libs/spirit/doc/html/spirit/qi/reference/parser_concepts/nonterminal.html#spirit.qi.reference.parser_concepts.nonterminal.locals) is the canonical way to keep local state inside parser expressions. Your other option would be to extend the lifetime of qq (e.g. by making it a member of the grammar class). Lastly, you might be interested in inherited attributes as well. This mechanism gives you a way to call a rule/grammar with 'parameters' (passing local state around).


NOTE: There are caveats with the use of the Kleene operator +: it is greedy, and parsing fails if the string is not terminated with the expected quote.

See another answer I wrote for more complete examples of treating arbitrary contents in (optionally/partially) quoted strings, which allow escaping of quotes inside quoted strings and more things like that.

I've reduced the grammar to the relevant bit, and included a few test cases:

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/fusion/adapted.hpp>
#include <iostream>   // std::cout, used in main()
#include <string>

namespace qi = boost::spirit::qi;

template <typename Iterator>
struct test_parser : qi::grammar<Iterator, std::string(), qi::space_type, qi::locals<char> >
{
    test_parser() : test_parser::base_type(any_string, "test")
    {
        using namespace qi;

        quoted_string = 
               omit    [ char_("'\"") [_a =_1] ]             
            >> no_skip [ *(char_ - char_(_a))  ]
            >> lit(_a)
        ; 

        any_string = quoted_string | +qi::alnum;
    }

    qi::rule<Iterator, std::string(), qi::space_type, qi::locals<char> > quoted_string, any_string;
};

int main()
{
    test_parser<std::string::const_iterator> grammar;
    const char* strs[] = { "\"str1\"", 
                           "'str2'",
                           "'str3' trailing ok",
                           "'st\"r4' embedded also ok",
                           "str5",
                           "str6'",
                           NULL };

    for (const char** it = strs; *it; ++it)
    {
        const std::string str(*it);
        std::string::const_iterator iter = str.begin();
        std::string::const_iterator end  = str.end();

        std::string data;
        bool r = phrase_parse(iter, end, grammar, qi::space, data);

        if (r)
            std::cout << "Parsed:    " << str << " --> " << data << "\n";
        if (iter!=end)
            std::cout << "Remaining: " << std::string(iter,end) << "\n";
    }
}

Output:

Parsed:    "str1" --> str1
Parsed:    'str2' --> str2
Parsed:    'str3' trailing ok --> str3
Remaining: trailing ok
Parsed:    'st"r4' embedded also ok --> st"r4
Remaining: embedded also ok
Parsed:    str5 --> str5
Parsed:    str6' --> str6
Remaining: '
