How to parse a grammar into a `std::set` using `boost::spirit`?

Question

How to parse the result of a boost::spirit grammar into an std::set?

As an exercise to learn how to use boost::spirit, I am designing a parser for X.500/LDAP Distinguished Names. The grammar can be found in a BNF format in the RFC-1779.

I "unrolled" it and translated it into boost::spirit rules. That's the first step. Basically, a DN is a set of RDN (Relative Distinguished Names) which themselves are tuples of (Key,Value) pairs.

I thought about using

typedef std::unordered_map<std::string, std::string> rdn_type;

to represent each RDN. The RDNs are then gathered into a std::set<rdn_type>

My issue is that going through the (good) documentation of boost::spirit, I didn't find out how to populate the set.

My current code can be found on github and I'm trying to refine it whenever I can.

Starts a satanic dance to summon SO's most popular polar bear :p

To keep the question self-contained, I've added a copy of the code here; it's a bit long, so I put it at the end :)

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;

typedef std::unordered_map<std::string, std::string> dn_key_value_map;

template <typename Iterator>
struct dn_grammar_common : public qi::grammar<Iterator, std::multiset<dn_key_value_map>(), ascii::space_type> {
  struct dn_reserved_chars_ : public qi::symbols<char, char> {
    dn_reserved_chars_() {
      add
        ("\\", "\\")
        ("=" , "=")
        ("+" , "+")
        ("," , ",")
        (";" , ";")
        ("#" , "#")
        ("<" , "<")
        (">" , ">")
        ("\"", "\"")
        ("%" , "%");
    }
  } dn_reserved_chars;
  dn_grammar_common() : dn_grammar_common::base_type(dn) {
    // Useful using directives
    using namespace qi::labels;

    // Low level rules
    // Key can only contain alphanumerical characters and dashes
    key = ascii::no_case[qi::lexeme[(*qi::alnum) >> (*(qi::char_('-') >> qi::alnum))]];
    escaped_hex_char = qi::lexeme[(&qi::char_("\\")) >> qi::repeat(2)[qi::char_("0-9a-fA-F")]];
    escaped_sequence = escaped_hex_char |
                      qi::lexeme[(&qi::char_("\\")) >> dn_reserved_chars];
    // Rule for a fully escaped string (used as Attribute Value) => "..."
    quote_string = qi::lexeme[qi::lit('"') >>
      *(escaped_sequence | (qi::char_ - qi::char_("\\\""))) >>
      qi::lit('"')
    ];
    // Rule for an hexa string (used as Attribute Value) => #23AD5D...
    hex_string = (&qi::char_("#")) >> *qi::lexeme[(qi::repeat(2)[qi::char_("0-9a-fA-F")])];

    // Value is either:
    // - A regular string (that can contain escaped sequences)
    // - A fully escaped string (that can also contain escaped sequences)
    // - An hexadecimal string
    value = (qi::lexeme[*((qi::char_ - dn_reserved_chars) | escaped_sequence)]) |
            quote_string |
            hex_string;

    // Higher level rules
    rdn_pair = key >> '=' >> value;
    // A relative distinguished name consists of a sequence of pairs (Attribute = AttributeValue)
    // Separated with a +
    rdn = rdn_pair % qi::char_("+");
    // The DN is a set of RDNs separated by either a "," or a ";".
    // The two separators can coexist in a given DN, though it is not
    // recommended practice.
    dn = rdn % (qi::char_(",;"));
  }
  qi::rule<Iterator, std::set<dn_key_value_map>(), ascii::space_type> dn;
  qi::rule<Iterator, dn_key_value_map(), ascii::space_type> rdn;
  qi::rule<Iterator, std::pair<std::string, std::string>(), ascii::space_type> rdn_pair;
  qi::rule<Iterator, std::string(), ascii::space_type> key, value, hex_string, quote_string;
  qi::rule<Iterator, std::string(), ascii::space_type> escaped_hex_char, escaped_sequence;
};

Recommended answer

I suspect you just need fusion/adapted/std_pair.hpp

Let me try to make it compile.

  1. your start rule was incompatible

 qi::rule<Iterator, std::multiset<dn_key_value_map>(), ascii::space_type> dn;

  • the symbol table should map to string, not char

    struct dn_reserved_chars_ : public qi::symbols<char, std::string> {
    

    or you should change the mapped values to char literals.

    Why would you use that instead of char_("\\=+,;#<>\"%")?

    Update

    Have completed my review of the Grammar (purely from the implementation point-of-view, so I haven't actually read the RFC to check the assumptions).

    I've created a pull request here: https://github.com/Rerito/pkistore/pull/1

    1. General Notes

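    • unordered maps are not sortable, so I used map<string,string>
    • the outer set technically isn't a set in the RFC (?), so I made it a vector (this also makes the output order of the RDNs match the input order more closely)
    • removed superstitious includes (fusion set/map are completely unrelated to std::set/map; only std_pair.hpp is needed for maps to work)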
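The first two notes can be sketched with the standard library alone. `std::set` needs an ordering on its elements; `std::map` provides `operator<` (a lexicographic comparison), whereas `std::unordered_map` has none, so a `std::set` of unordered maps stops compiling as soon as an element is inserted. The alias `rdn_map` and the helper `as_set` below are hypothetical names, not taken from the pull request.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical alias for illustration only.
using rdn_map = std::map<std::string, std::string>;

// Collects RDNs into a std::set. This works because std::map is ordered
// and comparable. Note that the set sorts its elements and drops
// duplicates, so the input order kept by the vector is lost.
inline std::set<rdn_map> as_set(std::vector<rdn_map> const& rdns) {
    return std::set<rdn_map>(rdns.begin(), rdns.end());
}
```

A vector of three RDNs in which two are identical comes out of `as_set` as a set of two elements, which is one reason a vector can be the more faithful representation of a DN.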

    Grammar rules:

    • symbols<char,char> requires char values (not "." but '.')
    • Many simplifications

    • removed the &char_(...) instances (they don't consume anything, they are just assertions)
    • removed the ineffective no_case[]
    • removed unnecessary lexeme[] directives; most of their effect is now achieved by removing the skipper from the rule declarations
    • removed some rule declarations altogether (the rule definitions aren't complex enough to warrant the overhead incurred), e.g. hex_string
    • made key require at least one character (I have not checked the specs). Note how

    key = ascii::no_case[qi::lexeme[(*qi::alnum) >> (*(qi::char_('-') >> qi::alnum))]];
    

    became

    key = raw[ alnum >> *(alnum | '-') ];
    

    raw means that the input sequence will be reflected verbatim (instead of building a copy character by character)

    reordered the branches of value (not checked, but I wager unquoted strings would basically eat everything else)

    Tests

    Added a test program test.cpp, based on the Examples section in the rfc (3.).

    Added some more complicated examples of my own devising.

    Loose ends

    To do: review the specs for actual rules and requirements on

    • escaping special characters
    • inclusion of whitespace (incl. newline characters) inside the various string flavours:

    • hex #xxxx strings might allow for newlines (makes sense to me)
    • unquoted strings might (idem)

    Also enabled optional BOOST_SPIRIT_DEBUG

    Also made the skipper internal to the grammar (security!)

    Also made a convenience free function that makes the parser usable without leaking implementation details (Qi)

    Live Demo

    Live On Coliru

    //#include "dn_parser.hpp"
    //#define BOOST_SPIRIT_DEBUG
    #include <boost/fusion/adapted/std_pair.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <iostream>
    #include <map>
    #include <stdexcept>
    #include <string>
    #include <vector>
    
    namespace pkistore {
        namespace parsing {
    
        namespace qi      = boost::spirit::qi;
        namespace ascii   = boost::spirit::ascii;
    
        namespace ast {
            typedef std::map<std::string, std::string> rdn;
            typedef std::vector<rdn> dn;
        }
    
        template <typename Iterator>
        struct dn_grammar_common : public qi::grammar<Iterator, ast::dn()> {
            dn_grammar_common() : dn_grammar_common::base_type(start) {
                using namespace qi;
    
                // syntax as defined in rfc1779
                key          = raw[ alnum >> *(alnum | '-') ];
    
                char_escape  = '\\' >> (hexchar | dn_reserved_chars);
                quote_string = '"' >> *(char_escape | (char_ - dn_reserved_chars)) >> '"' ;
    
                value        =  quote_string 
                             | '#' >> *hexchar
                             | *(char_escape | (char_ - dn_reserved_chars))
                             ;
    
                rdn_pair     = key >> '=' >> value;
    
                rdn          = rdn_pair % qi::char_("+");
                dn           = rdn % qi::char_(",;");
    
                start        = skip(qi::ascii::space) [ dn ];
    
                BOOST_SPIRIT_DEBUG_NODES((start)(dn)(rdn)(rdn_pair)(key)(value)(quote_string)(char_escape))
            }
    
        private:
            qi::int_parser<char, 16, 2, 2> hexchar;
    
            qi::rule<Iterator, ast::dn()> start;
    
            qi::rule<Iterator, ast::dn(), ascii::space_type> dn;
            qi::rule<Iterator, ast::rdn(), ascii::space_type> rdn;
            qi::rule<Iterator, std::pair<std::string, std::string>(), ascii::space_type> rdn_pair;
    
            qi::rule<Iterator, std::string()> key, value, quote_string;
            qi::rule<Iterator, char()>        char_escape;
    
            struct dn_reserved_chars_ : public qi::symbols<char, char> {
                dn_reserved_chars_() {
                    add ("\\", '\\') ("\"", '"')
                        ("=" , '=')  ("+" , '+')
                        ("," , ',')  (";" , ';')
                        ("#" , '#')  ("%" , '%')
                        ("<" , '<')  (">" , '>')
                        ;
                }
            } dn_reserved_chars;
        };
    
        } // namespace parsing
    
        static parsing::ast::dn parse(std::string const& input) {
            using It = std::string::const_iterator;
    
            pkistore::parsing::dn_grammar_common<It> const g;
    
            It f = input.begin(), l = input.end();
            pkistore::parsing::ast::dn parsed;
    
            bool ok = boost::spirit::qi::parse(f, l, g, parsed);
    
            if (!ok || (f!=l))
                throw std::runtime_error("dn_parse failure");
    
            return parsed;
        }
    } // namespace pkistore
    
    int main() {
        for (std::string const input : {
                "OU=Sales + CN=J. Smith, O=Widget Inc., C=US",
                "OU=#53616c6573",
                "OU=Sa\\+les + CN=J. Smi\\%th, O=Wid\\,\\;get In\\3bc., C=US",
                //"CN=Marshall T. Rose, O=Dover Beach Consulting, L=Santa Clara,\nST=California, C=US",
                //"CN=FTAM Service, CN=Bells, OU=Computer Science,\nO=University College London, C=GB",
                //"CN=Markus Kuhn, O=University of Erlangen, C=DE",
                //"CN=Steve Kille,\nO=ISODE Consortium,\nC=GB",
                //"CN=Steve Kille ,\n\nO =   ISODE Consortium,\nC=GB",
                //"CN=Steve Kille, O=ISODE Consortium, C=GB\n",
            })
        {
            auto parsed = pkistore::parse(input);
    
            std::cout << "===========\n" << input << "\n";
            for(auto const& dn : parsed) {
                std::cout << "-----------\n";
                for (auto const& kv : dn) {
                    std::cout << "\t" << kv.first << "\t->\t" << kv.second << "\n";
                }
            }
        }
    }
    

    Prints:

    ===========
    OU=Sales + CN=J. Smith, O=Widget Inc., C=US
    -----------
        CN  ->  J. Smith
        OU  ->  Sales 
    -----------
        O   ->  Widget Inc.
    -----------
        C   ->  US
    ===========
    OU=#53616c6573
    -----------
        OU  ->  Sales
    ===========
    OU=Sa\+les + CN=J. Smi\%th, O=Wid\,\;get In\3bc., C=US
    -----------
        CN  ->  J. Smi%th
        OU  ->  Sa+les 
    -----------
        O   ->  Wid,;get In;c.
    -----------
        C   ->  US
    
