How to parse a CSV-like escaped string with Boost Spirit?


Question

For my expression parser project I would like to use CSV-like escaping: "" to escape "

Examples:

 "\"hello\"",
 "   \"  hello \"  ",
 "  \"  hello \"\"stranger\"\" \"  ",

Compile and try online: https://wandbox.org/permlink/5uchQM8guIN1k7aR

My current parsing rule parses only the first two tests:

qi::rule<std::string::const_iterator, qi::blank_type, utree()> double_quoted_string
    = '"' >> qi::no_skip[+~qi::char_('"')] >> '"';

I found this Stack Overflow question, where one of the answers uses Spirit:

How can I read and parse CSV files in C++?

start       = field % ',';
field       = escaped | non_escaped;
escaped     = lexeme['"' >> *( char_ -(char_('"') | ',') | COMMA | DDQUOTE)  >> '"'];
non_escaped = lexeme[       *( char_ -(char_('"') | ',')                  )        ];
DDQUOTE     = lit("\"\"")       [_val = '"'];
COMMA       = lit(",")          [_val = ','];

(I don't know how to link to answers, so if interested, search for "You gotta feel proud when you use something so beautiful as boost::spirit".)

Sadly, it does not compile for me, and even years of C++ error message analysis didn't prepare me for Spirit's error message floods :) Also, if I understand it correctly, the rule waits for , as a string delimiter, which may not be the right thing for my expression parser project:

expression = "strlen( \"hello \"\"you\"\" \" )+1";
expression = "\"hello \"";
expression = "strlen(concat(\"hello\",\"you\")+3";

Or does the rule need to wait optionally for , and ) in this case?

I hope I'm not asking too many silly questions, but the answers help me a lot to get into Spirit. The expression parsing itself is nearly working, except for string escaping.

Thanks for any help.

UPDATE: This seems to work for me; at least it parses the strings, but it removes the escaped " from the string. Is there better debug output available for strings? " " " " "h" "e" "l" "l" "o" " " "s" "t" "r" "a" "n" "g" "e" "r" " " isn't really that readable.

qi::rule<std::string::const_iterator, utree()> double_quoted_string
  = qi::lexeme['"' >> *(qi::char_ - (qi::char_('"')) | qi::lit("\"\"")) >> '"'];

Recommended answer

You can simplify the question down to this: how do you make a double-quoted string accept "double double quotes" to escape an embedded double-quote character?

A simple string parser without escapes:

qi::rule<It, std::string()> s = '"' >> *~qi::char_('"') >> '"';

Now, to also accept the single escaped " as desired, simply add:

s = '"' >> *("\"\"" >> qi::attr('"') | ~qi::char_('"')) >> '"';
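To make the matching order of that rule concrete, here is a rough hand-written counterpart that uses only the standard library. It is a sketch, not what Spirit generates: the function name parse_quoted and its signature are made up for illustration, and it assumes C++17 for std::optional and std::string_view. Like the rule, it tries the doubled quote first, then any non-quote character, and treats a lone quote as the closing delimiter.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical hand-written counterpart of the Spirit rule
//   '"' >> *("\"\"" >> qi::attr('"') | ~qi::char_('"')) >> '"'
// On success: returns the unescaped contents and moves `pos` past the closing quote.
// On failure: returns std::nullopt and leaves `pos` unchanged.
std::optional<std::string> parse_quoted(std::string_view in, std::size_t& pos) {
    assert(pos <= in.size());
    std::size_t i = pos;
    if (i >= in.size() || in[i] != '"') {
        return std::nullopt;                 // no opening quote
    }
    ++i;
    std::string out;
    while (i < in.size()) {
        if (in[i] != '"') {
            out += in[i++];                  // ordinary character
        } else if (i + 1 < in.size() && in[i + 1] == '"') {
            out += '"';                      // "" stands for one embedded "
            i += 2;
        } else {
            pos = i + 1;                     // lone quote closes the string
            return out;
        }
    }
    return std::nullopt;                     // input ended before the closing quote
}
```

Called on strlen( "hello ""you"" " )+1 with pos at the first quote, this yields hello "you" followed by a space, and leaves pos at the text " )+1". The string ends at the first quote that is not followed by another quote, so in this sketch the string rule does not need to look for , or ) at all; those stay with the surrounding expression grammar.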

Other notes:

  • In your online example the use of no_skip is sloppy: it would parse "foo bar" and " foo bar " to foo bar (trimming the whitespace). Instead, drop the skipper from the rule to make it implicitly lexeme (again).
  • Your parser did not accept empty strings (this might be what you want, but that's not certain).
  • Using utree is likely complicating your life more than you want.

Live on Coliru:

      #define BOOST_SPIRIT_DEBUG
      #include <iostream>
      #include <iomanip>
      #include <string>
      #include <vector>
      #include <boost/spirit/include/qi.hpp>
      
      namespace qi = boost::spirit::qi;
      namespace fu = boost::fusion;
      
      int main()
      {
          auto tests = std::vector<std::string>{
               R"( "hello" )",
               R"(    "  hello " )",
               R"(  "  hello ""escaped"" "  )",
          };
          for (const std::string& str : tests) {
              auto iter = str.begin(), end = str.end();
      
              qi::rule<std::string::const_iterator, std::string()> double_quoted_string
                  = '"' >> *("\"\"" >> qi::attr('"') | ~qi::char_('"')) >> '"';
      
              std::string ut;
              bool r = qi::phrase_parse(iter, end, double_quoted_string >> qi::eoi, qi::blank, ut);
      
              std::cout << str << " ";
              if (r) {
                  std::cout << "OK: " << std::quoted(ut, '\'') << "\n";
              }
              else {
                  std::cout << "Failed\n";
              }
              if (iter != end) {
                  std::cout << "Remaining unparsed: " << std::quoted(std::string(iter, end)) << "\n";
              }
              std::cout << "----\n";
          }
      }
      

Prints:

       "hello"  OK: 'hello'
      ----
          "  hello "  OK: '  hello '
      ----
        "  hello ""escaped"" "   OK: '  hello "escaped" '
      ----
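
The listing uses std::quoted with a ' delimiter only to make the parsed result readable. As a side note, the same standard manipulator can also write a string back out in the CSV-style form, if the delimiter and the escape character are both set to ". The sketch below assumes C++14; the helper name csv_quote is made up for illustration. It covers output only: reading the doubled-quote form back with std::quoted is not shown here, because with identical delimiter and escape the closing quote would be ambiguous.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format `s` as a double-quoted string in which
// every embedded " is written as "".
std::string csv_quote(const std::string& s) {
    std::ostringstream out;
    // delim = '"', escape = '"': each " inside s gets a " in front of it.
    out << std::quoted(s, '"', '"');
    return out.str();
}
```

For example, csv_quote applied to hello "you" produces "hello ""you""", which the double_quoted_string rule above accepts again.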
      
