How to parse CSV using boost::spirit

Question

I have this CSV line:

std::string s = R"(1997,Ford,E350,"ac, abs, moon","some "rusty" parts",3000.00)";

I can parse it using boost::tokenizer:

typedef boost::tokenizer< boost::escaped_list_separator<char> , std::string::const_iterator, std::string> Tokenizer;
boost::escaped_list_separator<char> seps('\\', ',', '\"');
Tokenizer tok(s, seps);
for (auto i : tok)
{
    std::cout << i << std::endl;
}

It gets everything right, except that the token "rusty" should keep its double quotes, which are getting stripped.

Here is my attempt using boost::spirit:

boost::spirit::classic::rule<> list_csv_item = !(boost::spirit::classic::confix_p('\"', *boost::spirit::classic::c_escape_ch_p, '\"') | boost::spirit::classic::longest_d[boost::spirit::classic::real_p | boost::spirit::classic::int_p]);
std::vector<std::string> vec_item;
std::vector<std::string>  vec_list;
boost::spirit::classic::rule<> list_csv = boost::spirit::classic::list_p(list_csv_item[boost::spirit::classic::push_back_a(vec_item)],',')[boost::spirit::classic::push_back_a(vec_list)];
boost::spirit::classic::parse_info<> result = parse(s.c_str(), list_csv);
if (result.hit)
{
  for (auto i : vec_item)
  {
    cout << i << endl;
   }
}

Problems:

  1. It does not work; it prints only the first token.
  2. Why boost::spirit::classic? I can't find examples using Spirit V2.
  3. The setup is brutal, but I can live with this.

** I really want to use boost::spirit because it tends to be pretty fast.

Expected output:

1997
Ford
E350
ac, abs, moon
some "rusty" parts
3000.00

Recommended answer

Sehe's post looks a fair bit cleaner than mine, but I had been putting this together for a while, so here it is anyway:

#include <cassert>
#include <iostream>
#include <string>
#include <vector>

#include <boost/tokenizer.hpp>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

int main() {
    const std::string s = R"(1997,Ford,E350,"ac, abs, moon",""rusty"",3000.00)";

    // Tokenizer
    typedef boost::tokenizer< boost::escaped_list_separator<char> , std::string::const_iterator, std::string> Tokenizer;
    boost::escaped_list_separator<char> seps('\\', ',', '\"');
    Tokenizer tok(s, seps);
    for (auto i : tok)
        std::cout << i << "\n";
    std::cout << "\n";

    // Boost Spirit Qi
    qi::rule<std::string::const_iterator, std::string()> quoted_string = '"' >> *(qi::char_ - '"') >> '"';
    qi::rule<std::string::const_iterator, std::string()> valid_characters = qi::char_ - '"' - ',';
    qi::rule<std::string::const_iterator, std::string()> item = *(quoted_string | valid_characters );
    qi::rule<std::string::const_iterator, std::vector<std::string>()> csv_parser = item % ',';

    std::string::const_iterator s_begin = s.begin();
    std::string::const_iterator s_end = s.end();
    std::vector<std::string> result;

    bool r = boost::spirit::qi::parse(s_begin, s_end, csv_parser, result);
    assert(r == true);
    assert(s_begin == s_end);

    for (auto i : result)
        std::cout << i << std::endl;
    std::cout << "\n";
}   

And this outputs:

1997
Ford
E350
ac, abs, moon
rusty
3000.00

1997
Ford
E350
ac, abs, moon
rusty
3000.00

Something worth noting: this doesn't implement a full CSV parser. You'd also want to look into escape characters or whatever else your implementation requires.

Also: if you're reading the documentation, note that inside a Qi parser expression, 'a' is equivalent to boost::spirit::qi::lit('a') and "abc" is equivalent to boost::spirit::qi::lit("abc").

On double quotes: as Sehe notes in a comment, it's not clear what a "" in the input text is supposed to mean. If you wanted every "" outside a quoted string to be converted to a single ", then something like the following would work.

qi::rule<std::string::const_iterator, std::string()> double_quote_char = "\"\"" >> qi::attr('"');
qi::rule<std::string::const_iterator, std::string()> item = *(double_quote_char | quoted_string | valid_characters );
