Boost Spirit: parsing CSV with columns in variable order


Question

I'm trying to parse a CSV file (with a header line) using Boost Spirit. The CSV is not in a constant format: sometimes there are extra columns, or the order of the columns is mixed. I'm interested in a few columns whose header names are well known.

For instance, my CSV may look like:


Name,Surname,Age
John,Doe,32

or


Age,Name
32,John

I want to parse only the content of Name and Age (N.B. Age is an integer). At the moment I have come up with a very ugly solution in which Spirit parses the first line and creates a vector that holds an enum value at each position I'm interested in. I then have to parse the terminal symbols by hand...

enum LineItems {
    NAME, AGE, UNUSED
};

struct CsvLine {
    string name;
    int age;
};

using Column = std::string;
using CsvFile = std::vector<CsvLine>;

template<typename It>
struct CsvGrammar: qi::grammar<It, CsvFile(), qi::locals<std::vector<LineItems>>, qi::blank_type> {
    CsvGrammar() :
            CsvGrammar::base_type(start) {
        using namespace qi;

        static const char colsep = ',';

        start = qi::omit[header[qi::_a = qi::_1]] >> eol >> line(_a) % eol;
        header = (lit("Name")[phx::push_back(phx::ref(qi::_val), LineItems::NAME)]
                | lit("Age")[phx::push_back(phx::ref(qi::_val), LineItems::AGE)]
                | column[phx::push_back(phx::ref(qi::_val), LineItems::UNUSED)]) % colsep;
        line = (column % colsep)[phx::bind(&CsvGrammar<It>::convertFunc, this, qi::_1, qi::_r1,
                qi::_val)];
        column = quoted | *~char_(",\n");
        quoted = '"' >> *("\"\"" | ~char_("\"\n")) >> '"';
    }

    void convertFunc(std::vector<string>& columns, std::vector<LineItems>& positions, CsvLine &csvLine) {
       //terminal symbol parsing here, and assign to csvLine struct.
       ...
    }
private:
    qi::rule<It, CsvFile(), qi::locals<std::vector<LineItems>>, qi::blank_type> start;
    qi::rule<It, std::vector<LineItems>(), qi::blank_type> header;
    qi::rule<It, CsvLine(std::vector<LineItems>), qi::blank_type> line;
    qi::rule<It, Column(), qi::blank_type> column;
    qi::rule<It, std::string()> quoted;
    qi::rule<It, qi::blank_type> empty;

};

Here is the full source.

What if the header parser could prepare a vector<rule<...>*> and the "line parser" just used this vector to parse itself? A sort of advanced Nabialek trick (I've been trying, but I couldn't make it work).

Or is there a better way to parse this kind of CSV with Spirit? (Any help is appreciated, thank you in advance.)

Recommended answer

I'd go with the concept that you have. I think it's plenty elegant (the qi locals even allow reentrant use of the grammar).

To reduce the cruft in the rules (see Boost Spirit: "Semantic actions are evil"?), you could move the "conversion function" off into attribute transformation customization points.

Oops. As commented, that was too simple. However, you can still reduce the cruft quite a bit. With two simple tweaks, the grammar reads:

item.add("Name", NAME)("Age", AGE);
start  = omit[ header[_a=_1] ] >> eol >> line(_a) % eol;

header = (item | omit[column] >> attr(UNUSED)) % colsep;
line   = (column % colsep) [convert];

column = quoted | *~char_(",\n");
quoted = '"' >> *("\"\"" | ~char_("\"\n")) >> '"';

The tweaks:

  • using qi::symbols to map from header to LineItems
  • using a raw semantic action ([convert]) which directly accesses the context (see "boost spirit semantic action parameters"):

struct final {
    using Ctx = typename decltype(line)::context_type;

    void operator()(Columns const& columns, Ctx &ctx, bool &pass) const {
        auto& csvLine   = boost::fusion::at_c<0>(ctx.attributes);
        auto& positions = boost::fusion::at_c<1>(ctx.attributes);
        int i =0;

        for (LineItems position : positions) {
            switch (position) {
                case NAME: csvLine.name = columns[i];              break;
                case AGE:  csvLine.age = atoi(columns[i].c_str()); break;
                default:   break;
            }
            i++;
        }

        pass = true; // setting pass to false would fail the `line` rule
    }
} convert;
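
The functor above converts the age with atoi, which returns 0 for non-numeric input without reporting an error. As a minimal sketch of a stricter conversion, assuming C++17 is available, a helper along these lines could be used instead. The name parse_age is hypothetical and not part of the original answer.

```cpp
#include <charconv>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper (not from the original answer): converts a whole
// column to int, or returns std::nullopt if the column is empty, is not
// a number, overflows, or has trailing characters.
std::optional<int> parse_age(const std::string& column) {
    int value = 0;
    const char* first = column.data();
    const char* last  = first + column.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last)
        return std::nullopt;  // reject partial or failed conversions
    return value;
}
```

In the AGE case the functor could then assign *parse_age(columns[i]) on success, and otherwise set pass = false and return, which fails the `line` rule for that row.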


Arguably the upshot is akin to doing auto convert = phx::bind(&CsvGrammar<It>::convertFunc, this, qi::_1, qi::_r1, qi::_val), but using auto with Proto/Phoenix/Spirit expressions is notoriously error-prone (undefined behavior due to dangling references to temporaries inside the expression template), so I'd certainly prefer the way shown above.

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <iostream>
#include <boost/fusion/include/at_c.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;

using std::string;

enum LineItems { NAME, AGE, UNUSED };

struct CsvLine {
    string name;
    int age;
};

using Column  = std::string;
using Columns = std::vector<Column>;
using CsvFile = std::vector<CsvLine>;

template<typename It>
struct CsvGrammar: qi::grammar<It, CsvFile(), qi::locals<std::vector<LineItems>>, qi::blank_type> {
    CsvGrammar() : CsvGrammar::base_type(start) {
        using namespace qi;
        static const char colsep = ',';

        item.add("Name", NAME)("Age", AGE);
        start  = qi::omit[ header[_a=_1] ] >> eol >> line(_a) % eol;

        header = (item | omit[column] >> attr(UNUSED)) % colsep;
        line   = (column % colsep) [convert];

        column = quoted | *~char_(",\n");
        quoted = '"' >> *("\"\"" | ~char_("\"\n")) >> '"';

        BOOST_SPIRIT_DEBUG_NODES((header)(column)(quoted));
    }

private:
    qi::rule<It, std::vector<LineItems>(),                      qi::blank_type> header;
    qi::rule<It, CsvFile(), qi::locals<std::vector<LineItems>>, qi::blank_type> start;
    qi::rule<It, CsvLine(std::vector<LineItems> const&),        qi::blank_type> line;

    qi::rule<It, Column(), qi::blank_type> column;
    qi::rule<It, std::string()> quoted;
    qi::rule<It, qi::blank_type> empty;

    qi::symbols<char, LineItems> item;

    struct final {
        using Ctx = typename decltype(line)::context_type;

        void operator()(Columns const& columns, Ctx &ctx, bool &pass) const {
            auto& csvLine   = boost::fusion::at_c<0>(ctx.attributes);
            auto& positions = boost::fusion::at_c<1>(ctx.attributes);
            int i =0;

            for (LineItems position : positions) {
                switch (position) {
                    case NAME: csvLine.name = columns[i];              break;
                    case AGE:  csvLine.age = atoi(columns[i].c_str()); break;
                    default:   break;
                }
                i++;
            }

            pass = true; // setting pass to false would fail the `line` rule
        }
    } convert;
};

int main() {
    const std::string s = "Surname,Name,Age,\nJohn,Doe,32\nMark,Smith,43";

    auto f(begin(s)), l(end(s));
    CsvGrammar<std::string::const_iterator> p;

    CsvFile parsed;
    bool ok = qi::phrase_parse(f, l, p, qi::blank, parsed);

    if (ok) {
        for (CsvLine line : parsed) {
            std::cout << '[' << line.name << ']' << '[' << line.age << ']';
            std::cout << std::endl;
        }
    } else {
        std::cout << "Parse failed\n";
    }

    if (f != l)
        std::cout << "Remaining unparsed: '" << std::string(f, l) << "'\n";
}

Prints

[Doe][32]
[Smith][43]
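
To see the header-to-position idea in isolation, here is a rough sketch in plain C++ without Spirit. It is an illustration only, not the answer's method: split_columns, map_header and convert_row are hypothetical names, the std::unordered_map stands in for qi::symbols, and quoted fields are not handled at all. Unlike the functor above, it also rejects rows that are too short or whose age is not a number.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

enum LineItems { NAME, AGE, UNUSED };

struct CsvLine {
    std::string name;
    int age = 0;
};

// Splits one line on commas. Assumption: no quoted fields, unlike the
// `quoted` rule in the grammar.
std::vector<std::string> split_columns(const std::string& line) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (true) {
        std::size_t comma = line.find(',', start);
        if (comma == std::string::npos) {
            out.push_back(line.substr(start));
            return out;
        }
        out.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
}

// Plays the role of the `header` rule: one LineItems entry per column,
// with UNUSED for every title that is not of interest.
std::vector<LineItems> map_header(const std::string& header) {
    static const std::unordered_map<std::string, LineItems> known = {
        {"Name", NAME}, {"Age", AGE}};
    std::vector<LineItems> positions;
    for (const std::string& title : split_columns(header)) {
        auto it = known.find(title);
        positions.push_back(it == known.end() ? UNUSED : it->second);
    }
    return positions;
}

// Plays the role of the `convert` functor. Returns false when a wanted
// column is missing from the row or the age is not a whole number.
bool convert_row(const std::string& row,
                 const std::vector<LineItems>& positions, CsvLine& out) {
    const std::vector<std::string> columns = split_columns(row);
    for (std::size_t i = 0; i < positions.size(); ++i) {
        if (positions[i] == UNUSED) continue;
        if (i >= columns.size()) return false;  // row shorter than header
        if (positions[i] == NAME) {
            out.name = columns[i];
        } else {  // AGE
            try {
                std::size_t used = 0;
                out.age = std::stoi(columns[i], &used);
                if (used != columns[i].size()) return false;
            } catch (const std::exception&) {
                return false;  // not a number, or out of range
            }
        }
    }
    return true;
}
```

Calling map_header once on the first line and convert_row on each following line mirrors how the grammar passes the header's result to `line` through the local variable _a.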
