Read empty values with boost::spirit
Problem description
I want to read a CSV into a struct:

    struct data
    {
        std::string a;
        std::string b;
        std::string c;
    };

However, I want to read even empty strings, to ensure that every value ends up in its proper place. I adapted the struct as a Boost.Fusion sequence, so the following works:
    // Our parser (using a custom skipper to skip comments and empty lines)
    template <typename Iterator, typename skipper = comment_skipper<Iterator> >
    struct google_parser : qi::grammar<Iterator, addressbook(), skipper>
    {
        google_parser() : google_parser::base_type(contacts, "contacts")
        {
            using qi::eol;
            using qi::eps;
            using qi::_1;
            using qi::_val;
            using qi::repeat;
            using standard_wide::char_;
            using phoenix::at_c;
            using phoenix::val;

            value = *(char_ - ',' - eol) [_val += _1];

            // This works but only for small structs
            entry %= value >> ',' >> value >> ',' >> value >> eol;
        }

        qi::rule<Iterator, std::string()> value;
        qi::rule<Iterator, data()> entry;
    };
Unfortunately, repeat stores all non-empty values in a vector, so the values of the attributes may get mixed up (i.e. if the field for b is empty, it may contain the content from c):

    entry %= repeat(2)[ value >> ','] >> value >> eol;
I would like to use a short rule similar to repeat, as my struct has 60 attributes in practice! Not only is writing 60 rules tedious, but it seems Boost does not like long rules...

Solution

You just want to make sure you parse a value for "empty" strings too.
    value = +(char_ - ',' - eol) | attr("(unspecified)");
    entry = value >> ',' >> value >> ',' >> value >> eol;
See the demo:
    //#define BOOST_SPIRIT_DEBUG
    #include <boost/fusion/adapted/struct.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <iostream>
    #include <string>

    namespace qi = boost::spirit::qi;

    struct data {
        std::string a;
        std::string b;
        std::string c;
    };

    BOOST_FUSION_ADAPT_STRUCT(data, (std::string, a)(std::string, b)(std::string, c))

    template <typename Iterator, typename skipper = qi::blank_type>
    struct google_parser : qi::grammar<Iterator, data(), skipper> {
        google_parser() : google_parser::base_type(entry, "contacts") {
            using namespace qi;

            // An empty field still yields a value: the placeholder string
            value = +(char_ - ',' - eol) | attr("(unspecified)");
            entry = value >> ',' >> value >> ',' >> value >> eol;

            BOOST_SPIRIT_DEBUG_NODES((value)(entry))
        }
      private:
        qi::rule<Iterator, std::string()> value;   // no skipper: keeps inner blanks
        qi::rule<Iterator, data(), skipper> entry;
    };

    int main() {
        using It = std::string::const_iterator;
        google_parser<It> p;

        for (std::string input : {
                "something, awful, is\n",
                "fine,,just\n",
                "like something missing: ,,\n",
            })
        {
            It f = input.begin(), l = input.end();

            data parsed;
            bool ok = qi::phrase_parse(f,l,p,qi::blank,parsed);

            if (ok)
                std::cout << "Parsed: '" << parsed.a << "', '" << parsed.b << "', '" << parsed.c << "'\n";
            else
                std::cout << "Parse failed\n";

            if (f!=l)
                std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
        }
    }
Prints:
    Parsed: 'something', 'awful', 'is'
    Parsed: 'fine', '(unspecified)', 'just'
    Parsed: 'like something missing: ', '(unspecified)', '(unspecified)'
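For comparison, the target behaviour (every field keeps its position, and empty fields become a placeholder) can be sketched without Spirit at all. This is only an illustration of the expected result, not a replacement for the grammar: split_fields is a hypothetical helper name, and unlike the blank skipper used in the demo it does not trim whitespace around fields.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost.Spirit): splits one CSV line on ','
// and keeps empty fields in place, replacing them with a placeholder.
std::vector<std::string> split_fields(const std::string& line,
                                      const std::string& placeholder = "(unspecified)") {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        const auto comma = line.find(',', start);
        const auto end = (comma == std::string::npos) ? line.size() : comma;
        const std::string field = line.substr(start, end - start);
        // An empty field is still recorded, so later fields do not shift left.
        fields.push_back(field.empty() ? placeholder : field);
        if (comma == std::string::npos) break;
        start = comma + 1;
    }
    return fields;
}
```

Calling split_fields("fine,,just") yields three entries, with the placeholder in the middle position, which is the same shape the Spirit rule with the attr() alternative produces.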
However, you have a bigger problem. The assumption that qi::repeat(2) [ value ] will parse into 2 strings doesn't work.

repeat, like operator*, operator+ and operator%, parses into a container attribute. In this case the container attribute (string) will receive the input from the second value as well:

    Parsed: 'somethingawful', 'is', ''
    Parsed: 'fine(unspecified)', 'just', ''
    Parsed: 'like something missing: (unspecified)', '(unspecified)', ''
Since this is not what you want, reconsider your data types:
- either don't adapt the struct but instead write a customization trait to assign the fields (see http://www.boost.org/doc/libs/1_57_0/libs/spirit/doc/html/spirit/advanced/customize.html)
- change the struct to contain a vector of std::string to match the exposed attributes
- or create an auto-parser generator
The auto_ approach:

If you teach Qi how to extract a single value, you can use a simple rule like

    entry = skip(skipper() | ',') [auto_] >> eol;

This way, Spirit itself will generate the correct number of value extractions for the given Fusion sequence!
Here's a quick and dirty approach:
CAVEAT Specializing for std::string directly like this might not be the best idea (it might not always be appropriate and might interact badly with other parsers). However, by default create_parser<std::string> is not defined (because, what would it do?) so I seized the opportunity for the purpose of this demonstration:
    namespace boost { namespace spirit { namespace traits {
        template <> struct create_parser<std::string> {
            typedef proto::result_of::deep_copy<
                BOOST_TYPEOF(
                    qi::lexeme [+(qi::char_ - ',' - qi::eol)] | qi::attr("(unspecified)")
                )
            >::type type;

            static type call() {
                return proto::deep_copy(
                    qi::lexeme [+(qi::char_ - ',' - qi::eol)] | qi::attr("(unspecified)")
                );
            }
        };
    }}}
Again, see the demo output:
    Parsed: 'something', 'awful', 'is'
    Parsed: 'fine', 'just', '(unspecified)'
    Parsed: 'like something missing: ', '(unspecified)', '(unspecified)'
NOTE There was some advanced sorcery to get the skipper to work "just right" (see skip()[] and lexeme[]). Some general explanations can be found here: Boost spirit skipper issues (http://stackoverflow.com/questions/17072987/boost-spirit-skipper-issues/17073965#17073965)
UPDATE
The Container Approach
There's a subtlety to that. Two actually. So here's a demo:
    //#define BOOST_SPIRIT_DEBUG
    #include <boost/fusion/adapted/struct.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <iostream>
    #include <string>
    #include <vector>

    namespace qi = boost::spirit::qi;

    struct data {
        std::vector<std::string> parts;
    };

    BOOST_FUSION_ADAPT_STRUCT(data, (std::vector<std::string>, parts))

    template <typename Iterator, typename skipper = qi::blank_type>
    struct google_parser : qi::grammar<Iterator, data(), skipper> {
        google_parser() : google_parser::base_type(entry, "contacts") {
            using namespace qi;
            // Forces the bracketed expression to synthesize one vector atomically
            qi::as<std::vector<std::string> > strings;

            value = +(char_ - ',' - eol) | attr("(unspecified)");
            entry = strings [ repeat(2) [ value >> ',' ] >> value ] >> eol;

            BOOST_SPIRIT_DEBUG_NODES((value)(entry))
        }
      private:
        qi::rule<Iterator, std::string()> value;
        qi::rule<Iterator, data(), skipper> entry;
    };

    int main() {
        using It = std::string::const_iterator;
        google_parser<It> p;

        for (std::string input : {
                "something, awful, is\n",
                "fine,,just\n",
                "like something missing: ,,\n",
            })
        {
            It f = input.begin(), l = input.end();

            data parsed;
            bool ok = qi::phrase_parse(f,l,p,qi::blank,parsed);

            if (ok) {
                std::cout << "Parsed: ";
                for (auto& part : parsed.parts)
                    std::cout << "'" << part << "' ";
                std::cout << "\n";
            }
            else
                std::cout << "Parse failed\n";

            if (f!=l)
                std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
        }
    }
The subtleties are:
- adapting a single-element sequence hits edge cases with automatic attribute handling: Spirit Qi attribute propagation issue with single-member struct (http://stackoverflow.com/questions/19823413/spirit-qi-attribute-propagation-issue-with-single-member-struct/19824426#19824426)
- Spirit needs hand-holding in this particular case to treat the repeat[...]>>value as synthesizing a single container /atomically/. The as<T> directive (http://www.boost.org/doc/libs/1_57_0/libs/spirit/doc/html/spirit/qi/reference/directive/as.html) solves that here.