How to parse and verify an ordered list of integers using qi
Question
I'm parsing a text file, possibly several GB in size, consisting of lines as follows:
11 0.1
14 0.78
532 -3.5
Basically, one int and one float per line. The ints should be ordered and non-negative. I'd like to verify the data are as described, and have returned to me the min and max int in the range. This is what I've come up with:
#include <iostream>
#include <string>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/std_pair.hpp>

namespace px = boost::phoenix;
namespace qi = boost::spirit::qi;

namespace my_parsers
{
    using namespace qi;
    using px::at_c;
    using px::val;

    template <typename Iterator>
    struct verify_data : grammar<Iterator, locals<int>, std::pair<int, int>()>
    {
        verify_data() : verify_data::base_type(section)
        {
            section
                =  line(val(0))    [ at_c<0>(_val) = _1]
                >> +line(_a)       [ _a = _1]
                >> eps             [ at_c<1>(_val) = _a]
                ;

            line
                %= (int_ >> other) [
                       if_(_r1 >= _1)
                       [
                           std::cout << _r1 << " and "
                                     << _1 << val(" out of order\n")
                       ]
                   ]
                ;

            other
                = omit[(lit(' ') | '\t') >> float_ >> eol];
        }

        rule<Iterator, locals<int>, std::pair<int, int>() > section;
        rule<Iterator, int(int)> line;
        rule<Iterator> other;
    };
}

using namespace std;

int main(int argc, char** argv)
{
    string input("11 0.1\n"
                 "14 0.78\n"
                 "532 -3.6\n");

    my_parsers::verify_data<string::iterator> verifier;
    pair<int, int> p;
    std::string::iterator begin(input.begin()), end(input.end());

    cout << "parse result: " << boolalpha
         << qi::parse(begin, end, verifier, p) << endl;
    cout << "p.first: " << p.first << "\np.second: " << p.second << endl;
    return 0;
}
What I'd like to know is the following:
- Is there a better way of going about this? I have used inherited and synthesised attributes, local variables and a bit of phoenix voodoo. This is great; learning the tools is good but I can't help thinking there might be a much simpler way of achieving the same thing :/ (within a PEG parser that is...)
- How could it be done without the local variable for instance?
More info: I have other data formats that are being parsed at the same time and so I'd like to keep the return value as a parser attribute. At the moment this is a std::pair, the other data formats when parsed, will expose their own std::pairs for instance and it's these that I'd like to stuff in a std::vector.
Recommended answer
This is at least a lot shorter:
- down to 28 LOC
- no more locals
- no more fusion vector at<> wizardry
- no more inherited attributes
- no more grammar class
- no more manual iteration
- using expectation points (see other) to enhance parse error reporting
- this parser expression synthesizes neatly into a vector<int> if you choose to assign it with %= (but it will cost performance, besides potentially allocating a largish array)
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>

namespace px = boost::phoenix;
namespace qi = boost::spirit::qi;

typedef std::string::iterator It;

int main(int argc, char** argv)
{
    std::string input("11 0.1\n"
                      "14 0.78\n"
                      "532 -3.6\n");
    int min=-1, max=0; // min < 0 means "no index seen yet"
    {
        using namespace qi;
        using px::val;
        using px::ref;

        It begin(input.begin()), end(input.end());

        // each index must be larger than the largest one seen so far
        rule<It> index = int_
            [
                if_(ref(max) < _1) [ ref(max) = _1 ] .else_ [ std::cout << _1 << val(" out of order\n") ],
                if_(ref(min) < 0)  [ ref(min) = _1 ]
            ] ;

        // '>' is an expectation point: once the blank matched, a float and eol must follow
        rule<It> other = char_(" \t") > float_ > eol;

        std::cout << "parse result: " << std::boolalpha
                  << qi::parse(begin, end, index % other) << std::endl;
    }
    std::cout << "min: " << min << "\nmax: " << max << std::endl;
    return 0;
}
Bonus
I might suggest taking the validation out of the expression and making it a free-standing function. Of course, this makes things more verbose (and... legible), and my braindead sample uses global variables, but I trust you know how to use boost::bind or px::bind to make it more real-life.
In addition to the above:
- down to 27 LOC even with the free function
- no more phoenix, no more phoenix includes (yay compile times)
- no more phoenix expression types in debug builds ballooning the binary and slowing it down
- no more var, ref, if_, .else_ and the wretched operator, (which had major bug risk (at some time) due to the overload not being included with phoenix.hpp)
- (easily ported to c++0x lambdas, immediately removing the need for global variables)
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

typedef std::string::iterator It;

int min=-1, max=0, linenumber=0;

// plain function used as a Qi semantic action; it receives the parsed index
void validate_index(int index)
{
    linenumber++;
    if (min < 0)     min = index; // first index seen
    if (max < index) max = index;
    else std::cout << index << " out of order at line " << linenumber << std::endl;
}

int main(int argc, char** argv)
{
    std::string input("11 0.1\n"
                      "14 0.78\n"
                      "532 -3.6\n");
    It begin(input.begin()), end(input.end());
    {
        using namespace qi;

        rule<It> index = int_ [ validate_index ] ;
        rule<It> other = char_(" \t") > float_ > eol;

        std::cout << "parse result: " << std::boolalpha
                  << qi::parse(begin, end, index % other) << std::endl;
    }
    std::cout << "min: " << min << "\nmax: " << max << std::endl;
    return 0;
}