Slow performance using boost xpressive





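Sample text report of the benchmark runs discussed in the answer:

clock resolution: mean is 17.7534 ns (40960002 iterations)

benchmarking parse_xpressive_linear-std::string
collecting 100 samples, 1 iterations each, in estimated 15.7252 ms
mean: 156.418 μs, lb 155.863 μs, ub 158.24 μs, ci 0.95
std dev: 4.62848 μs, lb 1637.89 ns, ub 10.4043 μs, ci 0.95
found 4 outliers among 100 samples (4%)
variance is moderately inflated by outliers

benchmarking parse_xpressive_selective-std::string
collecting 100 samples, 1 iterations each, in estimated 31.5459 ms
mean: 313.992 μs, lb 313.39 μs, ub 315.599 μs, ci 0.95
std dev: 4.5415 μs, lb 1105.98 ns, ub 9.07809 μs, ci 0.95
found 11 outliers among 100 samples (11%)
variance is slightly inflated by outliers

benchmarking parse_spirit_linear-std::string
collecting 100 samples, 1 iterations each, in estimated 2.1556 ms
mean: 21.2533 μs, lb 21.1623 μs, ub 21.6854 μs, ci 0.95
std dev: 870.481 ns, lb 53.2809 ns, ub 2.0738 μs, ci 0.95
found 7 outliers among 100 samples (7%)
variance is moderately inflated by outliers

benchmarking parse_spirit_linear-boost::string_view
collecting 100 samples, 2 iterations each, in estimated 2.944 ms
mean: 14.6677 μs, lb 14.6342 μs, ub 14.8279 μs, ci 0.95
std dev: 318.252 ns, lb 22.5097 ns, ub 757.555 ns, ci 0.95
found 5 outliers among 100 samples (5%)
variance is moderately inflated by outliers

benchmarking parse_spirit_selective-std::string
collecting 100 samples, 1 iterations each, in estimated 27.5512 ms
mean: 273.052 μs, lb 272.77 μs, ub 273.952 μs, ci 0.95
std dev: 2.31473 μs, lb 835.184 ns, ub 5.1322 μs, ci 0.95
found 10 outliers among 100 samples (10%)
variance is unaffected by outliers

benchmarking parse_spirit_selective-boost::string_view
collecting 100 samples, 1 iterations each, in estimated 27.0766 ms
mean: 269.446 μs, lb 269.208 μs, ub 270.268 μs, ci 0.95
std dev: 2.01634 μs, lb 627.834 ns, ub 4.56949 μs, ci 0.95
found 10 outliers among 100 samples (10%)
variance is unaffected by outliers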

Lately I have been using boost xpressive for parsing files. These files are 10 MB each, and there will be several hundred of them to parse.

Xpressive is nice to work with and has a clear syntax, but the problem is performance. It is incredible how it crawls in debug builds, while in release builds it spends more than a whole second per file. I have tested against plain old get_line(), find() and sscanf() code, and that beats xpressive easily.
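For context, a hand-written baseline of the kind mentioned above might look roughly like the sketch below. It is not the asker's actual code: `SeqRecord`, `parse_line_plain` and `parse_all_plain` are hypothetical names, and the `sscanf` format string is only an assumption derived from the sample log lines in this question.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical record type for this sketch; fields mirror the log labels.
struct SeqRecord {
    double time = 0, vel = 0, km = 0;
    int driver = 0, sequence = 0;
    std::string date, road;
};

// Parses one NUL-terminated log line. Returns false unless all seven fields,
// including the trailing "SEQUENCE: n", were converted. On failure `out` may
// have been partially overwritten.
bool parse_line_plain(const char* line, SeqRecord& out) {
    char date[32] = {0};  // "2018-Mar-13 13:14:01.819966" is 27 characters
    char road[32] = {0};
    int n = std::sscanf(line,
        "[%31[^]]] - %lf s => Driver: %d - Speed: %lf - Road: %31s - Km: %lf - SEQUENCE: %d",
        date, &out.time, &out.driver, &out.vel, road, &out.km, &out.sequence);
    if (n != 7) return false;
    out.date = date;
    out.road = road;
    return true;
}

// Splits [b, e) on '\n' and keeps only the lines that parse as SEQUENCE lines.
std::vector<SeqRecord> parse_all_plain(const char* b, const char* e) {
    std::vector<SeqRecord> out;
    while (b < e) {
        const char* nl = static_cast<const char*>(
            std::memchr(b, '\n', static_cast<std::size_t>(e - b)));
        const char* line_end = nl ? nl : e;
        std::string line(b, line_end);  // sscanf needs a NUL-terminated copy
        SeqRecord r;
        if (parse_line_plain(line.c_str(), r)) out.push_back(r);
        b = nl ? nl + 1 : e;
    }
    return out;
}
```

Lines that carry a different trailing keyword (GEAR, SLOPE, ...) or no fields at all stop matching part-way through the format string, so `sscanf` reports fewer than seven conversions and the line is skipped.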

I understand that type checking, backtracking and so on have a cost, but this seems excessive to me. So I wonder: am I doing something wrong? Is there any way of optimizing this to run at a decent pace? Would it be worth the effort to migrate the code to boost::spirit?

I have prepared a lite version of the code with a few lines of a real file embedded, in case someone wants to test and help.

NOTE: As a requirement, VS 2010 must be used (unfortunately not fully C++11 compliant).

#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>
#include <iterator> // std::begin, std::end, std::distance
#include <string>
#include <vector>

// A few lines of a real log file; adjacent string literals are concatenated.
const char input[] =
    "[2018-Mar-13 13:13:59.580482] - 0.200 s => Driver: 0 - Speed: 0.0 - Road: BTN-1002 - Km: 90.0 - SWITCH_ON: 1\n"
    "[2018-Mar-13 13:13:59.580482] - 0.200 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - SLOPE: 0\n"
    "[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - GEAR: 0\n"
    "[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - GEAR: 1\n"
    "[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - SEQUENCE: 1\n"
    "[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - CLUTCH: 1\n"
    "[2018-Mar-13 13:14:01.819966] - 2.540 s => Backup to regestry\n"
    "[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - SEQUENCE: 4\n"
    "[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.3 - Road: A-11 - Km: 90.0 - SEQUENCE: 8\n"
    "[2018-Mar-13 13:14:01.819966] - 3.110 s => Backup to regestry\n"
    "[2018-Mar-13 13:14:02.620424] - 3.240 s => Driver: 0 - Speed: 0.4 - Road: A-11 - Km: 90.1 - SEQUENCE: 15\n"
    "[2018-Mar-13 13:14:02.829983] - 3.450 s => Driver: 0 - Speed: 0.6 - Road: B-302 - Km: 90.1 - SLOPE: -5\n"
    "[2018-Mar-13 13:14:03.039600] - 3.660 s => Driver: 0 - Speed: 0.8 - Road: B-302 - Km: 90.1 - SEQUENCE: 21\n"
    "[2018-Mar-13 13:14:03.250451] - 3.870 s => Driver: 0 - Speed: 1.2 - Road: B-302 - Km: 90.2 - GEAR: 2\n"
    "[2018-Mar-13 13:14:03.460012] - 4.080 s => Driver: 0 - Speed: 1.7 - Road: B-302 - Km: 90.3 - SEQUENCE: 29\n"
    "[2018-Mar-13 13:14:03.669448] - 4.290 s => Driver: 0 - Speed: 2.2 - Road: B-302 - Km: 90.4 - SEQUENCE: 34\n"
    "[2018-Mar-13 13:14:03.880066] - 4.500 s => Driver: 0 - Speed: 2.8 - Road: B-302 - Km: 90.5 - CLUTCH: 1\n"
    "[2018-Mar-13 13:14:04.090444] - 4.710 s => Driver: 0 - Speed: 3.5 - Road: B-302 - Km: 90.7 - SEQUENCE: 45\n"
    "[2018-Mar-13 13:14:04.300160] - 4.920 s => Driver: 0 - Speed: 4.2 - Road: B-302 - Km: 90.9 - SLOPE: 10\n"
    "[2018-Mar-13 13:14:04.510025] - 5.130 s => Driver: 0 - Speed: 4.9 - Road: B-302 - Km: 91.1 - GEAR: 3";

// Note: std::end(input) points past the terminating '\0', so len counts it too.
const auto len = std::distance(std::begin(input), std::end(input));

struct Sequence
{
    int ms;
    int driver;
    int sequence;
    double time;
    double vel;
    double km;
    std::string date;
    std::string road;
};

namespace xp = boost::xpressive;

int main()
{
    Sequence data;
    std::vector<Sequence> sequences;

    using namespace xp;

    cregex real = (+_d >> '.' >> +_d);
    cregex keyword = " - SEQUENCE: " >> (+_d)[xp::ref(data.sequence) = as<int>(_)];
    cregex date = repeat<4>(_d) >> '-' >> repeat<3>(alpha) >> '-' >> repeat<2>(_d) >> _s >> repeat<2>(_d) >> ':' >> repeat<2>(_d) >> ':' >> repeat<2>(_d);

    cregex header = '[' >> date[xp::ref(data.date) = _] >> '.' >> (+_d)[xp::ref(data.ms) = as<int>(_)] >> "] - "
                    >> real[xp::ref(data.time) = as<double>(_)]
                    >> " s => Driver: " >> (+_d)[xp::ref(data.driver) = as<int>(_)]
                    >> " - Speed: " >> real[xp::ref(data.vel) = as<double>(_)]
                    >> " - Road: " >> (+set[alnum | '-'])[xp::ref(data.road) = _]
                    >> " - Km: " >> real[xp::ref(data.km) = as<double>(_)];

    xp::cregex parser = (header >> keyword >> _ln);

    xp::cregex_iterator cur(input, input + len, parser);
    xp::cregex_iterator end;

    for (; cur != end; ++cur)
        sequences.emplace_back(data);

    return 0;
}

Please, mind the VS 2010 constraint.

Solution

I see roughly two areas for improvement:

  • you basically parse all lines, including the ones that don't interest you
  • you allocate a lot of strings

I'd suggest using string views to fix the allocations. Next, you could try to avoid parsing lines that don't match the SEQUENCE pattern. There's no reason in principle why this couldn't be done using Boost Xpressive, but my weapon of choice happens to be Boost Spirit, so I'll include it too.
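To make the string-view suggestion concrete, here is a minimal sketch of why a view avoids allocation: it is just a pointer and a length into the buffer that is already in memory. The sketch uses `std::string_view` (C++17, so not available in VS 2010) as a stand-in for `boost::string_view`, which offers a very similar interface; `road_field` is a hypothetical helper, not part of either library.

```cpp
#include <cstddef>
#include <string_view>

// Returns the text between "Road: " and the next space as a non-owning view.
// The view points into the storage behind `line`; nothing is copied.
std::string_view road_field(std::string_view line) {
    const std::string_view label = "Road: ";
    std::size_t pos = line.find(label);
    if (pos == std::string_view::npos) return {};
    pos += label.size();
    const std::size_t end = line.find(' ', pos);
    // If there is no further space, take everything up to the end of the line.
    return line.substr(pos, end == std::string_view::npos ? end : end - pos);
}
```

The price of not copying is lifetime: the view is only valid while the underlying input buffer stays alive and unmodified, which holds here as long as the file contents are kept in memory until the parsed records are no longer needed.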

Being Selective

You can detect interesting lines before spending more effort like this:

cregex signature = -*~_n >> " - SEQUENCE: " >> (+_d) >> before(_ln|eos);
for (xp::cregex_iterator cur(b, e, signature), end; cur != end; ++cur) {
    std::cout << "'" << cur->str() << "'\n";
}

This prints

'[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - SEQUENCE: 1'
'[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - SEQUENCE: 4'
'[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.3 - Road: A-11 - Km: 90.0 - SEQUENCE: 8'
'[2018-Mar-13 13:14:02.620424] - 3.240 s => Driver: 0 - Speed: 0.4 - Road: A-11 - Km: 90.1 - SEQUENCE: 15'
'[2018-Mar-13 13:14:03.039600] - 3.660 s => Driver: 0 - Speed: 0.8 - Road: B-302 - Km: 90.1 - SEQUENCE: 21'
'[2018-Mar-13 13:14:03.460012] - 4.080 s => Driver: 0 - Speed: 1.7 - Road: B-302 - Km: 90.3 - SEQUENCE: 29'
'[2018-Mar-13 13:14:03.669448] - 4.290 s => Driver: 0 - Speed: 2.2 - Road: B-302 - Km: 90.4 - SEQUENCE: 34'
'[2018-Mar-13 13:14:04.090444] - 4.710 s => Driver: 0 - Speed: 3.5 - Road: B-302 - Km: 90.7 - SEQUENCE: 45'

Nothing is allocated. This should be pretty fast.
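The same pre-filtering idea can be sketched without any regex library. The following is an illustration only, assuming C++17 `std::string_view`; `select_lines` is a hypothetical helper, and it is looser than the `signature` regex above because it does not check that digits and a line end follow the marker.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Collects views of the lines that contain `marker`. No line text is copied;
// only the vector of views itself allocates.
std::vector<std::string_view> select_lines(std::string_view text,
                                           std::string_view marker = " - SEQUENCE: ") {
    std::vector<std::string_view> hits;
    while (!text.empty()) {
        const std::size_t nl = text.find('\n');
        const std::string_view line = text.substr(0, nl);  // npos means "to the end"
        if (line.find(marker) != std::string_view::npos)
            hits.push_back(line);
        if (nl == std::string_view::npos) break;  // last line had no newline
        text.remove_prefix(nl + 1);
    }
    return hits;
}
```

Each returned view could then be handed to the full parser, which is the "selective" strategy measured later in this answer.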

Reducing Allocations

For this I'm going to switch to Spirit because it will make things easier.

Note: The real reason I switched here is because, in contrast to Boost Spirit, Xpressive does not appear to have extensible attribute propagation traits. This could be my lack of experience with it.

The alternative approach would almost certainly replace the actions with manual propagation code, which in turn would inform named capture groups in order to keep things legible. I'm not sure about the performance overhead of these, so let's not use them at this point.

You can use boost::string_view with a trait to "teach" Qi to assign text to it:

namespace boost { namespace spirit { namespace traits {
    template <typename It>
    struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
        static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
    };
} } }

That way, the Qi grammar could look just like this:

template <typename It> struct QiParser : qi::grammar<It, Sequence()> {
    QiParser() : QiParser::base_type(line) {
        using namespace qi;
        auto date_time = copy(
            repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >> 
            repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);

        line = '[' >> raw[date_time] >> "] - "
            >> double_ >> " s"
            >> " => Driver: "  >> int_
            >> " - Speed: "    >> double_
            >> " - Road: "     >> raw[+graph]
            >> " - Km: "       >> double_
            >> " - SEQUENCE: " >> int_
            >> (eol|eoi);
    }
  private:
    qi::rule<It, Sequence()> line;
};

Using it is exceedingly simple, especially if not being "selective".

This happens to be the "winning" configuration. The standalone, simplified version of that algorithm, with all benchmark-related generics and options removed, is listed under Summary/Conclusions.

Benchmark Results: Surprises

Using the selective parsing approach only made the Xpressive approach slower: a mean of about 314 μs per run, against about 156 μs for the plain linear version.

Comparing to Spirit, I had initially started with the selective approach as well (fully anticipating it to be faster). Here are the not-so-encouraging results: about 273 μs per run with std::string and about 269 μs with boost::string_view.

Oops. The initial Xpressive approach is still superior!

Adjusting The Assumptions

Okay, clearly doing the shallow scan first, and then the "full parse" hurts the performance. Theorizing, this is likely down to cache/prefetch effects. Also, the linear approach may win because it's easier to spot when a line doesn't start with a '[' character, than to see whether it ends with the SEQUENCE pattern.

So I decided to adapt the Spirit approaches to linear mode too, and see whether the win from reducing allocations is still worth it.

Now we're getting results. Looking at the difference between the std::string and boost::string_view approaches in detail: the std::string version takes about 21.3 μs per run, the boost::string_view version about 14.7 μs.

Summary/Conclusions

The reduced allocations are good for 30% more efficiency. In total, an improvement of 10 times over the original approach.
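The two figures can be checked against the mean run times in the benchmark's sample text report. The sketch below only redoes that arithmetic; the constant and function names are made up for the illustration.

```cpp
// Mean run times in microseconds, copied from the sample Nonius text report.
constexpr double xpressive_linear_us  = 156.418;  // parse_xpressive_linear-std::string
constexpr double spirit_linear_str_us = 21.2533;  // parse_spirit_linear-std::string
constexpr double spirit_linear_sv_us  = 14.6677;  // parse_spirit_linear-boost::string_view

// Fraction of run time saved by switching from std::string to boost::string_view.
constexpr double time_saved_by_views() {
    return 1.0 - spirit_linear_sv_us / spirit_linear_str_us;  // about 0.31
}

// Speedup of the string_view Spirit parser over the original Xpressive parser.
constexpr double overall_speedup() {
    return xpressive_linear_us / spirit_linear_sv_us;  // about 10.7
}
```

So "30%" corresponds to roughly 31% less time per run, and "10 times" to a ratio of roughly 10.7 on this sample input.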

Note that the benchmark code goes out of its way to eliminate unfair differences between the implementations (e.g. by precompiling everything on both Spirit and Xpressive). The full benchmark code is listed under Full Benchmark Code.

The winning implementation in isolation:

#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
#include <cstring>  // strlen
#include <iostream> // std::cout
#include <vector>

using It = char const*;

struct Sequence {
    int driver;
    int sequence;
    double time;
    double vel;
    double km;
    boost::string_view date;
    boost::string_view road;
};

BOOST_FUSION_ADAPT_STRUCT(::Sequence, date, time, driver, vel, road, km, sequence)

namespace qi = boost::spirit::qi;

namespace boost { namespace spirit { namespace traits {
    template <typename It>
    struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
        static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
    };
} } }

std::vector<Sequence> parse_spirit(It b, It e) {

    qi::rule<It, Sequence()> static const line = []{
        using namespace qi;
        auto date_time = copy(
            repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >> 
            repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);

        qi::rule<It, Sequence()> r = '[' >> raw[date_time] >> "] - "
            >> double_ >> " s"
            >> " => Driver: "  >> int_
            >> " - Speed: "    >> double_
            >> " - Road: "     >> raw[+graph]
            >> " - Km: "       >> double_
            >> " - SEQUENCE: " >> int_
            >> (eol|eoi);

        return r;
    }();

    std::vector<Sequence> sequences;

    parse(b, e, *boost::spirit::repository::qi::seek[line], sequences);

    return sequences;
}

static char input[] = /*... see question ...*/;
static const size_t len = strlen(input);

int main() {
    auto sequences = parse_spirit(input, input+len);
    std::cout << "Parsed: " << sequences.size() << " sequence lines\n";
}

Full Benchmark Code

The benchmarks use Nonius for the measurements and statistical analysis.
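Where Nonius is not at hand, a much cruder timing loop can give a first impression. The sketch below is an assumption-laden stand-in, not what the benchmark uses: `mean_runtime_us` is a hypothetical helper built on `std::chrono`, and it does none of the clock-resolution estimation, confidence intervals or outlier analysis that appear in the Nonius report.

```cpp
#include <chrono>
#include <cstddef>

// Calls `f` `samples` times and returns the mean wall-clock time per call in
// microseconds. Returns 0 when `samples` is 0.
template <typename F>
double mean_runtime_us(F&& f, std::size_t samples = 100) {
    using clock = std::chrono::steady_clock;
    double total_us = 0.0;
    for (std::size_t i = 0; i < samples; ++i) {
        const auto t0 = clock::now();
        f();
        const auto t1 = clock::now();
        total_us += std::chrono::duration<double, std::micro>(t1 - t0).count();
    }
    return samples ? total_us / static_cast<double>(samples) : 0.0;
}
```

A call such as `mean_runtime_us([&]{ sequences = parse_spirit(input, input + len); })` would time one of the parsers from this answer; the result should be kept in a variable that is used afterwards so the compiler cannot discard the work.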

#include <cstring> // strlen

static char input[] =
    "[2018-Mar-13 13:13:59.580482] - 0.200 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - SLOPE: 0\n"
    "[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - GEAR: 0\n"
    "[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - GEAR: 1\n"
    "[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - SEQUENCE: 1\n"
    "[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - CLUTCH: 1\n"
    "[2018-Mar-13 13:14:01.819966] - 2.540 s => Backup to regestry\n"
    "[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - SEQUENCE: 4\n"
    "[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.3 - Road: A-11 - Km: 90.0 - SEQUENCE: 8\n"
    "[2018-Mar-13 13:14:01.819966] - 3.110 s => Backup to regestry\n"
    "[2018-Mar-13 13:14:02.620424] - 3.240 s => Driver: 0 - Speed: 0.4 - Road: A-11 - Km: 90.1 - SEQUENCE: 15\n"
    "[2018-Mar-13 13:14:02.829983] - 3.450 s => Driver: 0 - Speed: 0.6 - Road: B-302 - Km: 90.1 - SLOPE: -5\n"
    "[2018-Mar-13 13:14:03.039600] - 3.660 s => Driver: 0 - Speed: 0.8 - Road: B-302 - Km: 90.1 - SEQUENCE: 21\n"
    "[2018-Mar-13 13:14:03.250451] - 3.870 s => Driver: 0 - Speed: 1.2 - Road: B-302 - Km: 90.2 - GEAR: 2\n"
    "[2018-Mar-13 13:14:03.460012] - 4.080 s => Driver: 0 - Speed: 1.7 - Road: B-302 - Km: 90.3 - SEQUENCE: 29\n"
    "[2018-Mar-13 13:14:03.669448] - 4.290 s => Driver: 0 - Speed: 2.2 - Road: B-302 - Km: 90.4 - SEQUENCE: 34\n"
    "[2018-Mar-13 13:14:03.880066] - 4.500 s => Driver: 0 - Speed: 2.8 - Road: B-302 - Km: 90.5 - CLUTCH: 1\n"
    "[2018-Mar-13 13:14:04.090444] - 4.710 s => Driver: 0 - Speed: 3.5 - Road: B-302 - Km: 90.7 - SEQUENCE: 45\n"
    "[2018-Mar-13 13:14:04.300160] - 4.920 s => Driver: 0 - Speed: 4.2 - Road: B-302 - Km: 90.9 - SLOPE: 10\n"
    "[2018-Mar-13 13:13:59.580482] - 0.200 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - SLOPE: 0\n"
    "[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - GEAR: 0\n"
    "[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - GEAR: 1\n"
    "[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - SEQUENCE: 1\n"
    "[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - CLUTCH: 1\n"
    "[2018-Mar-13 13:14:01.819966] - 2.540 s => Backup to regestry\n"
    "[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - SEQUENCE: 4\n"
    "[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.3 - Road: A-11 - Km: 90.0 - SEQUENCE: 8\n"
    "[2018-Mar-13 13:14:01.819966] - 3.110 s => Backup to regestry\n"
    "[2018-Mar-13 13:14:02.620424] - 3.240 s => Driver: 0 - Speed: 0.4 - Road: A-11 - Km: 90.1 - SEQUENCE: 15\n"
    "[2018-Mar-13 13:14:02.829983] - 3.450 s => Driver: 0 - Speed: 0.6 - Road: B-302 - Km: 90.1 - SLOPE: -5\n"
    "[2018-Mar-13 13:14:03.039600] - 3.660 s => Driver: 0 - Speed: 0.8 - Road: B-302 - Km: 90.1 - SEQUENCE: 21\n"
    "[2018-Mar-13 13:14:03.250451] - 3.870 s => Driver: 0 - Speed: 1.2 - Road: B-302 - Km: 90.2 - GEAR: 2\n"
    "[2018-Mar-13 13:14:03.460012] - 4.080 s => Driver: 0 - Speed: 1.7 - Road: B-302 - Km: 90.3 - SEQUENCE: 29\n"
    "[2018-Mar-13 13:14:03.669448] - 4.290 s => Driver: 0 - Speed: 2.2 - Road: B-302 - Km: 90.4 - SEQUENCE: 34\n"
    "[2018-Mar-13 13:14:03.880066] - 4.500 s => Driver: 0 - Speed: 2.8 - Road: B-302 - Km: 90.5 - CLUTCH: 1\n"
    "[2018-Mar-13 13:14:04.090444] - 4.710 s => Driver: 0 - Speed: 3.5 - Road: B-302 - Km: 90.7 - SEQUENCE: 45\n"
    "[2018-Mar-13 13:14:04.300160] - 4.920 s => Driver: 0 - Speed: 4.2 - Road: B-302 - Km: 90.9 - SLOPE: 10\n"
    "[2018-Mar-13 13:14:04.510025] - 5.130 s => Driver: 0 - Speed: 4.9 - Road: B-302 - Km: 91.1 - GEAR: 3";
static const size_t len = strlen(input);

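#include <boost/utility/string_view.hpp>
#include <boost/fusion/adapted/struct.hpp>
#include <functional> // std::function (mock nonius)
#include <string>
#include <utility>    // std::forward, std::move
#include <vector>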

template <typename String> struct Sequence {
    int driver;
    int sequence;
    double time;
    double vel;
    double km;
    String date;
    String road;
};

BOOST_FUSION_ADAPT_TPL_STRUCT((T),(Sequence)(T), date, time, driver, vel, road, km, sequence)

// Declare implementations under test:
using It = char const*;
template <typename S> std::vector<S> parse_xpressive_linear(It b, It e);
template <typename S> std::vector<S> parse_xpressive_selective(It b, It e);
template <typename S> std::vector<S> parse_spirit_linear(It b, It e);
template <typename S> std::vector<S> parse_spirit_selective(It b, It e);

#ifdef VERIFY_OUTPUT
    #include <boost/fusion/include/io.hpp>
    using boost::fusion::operator<<;
    #include <iostream>

    #define VERIFY()                                                                    \
        do {                                                                            \
            std::cout << "L:" << __LINE__ << " Parsed: " << sequences.size() << "\n";   \
            for (auto r : sequences) {                                                  \
                std::cout << r << "\n";                                                 \
            }                                                                           \
        } while (0)
#else
    #define VERIFY() do { } while (0)
#endif

#ifdef USE_NONIUS
    #include <nonius/benchmark.h++>
    #define NONIUS_RUNNER
    #include <nonius/main.h++>
#else
    // mock nonius
    namespace nonius {
        struct chronometer{
            template <typename F> static inline void measure(F&& f) { std::forward<F>(f)(); }
        };
        static std::vector<std::function<void(chronometer)>> s_benchmarks;
        #define TOKENPASTE(x, y) x ## y
        #define TOKENPASTE2(x, y) TOKENPASTE(x, y)
        #define NONIUS_BENCHMARK(name, f) static auto TOKENPASTE2(s_reg_, __LINE__) = []{ ::nonius::s_benchmarks.push_back(f); return 42; }();

        void run() { for (auto& b : s_benchmarks) b({}); }
    }

    int main() {
        nonius::run();
    }
#endif

template <typename R>
void do_test_kernel(nonius::chronometer& cm, std::vector<R> (*f)(It, It)) {
    std::vector<R> sequences;
    cm.measure([&sequences,f]{ sequences = f(input, input + len); });
    VERIFY();
}

#define TEST_CASE(name, string) NONIUS_BENCHMARK(#name"-"#string, [](nonius::chronometer cm) { do_test_kernel(cm, &name<Sequence<string> >); })
// Xpressive doesn't support string_view
TEST_CASE(parse_xpressive_linear,    std::string)
TEST_CASE(parse_xpressive_selective, std::string)

TEST_CASE(parse_spirit_linear,       std::string)
TEST_CASE(parse_spirit_linear,       boost::string_view)
TEST_CASE(parse_spirit_selective,    std::string)
TEST_CASE(parse_spirit_selective,    boost::string_view)

#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>

namespace xp = boost::xpressive;

namespace XpressiveDetail {
    using namespace xp;

    struct Scanner {
        cregex scan {-*~xp::_n >> " - SEQUENCE: " >> (+xp::_d) >> xp::_ln};
    };

    template <typename Seq> struct Parser : Scanner {
        mutable Seq seq; // non-thread-safe, but fairer to compare to Spirit

        cregex real    = (+_d >> '.' >> +_d);
        cregex keyword = " - SEQUENCE: " >> (+_d)[xp::ref(seq.sequence) = as<int>(_)];
        cregex date    = repeat<4>(_d) >> '-' 
            >> repeat<3>(alpha) >> '-' 
            >> repeat<2>(_d) 
            >> _s 
            >> repeat<2>(_d) >> ':' 
            >> repeat<2>(_d) >> ':' 
            >> repeat<2>(_d)
            >> '.' >> (+_d);

        cregex header = '[' >> date[xp::ref(seq.date) = _] >> "] - "
            >> real[xp::ref(seq.time) = as<double>(_)]
            >> " s => Driver: " >> (+_d)             [ xp::ref(seq.driver) = as<int>(_) ]
            >> " - Speed: "     >> real              [ xp::ref(seq.vel)    = as<double>(_) ]
            >> " - Road: "      >> (+set[alnum|'-']) [ xp::ref(seq.road)   = _ ]
            >> " - Km: "        >> real              [ xp::ref(seq.km)     = as<double>(_) ];

        cregex parser = (header >> keyword >> _ln);
    };
}

template <typename Seq>
std::vector<Seq> parse_xpressive_linear(It b, It e) {
    std::vector<Seq> sequences;
    using namespace xp;

    static const XpressiveDetail::Parser<Seq> precompiled{};

    for (xp::cregex_iterator cur(b, e, precompiled.parser), end; cur != end; ++cur)
        sequences.push_back(std::move(precompiled.seq));

    return sequences;
}

template <typename Seq>
std::vector<Seq> parse_xpressive_selective(It b, It e) {
    std::vector<Seq> sequences;
    using namespace xp;

    static const XpressiveDetail::Parser<Seq> precompiled{};
    xp::match_results<It> m;

    for (auto& match : boost::make_iterator_range(xp::cregex_iterator{b, e, precompiled.scan}, {})) {
        if (xp::regex_match(match[0].first, match[0].second, m, precompiled.parser))
            sequences.push_back(std::move(precompiled.seq));
    }

    return sequences;
}

//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;

namespace boost { namespace spirit { namespace traits {
    template <typename It>
    struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
        static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
    };
} } }

template <typename It, typename Attribute> struct QiParser : qi::grammar<It, Attribute()> {
    QiParser() : QiParser::base_type(line) {
        using namespace qi;
        auto date_time = copy(
            repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >> 
            repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);

        line = '[' >> eps(clear(_val)) >> raw[date_time] >> "] - "
            >> double_ >> " s"
            >> " => Driver: "  >> int_
            >> " - Speed: "    >> double_
            >> " - Road: "     >> raw[+graph]
            >> " - Km: "       >> double_
            >> " - SEQUENCE: " >> int_
            >> (eol|eoi);

        BOOST_SPIRIT_DEBUG_NODES((line))
    }
  private:
    struct clear_f {
        // only required for linear approach to std::string-based
        bool operator()(Sequence<std::string>& v)      const { v = {};      return true; }
        bool operator()(Sequence<boost::string_view>&) const { /*no_op();*/ return true; }
    };
    boost::phoenix::function<clear_f> clear;

    qi::rule<It, Attribute()> line;
};

template <typename Seq = Sequence<std::string> >
std::vector<Seq> parse_spirit_selective(It b, It e) {
    static QiParser<It, Seq> const qi_parser{};
    static XpressiveDetail::Scanner const precompiled{};

    std::vector<Seq> sequences;

    for (auto& match : boost::make_iterator_range(xp::cregex_iterator{b, e, precompiled.scan}, {})) {
        Seq r;
        if (parse(match[0].first, match[0].second, qi_parser, r))
            sequences.push_back(r);
    }

    return sequences;
}

#include <boost/spirit/repository/include/qi_seek.hpp>

template <typename Seq = Sequence<std::string> >
std::vector<Seq> parse_spirit_linear(It b, It e) {
    using boost::spirit::repository::qi::seek;

    static QiParser<It, Seq> const qi_parser{};

    std::vector<Seq> sequences;
    parse(b, e, *seek[qi_parser], sequences);
    return sequences;
}
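
The difference between the "linear" and "selective" strategies benchmarked above can be sketched without Boost. The following is only an illustration using the standard library and `sscanf`, not the code that was measured: `Record`, `parse_line` and `parse_all` are hypothetical names, and the fixed 32-byte buffers assume the field widths seen in the sample input. With `prefilter` set, a cheap substring search decides whether a line is worth the full parse (selective); without it, every line goes through the full parse (linear).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record mirroring the fields of Sequence<std::string> above.
struct Record {
    std::string date, road;
    double time, vel, km;
    int driver, sequence;
};

// Full parse of one log line. Returns true only for SEQUENCE lines,
// because all seven conversions must succeed.
bool parse_line(const std::string& line, Record& out) {
    char date[32], road[32]; // assumption: fields are shorter than 32 chars
    Record tmp{};
    int n = std::sscanf(line.c_str(),
        "[%31[^]]] - %lf s => Driver: %d - Speed: %lf - Road: %31s - Km: %lf - SEQUENCE: %d",
        date, &tmp.time, &tmp.driver, &tmp.vel, road, &tmp.km, &tmp.sequence);
    if (n != 7) return false;
    tmp.date = date;
    tmp.road = road;
    out = tmp;
    return true;
}

// Walks the text line by line. With prefilter == true, lines that do not
// contain the SEQUENCE keyword are skipped before the full parse runs.
std::vector<Record> parse_all(const std::string& text, bool prefilter) {
    std::vector<Record> records;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t eol = text.find('\n', pos);
        if (eol == std::string::npos) eol = text.size();
        assert(eol >= pos); // find never returns a position before pos
        std::string line = text.substr(pos, eol - pos);
        pos = eol + 1;
        if (prefilter && line.find(" - SEQUENCE: ") == std::string::npos)
            continue;
        Record r{};
        if (parse_line(line, r)) records.push_back(r);
    }
    return records;
}
```

Both modes should return the same records. Whether the pre-filter pays off depends on how expensive the scan is relative to the full parse, which is exactly what the benchmark compares.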

Sample text report:

clock resolution: mean is 17.7534 ns (40960002 iterations)

benchmarking parse_xpressive_linear-std::string
collecting 100 samples, 1 iterations each, in estimated 15.7252 ms
mean: 156.418 μs, lb 155.863 μs, ub 158.24 μs, ci 0.95
std dev: 4.62848 μs, lb 1637.89 ns, ub 10.4043 μs, ci 0.95
found 4 outliers among 100 samples (4%)
variance is moderately inflated by outliers

benchmarking parse_xpressive_selective-std::string
collecting 100 samples, 1 iterations each, in estimated 31.5459 ms
mean: 313.992 μs, lb 313.39 μs, ub 315.599 μs, ci 0.95
std dev: 4.5415 μs, lb 1105.98 ns, ub 9.07809 μs, ci 0.95
found 11 outliers among 100 samples (11%)
variance is slightly inflated by outliers

benchmarking parse_spirit_linear-std::string
collecting 100 samples, 1 iterations each, in estimated 2.1556 ms
mean: 21.2533 μs, lb 21.1623 μs, ub 21.6854 μs, ci 0.95
std dev: 870.481 ns, lb 53.2809 ns, ub 2.0738 μs, ci 0.95
found 7 outliers among 100 samples (7%)
variance is moderately inflated by outliers

benchmarking parse_spirit_linear-boost::string_view
collecting 100 samples, 2 iterations each, in estimated 2.944 ms
mean: 14.6677 μs, lb 14.6342 μs, ub 14.8279 μs, ci 0.95
std dev: 318.252 ns, lb 22.5097 ns, ub 757.555 ns, ci 0.95
found 5 outliers among 100 samples (5%)
variance is moderately inflated by outliers

benchmarking parse_spirit_selective-std::string
collecting 100 samples, 1 iterations each, in estimated 27.5512 ms
mean: 273.052 μs, lb 272.77 μs, ub 273.952 μs, ci 0.95
std dev: 2.31473 μs, lb 835.184 ns, ub 5.1322 μs, ci 0.95
found 10 outliers among 100 samples (10%)
variance is unaffected by outliers

benchmarking parse_spirit_selective-boost::string_view
collecting 100 samples, 1 iterations each, in estimated 27.0766 ms
mean: 269.446 μs, lb 269.208 μs, ub 270.268 μs, ci 0.95
std dev: 2.01634 μs, lb 627.834 ns, ub 4.56949 μs, ci 0.95
found 10 outliers among 100 samples (10%)
variance is unaffected by outliers
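
To put the report in perspective, the mean times can be turned into relative speedups. The snippet below is a small sketch: the `speedup` helper and the constant names are made up for illustration, and the values are the means copied from the report above.

```cpp
#include <cassert>

// Mean run times in microseconds, copied from the report above.
const double xpressive_linear_us      = 156.418;
const double xpressive_selective_us   = 313.992;
const double spirit_linear_string_us  = 21.2533;
const double spirit_linear_view_us    = 14.6677;

// How many times faster `candidate` is than `baseline`
// (both given as mean times in the same unit).
double speedup(double baseline, double candidate) {
    assert(baseline > 0 && candidate > 0);
    return baseline / candidate;
}
```

With these figures, `speedup(xpressive_linear_us, spirit_linear_string_us)` comes out a little above 7, and `speedup(xpressive_linear_us, spirit_linear_view_us)` lands between 10 and 11. The selective Xpressive variant takes about twice as long as the linear one on this input. These ratios describe this small sample only and need not carry over unchanged to the 10 MB files.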
