Slow performance using Boost Xpressive


Question

Lately I have been using Boost Xpressive for parsing files. These files are 10 MB each and there will be several hundred of them to parse.

Xpressive is nice to work with and has a clear syntax, but the problem is performance. It is incredible how it crawls in debug builds, while in release builds it spends more than a whole second per file. I have tested against plain old get_line(), find() and sscanf() code, and that beats Xpressive easily.
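The question does not show the hand-written baseline, so the following is only a sketch of what an sscanf()-based line parser for this log format might look like. The Record struct and the parse_sequence_line name are made up for illustration; the format string is derived from the sample lines further down.

```cpp
#include <cstdio>
#include <string>

// Hypothetical record type; field names are illustrative only.
struct Record {
    char date[32];   // e.g. "2018-Mar-13 13:14:01.819966"
    char road[32];   // e.g. "A-11"
    double time, vel, km;
    int driver, sequence;
};

// Returns true only if the line has the full "... - SEQUENCE: <n>" shape.
// On failure `out` may be partially overwritten, so discard it.
bool parse_sequence_line(const std::string& line, Record& out) {
    // %31[^]] reads everything up to the closing ']' (at most 31 chars).
    const int fields = std::sscanf(line.c_str(),
        "[%31[^]]] - %lf s => Driver: %d - Speed: %lf"
        " - Road: %31s - Km: %lf - SEQUENCE: %d",
        out.date, &out.time, &out.driver, &out.vel,
        out.road, &out.km, &out.sequence);
    return fields == 7;  // all seven conversions succeeded
}
```

Lines such as "Backup to regestry" or "GEAR: 0" stop matching partway through the format string, so sscanf() reports fewer than seven conversions and the line is rejected.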

I understand that type checking, backtracking and so on have a cost, but this seems excessive to me. I wonder, am I doing something wrong? Is there any way of optimizing this to run at a decent pace? Would it be worth the effort to migrate the code to boost::spirit?

I have prepared a lite version of the code with a few lines of a real file embedded, in case someone wants to test and help.

NOTE: As a requirement, VS 2010 must be used (unfortunately not fully C++11 compliant).

#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>
#include <iterator> // std::begin, std::end, std::distance
#include <string>
#include <vector>

const char input[] = "[2018-Mar-13 13:13:59.580482] - 0.200 s => Driver: 0 - Speed: 0.0 - Road: BTN-1002 - Km: 90.0 - SWITCH_ON: 1\n\
[2018-Mar-13 13:13:59.580482] - 0.200 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - SLOPE: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - GEAR: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - GEAR: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - SEQUENCE: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - CLUTCH: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.540 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - SEQUENCE: 4\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.3 - Road: A-11 - Km: 90.0 - SEQUENCE: 8\n\
[2018-Mar-13 13:14:01.819966] - 3.110 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.620424] - 3.240 s => Driver: 0 - Speed: 0.4 - Road: A-11 - Km: 90.1 - SEQUENCE: 15\n\
[2018-Mar-13 13:14:02.829983] - 3.450 s => Driver: 0 - Speed: 0.6 - Road: B-302 - Km: 90.1 - SLOPE: -5\n\
[2018-Mar-13 13:14:03.039600] - 3.660 s => Driver: 0 - Speed: 0.8 - Road: B-302 - Km: 90.1 - SEQUENCE: 21\n\
[2018-Mar-13 13:14:03.250451] - 3.870 s => Driver: 0 - Speed: 1.2 - Road: B-302 - Km: 90.2 - GEAR: 2\n\
[2018-Mar-13 13:14:03.460012] - 4.080 s => Driver: 0 - Speed: 1.7 - Road: B-302 - Km: 90.3 - SEQUENCE: 29\n\
[2018-Mar-13 13:14:03.669448] - 4.290 s => Driver: 0 - Speed: 2.2 - Road: B-302 - Km: 90.4 - SEQUENCE: 34\n\
[2018-Mar-13 13:14:03.880066] - 4.500 s => Driver: 0 - Speed: 2.8 - Road: B-302 - Km: 90.5 - CLUTCH: 1\n\
[2018-Mar-13 13:14:04.090444] - 4.710 s => Driver: 0 - Speed: 3.5 - Road: B-302 - Km: 90.7 - SEQUENCE: 45\n\
[2018-Mar-13 13:14:04.300160] - 4.920 s => Driver: 0 - Speed: 4.2 - Road: B-302 - Km: 90.9 - SLOPE: 10\n\
[2018-Mar-13 13:14:04.510025] - 5.130 s => Driver: 0 - Speed: 4.9 - Road: B-302 - Km: 91.1 - GEAR: 3";

const auto len = std::distance(std::begin(input), std::end(input));

struct Sequence
{
    int ms;
    int driver;
    int sequence;
    double time;
    double vel;
    double km;
    std::string date;
    std::string road;
};

namespace xp = boost::xpressive;

int main()
{
    Sequence data;
    std::vector<Sequence> sequences;

    using namespace xp;

    cregex real = (+_d >> '.' >> +_d);
    cregex keyword = " - SEQUENCE: " >> (+_d)[xp::ref(data.sequence) = as<int>(_)];
    cregex date = repeat<4>(_d) >> '-' >> repeat<3>(alpha) >> '-' >> repeat<2>(_d) >> _s >> repeat<2>(_d) >> ':' >> repeat<2>(_d) >> ':' >> repeat<2>(_d);

    cregex header = '[' >> date[xp::ref(data.date) = _] >> '.' >> (+_d)[xp::ref(data.ms) = as<int>(_)] >> "] - "
                    >> real[xp::ref(data.time) = as<double>(_)]
                    >> " s => Driver: " >> (+_d)[xp::ref(data.driver) = as<int>(_)]
                    >> " - Speed: " >> real[xp::ref(data.vel) = as<double>(_)]
                    >> " - Road: " >> (+set[alnum | '-'])[xp::ref(data.road) = _]
                    >> " - Km: " >> real[xp::ref(data.km) = as<double>(_)];

    xp::cregex parser = (header >> keyword >> _ln);

    xp::cregex_iterator cur(input, input + len, parser);
    xp::cregex_iterator end;

    for (; cur != end; ++cur)
        sequences.emplace_back(data);

    return 0;
}

Please mind the VS 2010 constraint.

Recommended answer

I see roughly two areas for improvement:

  • You basically parse all lines, including the ones you are not interested in
  • You allocate a lot of strings

I'd suggest using string views to fix the allocations. Next, you could try to avoid parsing lines that don't match the SEQUENCE pattern. There's no reason in principle why this couldn't be done using Boost Xpressive, but my weapon of choice happens to be Boost Spirit, so I'll include it too.

You can detect interesting lines before spending more effort like this:

cregex signature = -*~_n >> " - SEQUENCE: " >> (+_d) >> before(_ln|eos); 
for (xp::cregex_iterator cur(b, e, signature), end; cur != end; ++cur) {
    std::cout << "'" << cur->str() << "'\n";
}

This prints

'[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - SEQUENCE: 1'
'[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - SEQUENCE: 4'
'[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.3 - Road: A-11 - Km: 90.0 - SEQUENCE: 8'
'[2018-Mar-13 13:14:02.620424] - 3.240 s => Driver: 0 - Speed: 0.4 - Road: A-11 - Km: 90.1 - SEQUENCE: 15'
'[2018-Mar-13 13:14:03.039600] - 3.660 s => Driver: 0 - Speed: 0.8 - Road: B-302 - Km: 90.1 - SEQUENCE: 21'
'[2018-Mar-13 13:14:03.460012] - 4.080 s => Driver: 0 - Speed: 1.7 - Road: B-302 - Km: 90.3 - SEQUENCE: 29'
'[2018-Mar-13 13:14:03.669448] - 4.290 s => Driver: 0 - Speed: 2.2 - Road: B-302 - Km: 90.4 - SEQUENCE: 34'
'[2018-Mar-13 13:14:04.090444] - 4.710 s => Driver: 0 - Speed: 3.5 - Road: B-302 - Km: 90.7 - SEQUENCE: 45'

Nothing is allocated. This should be pretty fast.
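The same pre-filtering idea can be sketched without any regex library. The version below is only an illustration, assuming C++17 std::string_view is available (it is not on VS 2010, where boost::string_view would be the substitute); sequence_lines is a hypothetical helper name. The only allocation is the result vector itself, because every hit is a view into the input buffer.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Returns views of the lines that contain the " - SEQUENCE: " marker.
// No line text is copied: each view points into `text`.
std::vector<std::string_view> sequence_lines(std::string_view text) {
    static constexpr std::string_view marker = " - SEQUENCE: ";
    std::vector<std::string_view> hits;
    while (!text.empty()) {
        const std::size_t eol = text.find('\n');
        // substr(0, npos) yields the rest of the text for the last line.
        const std::string_view line = text.substr(0, eol);
        if (line.find(marker) != std::string_view::npos)
            hits.push_back(line);
        if (eol == std::string_view::npos)
            break;                      // last line had no trailing newline
        text.remove_prefix(eol + 1);    // continue after the newline
    }
    return hits;
}
```

The returned views are only valid while the input buffer is alive, which is the usual trade-off of this approach.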

For this I'm going to switch to Spirit because it will make things easier.

Note: The real reason I switched here is because, in contrast to Boost Spirit, Xpressive does not appear to have extensible attribute propagation traits. This could be my lack of experience with it.

The alternative approach would almost certainly replace the actions with manual propagation code, which in turn would inform named capture groups in order to keep things legible. I'm not sure about the performance overhead of these, so let's not use them at this point.

You can use boost::string_view with a trait to "teach" Qi to assign text to it:

namespace boost { namespace spirit { namespace traits {
    template <typename It>
    struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
        static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
    };
} } }
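What the trait does boils down to building a non-owning view from an iterator pair. As a standalone sketch of just that step, using C++17 std::string_view in place of boost::string_view and a hypothetical helper name view_from_range:

```cpp
#include <cstddef>
#include <iterator>
#include <string_view>

// Builds a non-owning view over [first, last).
// Assumes both pointers refer to the same contiguous buffer, which is
// why the trait above is only safe for contiguous iterators.
std::string_view view_from_range(const char* first, const char* last) {
    return std::string_view(
        first, static_cast<std::size_t>(std::distance(first, last)));
}
```

The view stores only a pointer and a length, so assigning it costs no allocation, but it dangles as soon as the underlying input buffer goes away.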

That way, the Qi grammar could look just like this:

template <typename It> struct QiParser : qi::grammar<It, Sequence()> {
    QiParser() : QiParser::base_type(line) {
        using namespace qi;
        auto date_time = copy(
            repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >> 
            repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);

        line = '[' >> raw[date_time] >> "] - "
            >> double_ >> " s"
            >> " => Driver: "  >> int_
            >> " - Speed: "    >> double_
            >> " - Road: "     >> raw[+graph]
            >> " - Km: "       >> double_
            >> " - SEQUENCE: " >> int_
            >> (eol|eoi);
    }
  private:
    qi::rule<It, Sequence()> line;
};

Using it is exceedingly simple, especially if not being "selective".

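This happens to be the "winning" configuration. A standalone, simplified version of this algorithm, with all benchmark-related generics and options removed, is listed further down, after the benchmark discussion.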

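Benchmark results: surprise

Using the selective parsing approach only makes the Xpressive approach slower (about 314 μs versus about 156 μs per run in the text report at the end).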

Comparing to Spirit, I had initially started with the selective approach as well (fully anticipating it to be faster). The results were not so encouraging: in the text report at the end, parse_spirit_selective takes about 270 μs per run, against about 156 μs for parse_xpressive_linear.

Oops. The initial Xpressive approach is still superior!

Okay, clearly doing the shallow scan first, and then the "full parse" hurts the performance. Theorizing, this is likely down to cache/prefetch effects. Also, the linear approach may win because it's easier to spot when a line doesn't start with a '[' character, than to see whether it ends with the SEQUENCE pattern.

So I decided to adapt the Spirit approaches to linear mode too, and see whether the win from reducing allocations is still worth it.
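"Linear mode" here means a single pass over the buffer that tries the line parser at the current position and skips ahead when it fails, which is roughly what the *seek[line] expression in the code further down does. The sketch below imitates that control flow in plain C++17; it is not Spirit's implementation, and seek_all, Matcher and the toy bracketed_number matcher are all hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string_view>
#include <vector>

// Hypothetical matcher: returns how many characters match at the start
// of `rest`, or 0 when nothing matches there.
using Matcher = std::function<std::size_t(std::string_view rest)>;

// One pass over the input: on a match, record it and jump past it;
// otherwise advance by a single character and try again.
std::vector<std::string_view> seek_all(std::string_view text,
                                       const Matcher& match_at) {
    std::vector<std::string_view> hits;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t n = match_at(text.substr(pos));
        if (n > 0) {
            hits.push_back(text.substr(pos, n));
            pos += n;
        } else {
            ++pos;
        }
    }
    return hits;
}

// Toy matcher standing in for the real line grammar: "[<digits>]".
std::size_t bracketed_number(std::string_view rest) {
    if (rest.empty() || rest[0] != '[')
        return 0;  // cheap rejection on the very first character
    std::size_t i = 1;
    while (i < rest.size() &&
           std::isdigit(static_cast<unsigned char>(rest[i])))
        ++i;
    if (i == 1 || i >= rest.size() || rest[i] != ']')
        return 0;  // no digits, or no closing bracket
    return i + 1;
}
```

Most failed attempts end at the first-character check, which is one plausible reason the single pass is cheaper than scanning each line for its ending before parsing it.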

Now we're getting results. Let's look at the difference between the std::string and boost::string_view approaches in detail (about 21.3 μs versus 14.7 μs per run in the text report at the end).

The reduced allocations are good for 30% more efficiency. In total, an improvement of 10 times over the original approach.
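As a quick check, both figures can be recomputed from the mean timings in the sample text report at the end (all values in μs). This reads the 30% as a reduction in run time, which is an interpretation rather than something the answer states:

```latex
\[
\frac{t_{\text{spirit, std::string}} - t_{\text{spirit, string\_view}}}
     {t_{\text{spirit, std::string}}}
  = \frac{21.2533 - 14.6677}{21.2533} \approx 0.31,
\qquad
\frac{t_{\text{xpressive, linear}}}{t_{\text{spirit, string\_view}}}
  = \frac{156.418}{14.6677} \approx 10.7
\]
```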

Note that the benchmark code goes out of its way to eliminate unfair differences between the implementations (e.g. by precompiling everything on both Spirit and Xpressive). The full benchmark code is listed under "Full Benchmark Code".

Isolated winning implementation:

#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
#include <cstring> // strlen
#include <iostream> // std::cout
#include <vector>

using It = char const*;

struct Sequence {
    int driver;
    int sequence;
    double time;
    double vel;
    double km;
    boost::string_view date;
    boost::string_view road;
};

BOOST_FUSION_ADAPT_STRUCT(::Sequence, date, time, driver, vel, road, km, sequence)

namespace qi = boost::spirit::qi;

namespace boost { namespace spirit { namespace traits {
    template <typename It>
    struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
        static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
    };
} } }

std::vector<Sequence> parse_spirit(It b, It e) {

    qi::rule<It, Sequence()> static const line = []{
        using namespace qi;
        auto date_time = copy(
            repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >> 
            repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);

        qi::rule<It, Sequence()> r = '[' >> raw[date_time] >> "] - "
            >> double_ >> " s"
            >> " => Driver: "  >> int_
            >> " - Speed: "    >> double_
            >> " - Road: "     >> raw[+graph]
            >> " - Km: "       >> double_
            >> " - SEQUENCE: " >> int_
            >> (eol|eoi);

        return r;
    }();

    std::vector<Sequence> sequences;

    parse(b, e, *boost::spirit::repository::qi::seek[line], sequences);

    return sequences;
}

static char input[] = /*... see question ...*/;
static const size_t len = strlen(input);

int main() {
    auto sequences = parse_spirit(input, input+len);
    std::cout << "Parsed: " << sequences.size() << " sequence lines\n";
}

Full Benchmark Code

The benchmarks use Nonius for the measurements and statistical analysis.

  • Full interactive graphs here: http://stackoverflow-sehe.s3.amazonaws.com/9f88e055-4b5f-4026-8f2f-54e2bcad430d/stats.html
  • Compile with -DUSE_NONIUS if you have Nonius available
  • Compile with -DVERIFY_OUTPUT for "correctness" mode: in this case no timings are done but the results of the parse are echoed for validation
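
#include <cstring> // strlen
#include <functional> // std::function (used by the mock Nonius runner)
#include <string>
#include <utility> // std::forward, std::move
#include <vector>
#include <boost/range/iterator_range.hpp> // boost::make_iterator_range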

static char input[] = 
"[2018-Mar-13 13:13:59.580482] - 0.200 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - SLOPE: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - GEAR: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - GEAR: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - SEQUENCE: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - CLUTCH: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.540 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - SEQUENCE: 4\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.3 - Road: A-11 - Km: 90.0 - SEQUENCE: 8\n\
[2018-Mar-13 13:14:01.819966] - 3.110 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.620424] - 3.240 s => Driver: 0 - Speed: 0.4 - Road: A-11 - Km: 90.1 - SEQUENCE: 15\n\
[2018-Mar-13 13:14:02.829983] - 3.450 s => Driver: 0 - Speed: 0.6 - Road: B-302 - Km: 90.1 - SLOPE: -5\n\
[2018-Mar-13 13:14:03.039600] - 3.660 s => Driver: 0 - Speed: 0.8 - Road: B-302 - Km: 90.1 - SEQUENCE: 21\n\
[2018-Mar-13 13:14:03.250451] - 3.870 s => Driver: 0 - Speed: 1.2 - Road: B-302 - Km: 90.2 - GEAR: 2\n\
[2018-Mar-13 13:14:03.460012] - 4.080 s => Driver: 0 - Speed: 1.7 - Road: B-302 - Km: 90.3 - SEQUENCE: 29\n\
[2018-Mar-13 13:14:03.669448] - 4.290 s => Driver: 0 - Speed: 2.2 - Road: B-302 - Km: 90.4 - SEQUENCE: 34\n\
[2018-Mar-13 13:14:03.880066] - 4.500 s => Driver: 0 - Speed: 2.8 - Road: B-302 - Km: 90.5 - CLUTCH: 1\n\
[2018-Mar-13 13:14:04.090444] - 4.710 s => Driver: 0 - Speed: 3.5 - Road: B-302 - Km: 90.7 - SEQUENCE: 45\n\
[2018-Mar-13 13:14:04.300160] - 4.920 s => Driver: 0 - Speed: 4.2 - Road: B-302 - Km: 90.9 - SLOPE: 10\n\
[2018-Mar-13 13:13:59.580482] - 0.200 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - SLOPE: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.0 - Road: A-11 - Km: 90.0 - GEAR: 0\n\
[2018-Mar-13 13:14:01.170203] - 1.790 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - GEAR: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.1 - Road: A-11 - Km: 90.0 - SEQUENCE: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.440 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - CLUTCH: 1\n\
[2018-Mar-13 13:14:01.819966] - 2.540 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.2 - Road: A-11 - Km: 90.0 - SEQUENCE: 4\n\
[2018-Mar-13 13:14:02.409855] - 3.030 s => Driver: 0 - Speed: 0.3 - Road: A-11 - Km: 90.0 - SEQUENCE: 8\n\
[2018-Mar-13 13:14:01.819966] - 3.110 s => Backup to regestry\n\
[2018-Mar-13 13:14:02.620424] - 3.240 s => Driver: 0 - Speed: 0.4 - Road: A-11 - Km: 90.1 - SEQUENCE: 15\n\
[2018-Mar-13 13:14:02.829983] - 3.450 s => Driver: 0 - Speed: 0.6 - Road: B-302 - Km: 90.1 - SLOPE: -5\n\
[2018-Mar-13 13:14:03.039600] - 3.660 s => Driver: 0 - Speed: 0.8 - Road: B-302 - Km: 90.1 - SEQUENCE: 21\n\
[2018-Mar-13 13:14:03.250451] - 3.870 s => Driver: 0 - Speed: 1.2 - Road: B-302 - Km: 90.2 - GEAR: 2\n\
[2018-Mar-13 13:14:03.460012] - 4.080 s => Driver: 0 - Speed: 1.7 - Road: B-302 - Km: 90.3 - SEQUENCE: 29\n\
[2018-Mar-13 13:14:03.669448] - 4.290 s => Driver: 0 - Speed: 2.2 - Road: B-302 - Km: 90.4 - SEQUENCE: 34\n\
[2018-Mar-13 13:14:03.880066] - 4.500 s => Driver: 0 - Speed: 2.8 - Road: B-302 - Km: 90.5 - CLUTCH: 1\n\
[2018-Mar-13 13:14:04.090444] - 4.710 s => Driver: 0 - Speed: 3.5 - Road: B-302 - Km: 90.7 - SEQUENCE: 45\n\
[2018-Mar-13 13:14:04.300160] - 4.920 s => Driver: 0 - Speed: 4.2 - Road: B-302 - Km: 90.9 - SLOPE: 10\n\
[2018-Mar-13 13:14:04.510025] - 5.130 s => Driver: 0 - Speed: 4.9 - Road: B-302 - Km: 91.1 - GEAR: 3";
static const size_t len = strlen(input);

#include <boost/utility/string_view.hpp>
#include <boost/fusion/adapted/struct.hpp>

template <typename String> struct Sequence {
    int driver;
    int sequence;
    double time;
    double vel;
    double km;
    String date;
    String road;
};

BOOST_FUSION_ADAPT_TPL_STRUCT((T),(Sequence)(T), date, time, driver, vel, road, km, sequence)

// Declare implementations under test:
using It = char const*;
template <typename S> std::vector<S> parse_xpressive_linear(It b, It e);
template <typename S> std::vector<S> parse_xpressive_selective(It b, It e);
template <typename S> std::vector<S> parse_spirit_linear(It b, It e);
template <typename S> std::vector<S> parse_spirit_selective(It b, It e);

#ifdef VERIFY_OUTPUT
    #include <boost/fusion/include/io.hpp>
    using boost::fusion::operator<<;
    #include <iostream>

    #define VERIFY()                                                                    \
        do {                                                                            \
            std::cout << "L:" << __LINE__ << " Parsed: " << sequences.size() << "\n";   \
            for (auto r : sequences) {                                                  \
                std::cout << r << "\n";                                                 \
            }                                                                           \
        } while (0)
#else
    #define VERIFY() do { } while (0)
#endif

#ifdef USE_NONIUS
    #include <nonius/benchmark.h++>
    #define NONIUS_RUNNER
    #include <nonius/main.h++>
#else
    // mock nonius
    namespace nonius {
        struct chronometer{
            template <typename F> static inline void measure(F&& f) { std::forward<F>(f)(); }
        };
        static std::vector<std::function<void(chronometer)>> s_benchmarks;
        #define TOKENPASTE(x, y) x ## y
        #define TOKENPASTE2(x, y) TOKENPASTE(x, y)
        #define NONIUS_BENCHMARK(name, f) static auto TOKENPASTE2(s_reg_, __LINE__) = []{ ::nonius::s_benchmarks.push_back(f); return 42; }();

        void run() { for (auto& b : s_benchmarks) b({}); }
    }

    int main() {
        nonius::run();
    }
#endif

template <typename R>
void do_test_kernel(nonius::chronometer& cm, std::vector<R> (*f)(It, It)) {
    std::vector<R> sequences;
    cm.measure([&sequences,f]{ sequences = f(input, input + len); });
    VERIFY();
}

#define TEST_CASE(name, string) NONIUS_BENCHMARK(#name"-"#string, [](nonius::chronometer cm) { do_test_kernel(cm, &name<Sequence<string> >); })
// Xpressive doesn't support string_view
TEST_CASE(parse_xpressive_linear,    std::string)
TEST_CASE(parse_xpressive_selective, std::string)

TEST_CASE(parse_spirit_linear,       std::string)
TEST_CASE(parse_spirit_linear,       boost::string_view)
TEST_CASE(parse_spirit_selective,    std::string)
TEST_CASE(parse_spirit_selective,    boost::string_view)

#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>

namespace xp = boost::xpressive;

namespace XpressiveDetail {
    using namespace xp;

    struct Scanner {
        cregex scan {-*~xp::_n >> " - SEQUENCE: " >> (+xp::_d) >> xp::_ln};
    };

    template <typename Seq> struct Parser : Scanner {
        mutable Seq seq; // non-thread-safe, but fairer to compare to Spirit

        cregex real    = (+_d >> '.' >> +_d);
        cregex keyword = " - SEQUENCE: " >> (+_d)[xp::ref(seq.sequence) = as<int>(_)];
        cregex date    = repeat<4>(_d) >> '-' 
            >> repeat<3>(alpha) >> '-' 
            >> repeat<2>(_d) 
            >> _s 
            >> repeat<2>(_d) >> ':' 
            >> repeat<2>(_d) >> ':' 
            >> repeat<2>(_d)
            >> '.' >> (+_d);

        cregex header = '[' >> date[xp::ref(seq.date) = _] >> "] - "
            >> real[xp::ref(seq.time) = as<double>(_)]
            >> " s => Driver: " >> (+_d)             [ xp ::ref(seq.driver) = as<int>(_) ]
            >> " - Speed: "     >> real              [ xp ::ref(seq.vel)    = as<double>(_) ]
            >> " - Road: "      >> (+set[alnum|'-']) [ xp ::ref(seq.road)   = _ ]
            >> " - Km: "        >> real              [ xp ::ref(seq.km)     = as<double>(_) ];

        cregex parser = (header >> keyword >> _ln);
    };
}

template <typename Seq>
std::vector<Seq> parse_xpressive_linear(It b, It e) {
    std::vector<Seq> sequences;
    using namespace xp;

    static const XpressiveDetail::Parser<Seq> precompiled{};

    for (xp::cregex_iterator cur(b, e, precompiled.parser), end; cur != end; ++cur)
        sequences.push_back(std::move(precompiled.seq));

    return sequences;
}

template <typename Seq>
std::vector<Seq> parse_xpressive_selective(It b, It e) {
    std::vector<Seq> sequences;
    using namespace xp;

    static const XpressiveDetail::Parser<Seq> precompiled{};
    xp::match_results<It> m;

    for (auto& match : boost::make_iterator_range(xp::cregex_iterator{b, e, precompiled.scan}, {})) {
        if (xp::regex_match(match[0].first, match[0].second, m, precompiled.parser))
            sequences.push_back(std::move(precompiled.seq));
    }

    return sequences;
}

//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;

namespace boost { namespace spirit { namespace traits {
    template <typename It>
    struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
        static inline void call(It f, It l, boost::string_view& attr) { attr = boost::string_view { &*f, size_t(std::distance(f,l)) }; }
    };
} } }

template <typename It, typename Attribute> struct QiParser : qi::grammar<It, Attribute()> {
    QiParser() : QiParser::base_type(line) {
        using namespace qi;
        auto date_time = copy(
            repeat(4)[digit] >> '-' >> repeat(3)[alpha] >> '-' >> repeat(2)[digit] >> ' ' >> 
            repeat(2)[digit] >> ':' >> repeat(2)[digit] >> ':' >> repeat(2)[digit] >> '.' >> +digit);

        line = '[' >> eps(clear(_val)) >> raw[date_time] >> "] - "
            >> double_ >> " s"
            >> " => Driver: "  >> int_
            >> " - Speed: "    >> double_
            >> " - Road: "     >> raw[+graph]
            >> " - Km: "       >> double_
            >> " - SEQUENCE: " >> int_
            >> (eol|eoi);

        BOOST_SPIRIT_DEBUG_NODES((line))
    }
  private:
    struct clear_f {
        // only required for linear approach to std::string-based
        bool operator()(Sequence<std::string>& v)      const { v = {};      return true; }
        bool operator()(Sequence<boost::string_view>&) const { /*no_op();*/ return true; }
    };
    boost::phoenix::function<clear_f> clear;

    qi::rule<It, Attribute()> line;
};

template <typename Seq = Sequence<std::string> >
std::vector<Seq> parse_spirit_selective(It b, It e) {
    static QiParser<It, Seq> const qi_parser{};
    static XpressiveDetail::Scanner const precompiled{};

    std::vector<Seq> sequences;

    for (auto& match : boost::make_iterator_range(xp::cregex_iterator{b, e, precompiled.scan}, {})) {
        Seq r;
        if (parse(match[0].first, match[0].second, qi_parser, r))
            sequences.push_back(r);
    }

    return sequences;
}

#include <boost/spirit/repository/include/qi_seek.hpp>

template <typename Seq = Sequence<std::string> >
std::vector<Seq> parse_spirit_linear(It b, It e) {
    using boost::spirit::repository::qi::seek;

    static QiParser<It, Seq> const qi_parser{};

    std::vector<Seq> sequences;
    parse(b, e, *seek[qi_parser], sequences);
    return sequences;
}

Sample text report:

clock resolution: mean is 17.7534 ns (40960002 iterations)

benchmarking parse_xpressive_linear-std::string
collecting 100 samples, 1 iterations each, in estimated 15.7252 ms
mean: 156.418 μs, lb 155.863 μs, ub 158.24 μs, ci 0.95
std dev: 4.62848 μs, lb 1637.89 ns, ub 10.4043 μs, ci 0.95
found 4 outliers among 100 samples (4%)
variance is moderately inflated by outliers

benchmarking parse_xpressive_selective-std::string
collecting 100 samples, 1 iterations each, in estimated 31.5459 ms
mean: 313.992 μs, lb 313.39 μs, ub 315.599 μs, ci 0.95
std dev: 4.5415 μs, lb 1105.98 ns, ub 9.07809 μs, ci 0.95
found 11 outliers among 100 samples (11%)
variance is slightly inflated by outliers

benchmarking parse_spirit_linear-std::string
collecting 100 samples, 1 iterations each, in estimated 2.1556 ms
mean: 21.2533 μs, lb 21.1623 μs, ub 21.6854 μs, ci 0.95
std dev: 870.481 ns, lb 53.2809 ns, ub 2.0738 μs, ci 0.95
found 7 outliers among 100 samples (7%)
variance is moderately inflated by outliers

benchmarking parse_spirit_linear-boost::string_view
collecting 100 samples, 2 iterations each, in estimated 2.944 ms
mean: 14.6677 μs, lb 14.6342 μs, ub 14.8279 μs, ci 0.95
std dev: 318.252 ns, lb 22.5097 ns, ub 757.555 ns, ci 0.95
found 5 outliers among 100 samples (5%)
variance is moderately inflated by outliers

benchmarking parse_spirit_selective-std::string
collecting 100 samples, 1 iterations each, in estimated 27.5512 ms
mean: 273.052 μs, lb 272.77 μs, ub 273.952 μs, ci 0.95
std dev: 2.31473 μs, lb 835.184 ns, ub 5.1322 μs, ci 0.95
found 10 outliers among 100 samples (10%)
variance is unaffected by outliers

benchmarking parse_spirit_selective-boost::string_view
collecting 100 samples, 1 iterations each, in estimated 27.0766 ms
mean: 269.446 μs, lb 269.208 μs, ub 270.268 μs, ci 0.95
std dev: 2.01634 μs, lb 627.834 ns, ub 4.56949 μs, ci 0.95
found 10 outliers among 100 samples (10%)
variance is unaffected by outliers
