Why does using a stream in boost spirit penalize performance so much?
Question
I have prepared a small benchmark program to measure different ways of parsing. The problem is the huge decrease in performance when using a stream and a custom function for storing a date as a time_t + double.
The weird boost spirit trait for std::string is there because seek backtracking fills the attribute string with all the common parts of non-matching lines until a matching line is found.
Sorry for the source code quality (copy/paste, bad variable names, weak indentation...). I am aware that this benchmark code is not going to make it into the Clean Code book, so please ignore that and let's focus on the subject.
I understand that the fastest way is to use a string without backtracking, but the time increase for the stream is really strange. Can someone explain what is going on?
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/chrono/chrono.hpp>
#include <iomanip>
#include <iostream>
#include <ctime>
#include <vector>
typedef std::string::const_iterator It;
namespace structs {
struct Timestamp {
std::time_t date;
double ms;
friend std::istream& operator>> (std::istream& stream, Timestamp& time)
{
struct std::tm tm;
if (stream >> std::get_time(&tm, "%Y-%b-%d %H:%M:%S") >> time.ms)
time.date = std::mktime(&tm);
return stream;
}
};
struct Record1 {
std::string date;
double time;
std::string str;
};
struct Record2 {
Timestamp date;
double time;
std::string str;
};
typedef std::vector<Record1> Records1;
typedef std::vector<Record2> Records2;
}
BOOST_FUSION_ADAPT_STRUCT(structs::Record1,
(std::string, date)
(double, time)
(std::string, str))
BOOST_FUSION_ADAPT_STRUCT(structs::Record2,
(structs::Timestamp, date)
(double, time)
(std::string, str))
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<std::string, It, void> {
static inline void call(It f, It l, std::string& attr) {
attr = std::string(&*f, std::distance(f,l));
}
};
} } }
namespace qi = boost::spirit::qi;
namespace QiParsers {
template <typename It>
struct Parser1 : qi::grammar<It, structs::Record1()>
{
Parser1() : Parser1::base_type(start) {
using namespace qi;
start = '[' >> raw[*~char_(']')] >> ']'
>> " - " >> double_ >> " s"
>> " => String: " >> raw[+graph]
>> eol;
}
private:
qi::rule<It, structs::Record1()> start;
};
template <typename It>
struct Parser2 : qi::grammar<It, structs::Record2()>
{
Parser2() : Parser2::base_type(start) {
using namespace qi;
start = '[' >> stream >> ']'
>> " - " >> double_ >> " s"
>> " => String: " >> raw[+graph]
>> eol;
}
private:
qi::rule<It, structs::Record2()> start;
};
template <typename It>
struct Parser3 : qi::grammar<It, structs::Records1()>
{
Parser3() : Parser3::base_type(start) {
using namespace qi;
using boost::phoenix::push_back;
line = '[' >> raw[*~char_(']')] >> ']'
>> " - " >> double_ >> " s"
>> " => String: " >> raw[+graph];
ignore = *~char_("\r\n");
start = (line[push_back(_val, _1)] | ignore) % eol;
}
private:
qi::rule<It> ignore;
qi::rule<It, structs::Record1()> line;
qi::rule<It, structs::Records1()> start;
};
template <typename It>
struct Parser4 : qi::grammar<It, structs::Records2()>
{
Parser4() : Parser4::base_type(start) {
using namespace qi;
using boost::phoenix::push_back;
line = '[' >> stream >> ']'
>> " - " >> double_ >> " s"
>> " => String: " >> raw[+graph];
ignore = *~char_("\r\n");
start = (line[push_back(_val, _1)] | ignore) % eol;
}
private:
qi::rule<It> ignore;
qi::rule<It, structs::Record2()> line;
qi::rule<It, structs::Records2()> start;
};
}
template<typename Parser, typename Container>
Container parse_seek(It b, It e, const std::string& message)
{
static const Parser parser;
Container records;
boost::chrono::high_resolution_clock::time_point t0 = boost::chrono::high_resolution_clock::now();
parse(b, e, *boost::spirit::repository::qi::seek[parser], records);
boost::chrono::high_resolution_clock::time_point t1 = boost::chrono::high_resolution_clock::now();
auto elapsed = boost::chrono::duration_cast<boost::chrono::milliseconds>(t1 - t0);
std::cout << "Elapsed time: " << elapsed.count() << " ms (" << message << ")\n";
return records;
}
template<typename Parser, typename Container>
Container parse_ignoring(It b, It e, const std::string& message)
{
static const Parser parser;
Container records;
boost::chrono::high_resolution_clock::time_point t0 = boost::chrono::high_resolution_clock::now();
parse(b, e, parser, records);
boost::chrono::high_resolution_clock::time_point t1 = boost::chrono::high_resolution_clock::now();
auto elapsed = boost::chrono::duration_cast<boost::chrono::milliseconds>(t1 - t0);
std::cout << "Elapsed time: " << elapsed.count() << " ms (" << message << ")\n";
return records;
}
static const std::string input1 = "[2018-Mar-01 00:00:00.000000] - 1.000 s => String: Valid_string\n";
static const std::string input2 = "[2018-Mar-02 00:00:00.000000] - 2.000 s => I dont care\n";
static std::string input("");
int main() {
const int N1 = 10;
const int N2 = 100000;
input.reserve(N1 * (input1.size() + N2*input2.size()));
for (int i = N1; i--;)
{
input += input1;
for (int j = N2; j--;)
input += input2;
}
const auto records1 = parse_seek<QiParsers::Parser1<It>, structs::Records1>(input.begin(), input.end(), "std::string + seek");
const auto records2 = parse_seek<QiParsers::Parser2<It>, structs::Records2>(input.begin(), input.end(), "stream + seek");
const auto records3 = parse_ignoring<QiParsers::Parser3<It>, structs::Records1>(input.begin(), input.end(), "std::string + ignoring");
const auto records4 = parse_ignoring<QiParsers::Parser4<It>, structs::Records2>(input.begin(), input.end(), "stream + ignoring");
return 0;
}
The results in the console are:
Elapsed time: 1445 ms (std::string + seek)
Elapsed time: 21519 ms (stream + seek)
Elapsed time: 860 ms (std::string + ignoring)
Elapsed time: 19046 ms (stream + ignoring)
Answer
Okay, in the code posted, 70%¹ of the time is spent in the stream's underflow operation.
I haven't looked into /why/ that is, but instead² wrote a few naive implementations to see whether I could do better. First steps:
² Update: I've since analyzed it and provided a PR. The improvement created by that PR does not affect the bottom line in this particular case (see SUMMARY).
- drop operator>> for Timestamp (we won't be using that)
- replace all instances of '[' >> stream >> ']' with the alternative '[' >> raw[*~char_(']')] >> ']' so that we will always be using the trait to transform the iterator range into the attribute type (std::string or Timestamp)
Now, we implement the assign_to_attribute_from_iterators<structs::Timestamp, It> trait:
template <typename It>
struct assign_to_attribute_from_iterators<structs::Timestamp, It, void> {
static inline void call(It f, It l, structs::Timestamp& time) {
boost::iostreams::stream<boost::iostreams::array_source> stream(f, l);
struct std::tm tm;
if (stream >> std::get_time(&tm, "%Y-%b-%d %H:%M:%S") >> time.ms)
time.date = std::mktime(&tm);
else throw "Parse failure";
}
};
Profiling with callgrind:
It does improve considerably, probably because we may assume the underlying char buffer is contiguous, whereas the Spirit implementation cannot make that assumption. We spend ~42% of the time in time_get.
Roughly speaking, 25% of time is devoted to locale stuff, of which a worrying ~20% is spent doing dynamic casts :(
Same, but reusing a static stream instance to see whether it makes a significant difference:
static boost::iostreams::stream<boost::iostreams::array_source> s_stream;
template <typename It>
struct assign_to_attribute_from_iterators<structs::Timestamp, It, void> {
static inline void call(It f, It l, structs::Timestamp& time) {
struct std::tm tm;
if (s_stream.is_open()) s_stream.close();
s_stream.clear();
boost::iostreams::array_source as(f, l);
s_stream.open(as);
if (s_stream >> std::get_time(&tm, "%Y-%b-%d %H:%M:%S") >> time.ms)
time.date = std::mktime(&tm);
else throw "Parse failure";
}
};
Profiling reveals no significant difference.
Let's see if dropping to C-level reduces the locale hurt:
template <typename It>
struct assign_to_attribute_from_iterators<structs::Timestamp, It, void> {
static inline void call(It f, It l, structs::Timestamp& time) {
struct std::tm tm;
auto remain = strptime(&*f, "%Y-%b-%d %H:%M:%S", &tm);
time.date = std::mktime(&tm);
#if __has_include(<charconv>) || __cpp_lib_to_chars >= 201611
auto result = std::from_chars(remain, &*l, time.ms); // using <charconv> from c++17; parse from where strptime stopped
#else
char* end;
time.ms = std::strtod(remain, &end);
assert(end > remain);
static_cast<void>(l); // unused
#endif
}
};
As you can see, using strtod is a bit suboptimal here. The input range is bounded, but there's no way to tell strtod about that. I have not been able to profile the from_chars approach, which is strictly safer because it doesn't have this issue.

In practice, for your sample code it is safe to use strtod because we know the input buffer is NUL-terminated.
Here you can see that parsing the date-time is still a factor of concern:
- mktime 15.58%
- strptime 40.54%
- strtod 5.88%
But all in all the difference is less egregious now:
- Parser1: 14.17%
- Parser2: 43.44%
- Parser3: 5.69%
- Parser4: 35.49%
Interestingly, the performance of the "low-level" C APIs is not far from using the much higher-level Boost posix_time::ptime functions:
template <typename It>
struct assign_to_attribute_from_iterators<structs::Timestamp, It, void> {
static inline void call(It f, It l, structs::Timestamp& time) {
time.date = to_time_t(boost::posix_time::time_from_string(std::string(f,l)));
}
};
This might sacrifice some precision, according to the docs.
Here, the total time spent parsing date and time is 68%. The relative speeds of the parsers are close to the last ones:
- Parser1: 12.33%
- Parser2: 43.86%
- Parser3: 5.22%
- Parser4: 37.43%
All in all, it turns out that storing the strings seems faster, even if you risk allocating more. I did a very simple check of whether this could be down to SSO by increasing the length of the substring:
static const std::string input1 = "[2018-Mar-01 00:01:02.012345 THWARTING THE SMALL STRING OPTIMIZATION HERE THIS WON'T FIT, NO DOUBT] - 1.000 s => String: Valid_string\n";
static const std::string input2 = "[2018-Mar-02 00:01:02.012345 THWARTING THE SMALL STRING OPTIMIZATION HERE THIS WON'T FIT, NO DOUBT] - 2.000 s => I dont care\n";
There was no significant impact, so that leaves the parsing itself.
It seems clear that you will either want to delay parsing the time (Parser3 is by far the quickest) or should go with the time-tested Boost posix_time functions.
Here's the combined benchmark code I used. A few things changed:
- added some sanity-check output (to avoid testing nonsensical code)
- made the iterator generic (changing to char* has no significant effect on performance in optimized builds)
- the above variants are all manually switchable in the code by changing #if 1 to #if 0 in the right spots
- reduced N1/N2 for convenience
I've liberally used C++14 because the purpose of the code was to find bottlenecks. Any wisdom gained can be backported relatively easily after the profiling.
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>
#include <boost/chrono/chrono.hpp>
#include <iomanip>
#include <iostream>
#include <ctime>
#include <vector>
#if __has_include(<charconv>) || __cpp_lib_to_chars >= 201611
# include <charconv> // not supported until GCC 8
#endif
namespace structs {
struct Timestamp {
std::time_t date;
double ms;
};
struct Record1 {
std::string date;
double time;
std::string str;
};
struct Record2 {
Timestamp date;
double time;
std::string str;
};
typedef std::vector<Record1> Records1;
typedef std::vector<Record2> Records2;
}
BOOST_FUSION_ADAPT_STRUCT(structs::Record1,
(std::string, date)
(double, time)
(std::string, str))
BOOST_FUSION_ADAPT_STRUCT(structs::Record2,
(structs::Timestamp, date)
(double, time)
(std::string, str))
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<std::string, It, void> {
static inline void call(It f, It l, std::string& attr) {
attr = std::string(&*f, std::distance(f,l));
}
};
static boost::iostreams::stream<boost::iostreams::array_source> s_stream;
template <typename It>
struct assign_to_attribute_from_iterators<structs::Timestamp, It, void> {
static inline void call(It f, It l, structs::Timestamp& time) {
#if 1
time.date = to_time_t(boost::posix_time::time_from_string(std::string(f,l)));
#elif 1
struct std::tm tm;
boost::iostreams::stream<boost::iostreams::array_source> stream(f, l);
if (stream >> std::get_time(&tm, "%Y-%b-%d %H:%M:%S") >> time.ms)
time.date = std::mktime(&tm);
else
throw "Parse failure";
#elif 1
struct std::tm tm;
if (s_stream.is_open()) s_stream.close();
s_stream.clear();
boost::iostreams::array_source as(f, l);
s_stream.open(as);
if (s_stream >> std::get_time(&tm, "%Y-%b-%d %H:%M:%S") >> time.ms)
time.date = std::mktime(&tm);
else
throw "Parse failure";
#else
struct std::tm tm;
auto remain = strptime(&*f, "%Y-%b-%d %H:%M:%S", &tm);
time.date = std::mktime(&tm);
#if __has_include(<charconv>) || __cpp_lib_to_chars >= 201611
auto result = std::from_chars(remain, &*l, time.ms); // using <charconv> from c++17; parse from where strptime stopped
#else
char* end;
time.ms = std::strtod(remain, &end);
assert(end > remain);
static_cast<void>(l); // unused
#endif
#endif
}
};
} } }
namespace qi = boost::spirit::qi;
namespace QiParsers {
template <typename It>
struct Parser1 : qi::grammar<It, structs::Record1()>
{
Parser1() : Parser1::base_type(start) {
using namespace qi;
start = '[' >> raw[*~char_(']')] >> ']'
>> " - " >> double_ >> " s"
>> " => String: " >> raw[+graph]
>> eol;
}
private:
qi::rule<It, structs::Record1()> start;
};
template <typename It>
struct Parser2 : qi::grammar<It, structs::Record2()>
{
Parser2() : Parser2::base_type(start) {
using namespace qi;
start = '[' >> raw[*~char_(']')] >> ']'
>> " - " >> double_ >> " s"
>> " => String: " >> raw[+graph]
>> eol;
}
private:
qi::rule<It, structs::Record2()> start;
};
template <typename It>
struct Parser3 : qi::grammar<It, structs::Records1()>
{
Parser3() : Parser3::base_type(start) {
using namespace qi;
using boost::phoenix::push_back;
line = '[' >> raw[*~char_(']')] >> ']'
>> " - " >> double_ >> " s"
>> " => String: " >> raw[+graph];
ignore = *~char_("\r\n");
start = (line[push_back(_val, _1)] | ignore) % eol;
}
private:
qi::rule<It> ignore;
qi::rule<It, structs::Record1()> line;
qi::rule<It, structs::Records1()> start;
};
template <typename It>
struct Parser4 : qi::grammar<It, structs::Records2()>
{
Parser4() : Parser4::base_type(start) {
using namespace qi;
using boost::phoenix::push_back;
line = '[' >> raw[*~char_(']')] >> ']'
>> " - " >> double_ >> " s"
>> " => String: " >> raw[+graph];
ignore = *~char_("\r\n");
start = (line[push_back(_val, _1)] | ignore) % eol;
}
private:
qi::rule<It> ignore;
qi::rule<It, structs::Record2()> line;
qi::rule<It, structs::Records2()> start;
};
}
template <typename Parser> static const Parser s_instance {};
template<template <typename> class Parser, typename Container, typename It>
Container parse_seek(It b, It e, const std::string& message)
{
Container records;
auto const t0 = boost::chrono::high_resolution_clock::now();
parse(b, e, *boost::spirit::repository::qi::seek[s_instance<Parser<It> >], records);
auto const t1 = boost::chrono::high_resolution_clock::now();
auto elapsed = boost::chrono::duration_cast<boost::chrono::milliseconds>(t1 - t0);
std::cout << "Elapsed time: " << elapsed.count() << " ms (" << message << ")\n";
return records;
}
template<template <typename> class Parser, typename Container, typename It>
Container parse_ignoring(It b, It e, const std::string& message)
{
Container records;
auto const t0 = boost::chrono::high_resolution_clock::now();
parse(b, e, s_instance<Parser<It> >, records);
auto const t1 = boost::chrono::high_resolution_clock::now();
auto elapsed = boost::chrono::duration_cast<boost::chrono::milliseconds>(t1 - t0);
std::cout << "Elapsed time: " << elapsed.count() << " ms (" << message << ")\n";
return records;
}
static const std::string input1 = "[2018-Mar-01 00:01:02.012345] - 1.000 s => String: Valid_string\n";
static const std::string input2 = "[2018-Mar-02 00:01:02.012345] - 2.000 s => I dont care\n";
std::string prepare_input() {
std::string input;
const int N1 = 10;
const int N2 = 1000;
input.reserve(N1 * (input1.size() + N2*input2.size()));
for (int i = N1; i--;) {
input += input1;
for (int j = N2; j--;)
input += input2;
}
return input;
}
int main() {
auto const input = prepare_input();
auto f = input.data(), l = f + input.length();
for (auto& r: parse_seek<QiParsers::Parser1, structs::Records1>(f, l, "std::string + seek")) {
std::cout << r.date << "\n";
break;
}
for (auto& r: parse_seek<QiParsers::Parser2, structs::Records2>(f, l, "stream + seek")) {
auto tm = *std::localtime(&r.date.date);
std::cout << std::put_time(&tm, "%Y-%b-%d %H:%M:%S") << "\n";
break;
}
for (auto& r: parse_ignoring<QiParsers::Parser3, structs::Records1>(f, l, "std::string + ignoring")) {
std::cout << r.date << "\n";
break;
}
for (auto& r: parse_ignoring<QiParsers::Parser4, structs::Records2>(f, l, "stream + ignoring")) {
auto tm = *std::localtime(&r.date.date);
std::cout << std::put_time(&tm, "%Y-%b-%d %H:%M:%S") << "\n";
break;
}
}
This prints something like:
Elapsed time: 14 ms (std::string + seek)
2018-Mar-01 00:01:02.012345
Elapsed time: 29 ms (stream + seek)
2018-Mar-01 00:01:02
Elapsed time: 2 ms (std::string + ignoring)
2018-Mar-01 00:01:02.012345
Elapsed time: 22 ms (stream + ignoring)
2018-Mar-01 00:01:02
¹ All percentages are relative to total program cost. That does skew the percentages (the 70% mentioned would be even worse if the non-stream parser tests weren't taken into account), but the numbers are a good enough guide for relative comparisons within a test run.