`std::variant` vs. inheritance vs. other ways (performance)


Problem description

I'm wondering about std::variant performance. When should I not use it? It seems like virtual functions are still much better than using std::visit which surprised me!

In "A Tour of C++", Bjarne Stroustrup says this about pattern checking after explaining std::holds_alternative and the overloaded method:

This is basically equivalent to a virtual function call, but potentially faster. As with all claims of performance, this ‘‘potentially faster’’ should be verified by measurements when performance is critical. For most uses, the difference in performance is insignificant.

I've benchmarked some methods that came to mind, and these are the results:

http://quick-bench.com/N35RRw_IFO74ZihFbtMu4BIKCJg

You'll get a different result if you turn on the optimization:

http://quick-bench.com/p6KIUtRxZdHJeiFiGI8gjbOumoc

Here's the code I've used for the benchmarks; I'm sure there's a better way to implement and use variants in place of the virtual keyword (inheritance vs. std::variant):

removed the old code; look at the updates

Can anyone explain what is the best way to implement this use case for std::variant that got me to testing and benchmarking:

I'm currently implementing RFC 3986 ('URI'). For my use case this class will mostly be used as a const object and probably won't change much; users are more likely to use it to look up a specific portion of a URI than to build one. So it made sense to use std::string_view rather than storing each segment of the URI in its own std::string. The problem was that I needed to implement two classes: one for when I only need a const version, and another for when the user wants to build the URI rather than provide one and search through it.

So I used a template to fix that which had its own problems; but then I realized I could use std::variant<std::string, std::string_view> (or maybe std::variant<CustomStructHoldingAllThePieces, std::string_view>); so I started researching to see if it actually helps to use variants or not. From these results, it seems like using inheritance and virtual is my best bet if I don't want to implement two different const_uri and uri classes.
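To make the std::variant<std::string, std::string_view> idea concrete, here is a minimal sketch. The names uri_storage, view_of, and scheme_of are hypothetical and only serve the illustration; this is not a full RFC 3986 parser, just one way the owning and non-owning cases could share a single read path.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <variant>

// Hypothetical storage type: either owns the text (std::string)
// or refers to text owned elsewhere (std::string_view).
using uri_storage = std::variant<std::string, std::string_view>;

// Return a read-only view of the text, whichever alternative is active.
inline std::string_view view_of(const uri_storage& s) {
    if (const auto* owned = std::get_if<std::string>(&s)) {
        return *owned;  // std::string converts implicitly to std::string_view
    }
    return std::get<std::string_view>(s);
}

// Hypothetical helper: treat everything before the first ':' as the scheme.
inline std::string_view scheme_of(const uri_storage& s) {
    const std::string_view text = view_of(s);
    const auto colon = text.find(':');
    return colon == std::string_view::npos ? std::string_view{}
                                           : text.substr(0, colon);
}

// Usage: both alternatives go through the same read-only code.
inline void example_usage() {
    const uri_storage owned = std::string("https://example.com/a");
    const uri_storage borrowed = std::string_view("mailto:someone@example.com");
    assert(scheme_of(owned) == "https");
    assert(scheme_of(borrowed) == "mailto");
}
```

With this shape, the read-only accessors are written once against std::string_view, and only mutating operations would need to care which alternative is active.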

What do you think I should do?


Update (2)

Thanks to @gan_ for mentioning and fixing the hoisting problem in my benchmark code. http://quick-bench.com/Mcclomh03nu8nDCgT3T302xKnXY

I was surprised with the result of try-catch hell but thanks to this comment that makes sense now.

Update (3)

I removed the try-catch method as it was really bad. I also changed the selected value randomly, and by the looks of it the benchmark is now more realistic. It seems like virtual is not the correct answer after all. http://quick-bench.com/o92Yrt0tmqTdcvufmIpu_fIfHt0

http://quick-bench.com/FFbe3bsIpdFsmgKfm94xGNFKVKs (without the memory leak lol)

Update (4)

I removed the overhead of generating random numbers (I had already done that in the last update, but it seems I had grabbed the wrong benchmark URL) and added an EmptyRandom benchmark to establish the baseline cost of generating random numbers. I also made some small changes in Virtual, but I don't think they affected anything. http://quick-bench.com/EmhM-S-xoA0LABYK6yrMyBb8UeI

http://quick-bench.com/5hBZprSRIRGuDaBZ_wj0cOwnNhw (removed the Virtual so you could compare the rest of them better)


Update (5)

As Jorge Bellon said in the comments, I wasn't thinking about the cost of allocation, so I converted every benchmark to use pointers. This indirection has an impact on performance, of course, but the comparison is fairer now. There is no longer any allocation inside the loops.

Here's the code:

removed the old code; look at the updates

I ran some benchmarks so far. It seems like g++ does a better job of optimizing the code:

-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
EmptyRandom                   0.756 ns        0.748 ns    746067433
TradeSpaceForPerformance       2.87 ns         2.86 ns    243756914
Virtual                        12.5 ns         12.4 ns     60757698
Index                          7.85 ns         7.81 ns     99243512
GetIf                          8.20 ns         8.18 ns     92393200
HoldsAlternative               7.08 ns         7.07 ns     96959764
ConstexprVisitor               11.3 ns         11.2 ns     60152725
StructVisitor                  10.7 ns         10.6 ns     60254088
Overload                       10.3 ns         10.3 ns     58591608

And for clang:

-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
EmptyRandom                    1.99 ns         1.99 ns    310094223
TradeSpaceForPerformance       8.82 ns         8.79 ns     87695977
Virtual                        12.9 ns         12.8 ns     51913962
Index                          13.9 ns         13.8 ns     52987698
GetIf                          15.1 ns         15.0 ns     48578587
HoldsAlternative               13.1 ns         13.1 ns     51711783
ConstexprVisitor               13.8 ns         13.8 ns     49120024
StructVisitor                  14.5 ns         14.5 ns     52679532
Overload                       17.1 ns         17.1 ns     42553366

Right now, for clang it's better to use inheritance with virtual functions, but for g++ it's better to use holds_alternative or get_if. Overall, std::visit does not seem to be a good choice in almost all of my benchmarks so far.

I think it would be a good idea to add pattern matching (switch statements capable of checking more than just integer literals) to C++; we would be writing cleaner and more maintainable code.

I'm wondering about the package.index() results. Shouldn't it be faster? What does it do?
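As a small illustration of what index() reports, the sketch below uses a reduced three-alternative variant (package_type and get_by_index are names made up for this example). index() returns the zero-based position of the active alternative. Note that std::get still validates the index itself and throws std::bad_variant_access on a mismatch, so the switch does not by itself remove that check; whether the compiler elides it is something to measure rather than assume.

```cpp
#include <cassert>
#include <cstddef>
#include <variant>

struct One   { int get() const { return 1; } };
struct Two   { int get() const { return 2; } };
struct Three { int get() const { return 3; } };

// Reduced variant used only for this illustration.
using package_type = std::variant<One, Two, Three>;

// Dispatch on the zero-based position of the active alternative.
inline int get_by_index(const package_type& p) {
    switch (p.index()) {
        case 0: return std::get<0>(p).get();
        case 1: return std::get<1>(p).get();
        case 2: return std::get<2>(p).get();
        default: return -1;  // index() is std::variant_npos if p is valueless
    }
}

inline void example_usage() {
    const package_type p = Two{};
    const std::size_t position = p.index();  // Two is the second alternative
    assert(position == 1);
    assert(get_by_index(p) == 2);
}
```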

Clang version: http://quick-bench.com/cl0HFmUes2GCSE1w04qt4Rqj6aI

The version that uses One one instead of auto one = new One based on Maxim Egorushkin's comment: http://quick-bench.com/KAeT00__i2zbmpmUHDutAfiD6-Q (not changing the outcome much)


Update (6)

I made some changes, and the results now differ a lot from compiler to compiler. But it seems like std::get_if and std::holds_alternative are the best solutions. virtual now seems to work best with clang, for unknown reasons. That really surprises me because I remember virtual being better with gcc. Also, std::visit is totally out of the competition; in this last benchmark it's even worse than the vtable lookup.

Here's the benchmark (run it with GCC/Clang and also with libstdc++ and libc++):

http://quick-bench.com/LhdP-9y6CqwGxB-WtDlbG27o_5Y

#include <benchmark/benchmark.h>

#include <array>
#include <variant>
#include <random>
#include <functional>
#include <algorithm>

using namespace std;

struct One {
  auto get() const { return 1; }
};
struct Two {
  auto get() const { return 2; }
};
struct Three {
  auto get() const { return 3; }
};
struct Four {
  auto get() const { return 4; }
};

template<class... Ts> struct overload : Ts... { using Ts::operator()...; };
template<class... Ts> overload(Ts...) -> overload<Ts...>;


std::random_device dev;
std::mt19937 rng(dev());
std::uniform_int_distribution<std::mt19937::result_type> random_pick(0,3); // uniform distribution over [0, 3]

template <std::size_t N>
std::array<int, N> get_random_array() {
  std::array<int, N> item;
  for (std::size_t i = 0; i < N; i++)
    item[i] = random_pick(rng);
  return item;
}

template <typename T, std::size_t N>
std::array<T, N> get_random_objects(std::function<T(decltype(random_pick(rng)))> func) {
    std::array<T, N> a;
    std::generate(a.begin(), a.end(), [&] {
        return func(random_pick(rng));
    });
    return a;
}


static void TradeSpaceForPerformance(benchmark::State& state) {
    One one;
    Two two;
    Three three;
    Four four;

  int index = 0;

  auto ran_arr = get_random_array<50>();
  int r = 0;

  auto pick_randomly = [&] () {
    index = ran_arr[r++ % ran_arr.size()];
  };

  pick_randomly();


  for (auto _ : state) {

    int res;
    switch (index) {
      case 0:
        res = one.get();
        break;
      case 1:
        res = two.get();
        break;
      case 2:
        res = three.get();
        break;
      case 3:
        res = four.get();
        break;
    }

    benchmark::DoNotOptimize(index);
    benchmark::DoNotOptimize(res);

    pick_randomly();
  }


}
// Register the function as a benchmark
BENCHMARK(TradeSpaceForPerformance);


static void Virtual(benchmark::State& state) {

  struct Base {
    virtual int get() const noexcept = 0;
    virtual ~Base() {}
  };

  struct A final: public Base {
    int get()  const noexcept override { return 1; }
  };

  struct B final : public Base {
    int get() const noexcept override { return 2; }
  };

  struct C final : public Base {
    int get() const noexcept override { return 3; }
  };

  struct D final : public Base {
    int get() const noexcept override { return 4; }
  };

  Base* package = nullptr;
  int r = 0;
  auto packages = get_random_objects<Base*, 50>([&] (auto r) -> Base* {
          switch(r) {
              case 0: return new A;
              case 1: return new B;
              case 2: return new C;
              case 3: return new D;
              default: return new C;
          }
    });

  auto pick_randomly = [&] () {
    package = packages[r++ % packages.size()];
  };

  pick_randomly();

  for (auto _ : state) {

    int res = package->get();

    benchmark::DoNotOptimize(package);
    benchmark::DoNotOptimize(res);

    pick_randomly();
  }


  for (auto &i : packages)
    delete i;

}
BENCHMARK(Virtual);




static void FunctionPointerList(benchmark::State& state) {

    One one;
    Two two;
    Three three;
    Four four;
  using type = std::function<int()>;
  std::size_t index;

  auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
        switch(r) {
        case 0: return std::bind(&One::get, one);
        case 1: return std::bind(&Two::get, two);
        case 2: return std::bind(&Three::get, three);
        case 3: return std::bind(&Four::get, four);
        default: return std::bind(&Three::get, three);
        }
    });
  int r = 0;

  auto pick_randomly = [&] () {
    index = r++ % packages.size();
  };


  pick_randomly();

  for (auto _ : state) {

    int res = packages[index]();

    benchmark::DoNotOptimize(index);
    benchmark::DoNotOptimize(res);

    pick_randomly();
  }

}
BENCHMARK(FunctionPointerList);



static void Index(benchmark::State& state) {

    One one;
    Two two;
    Three three;
    Four four;
  using type = std::variant<One, Two, Three, Four>;
  type* package = nullptr;

  auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
        switch(r) {
            case 0: return one;
            case 1: return two;
            case 2: return three;
            case 3: return four;
            default: return three;
        }
    });
  int r = 0;

  auto pick_randomly = [&] () {
    package = &packages[r++ % packages.size()];
  };


  pick_randomly();

  for (auto _ : state) {

    int res;
    switch (package->index()) {
      case 0: 
        res = std::get<One>(*package).get();
        break;
      case 1:
        res = std::get<Two>(*package).get();
        break;
      case 2:
        res = std::get<Three>(*package).get();
        break;
      case 3:
        res = std::get<Four>(*package).get();
        break;
    }

    benchmark::DoNotOptimize(package);
    benchmark::DoNotOptimize(res);

    pick_randomly();
  }

}
BENCHMARK(Index);



static void GetIf(benchmark::State& state) {
    One one;
    Two two;
    Three three;
    Four four;
  using type = std::variant<One, Two, Three, Four>;
  type* package = nullptr;

  auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
        switch(r) {
            case 0: return one;
            case 1: return two;
            case 2: return three;
            case 3: return four;
            default: return three;
        }
    });
  int r = 0;

  auto pick_randomly = [&] () {
    package = &packages[r++ % packages.size()];
  };

  pick_randomly();

  for (auto _ : state) {

    int res;
    if (auto item = std::get_if<One>(package)) {
      res = item->get();
    } else if (auto item = std::get_if<Two>(package)) {
      res = item->get();
    } else if (auto item = std::get_if<Three>(package)) {
      res = item->get();
    } else if (auto item = std::get_if<Four>(package)) {
      res = item->get();
    }

    benchmark::DoNotOptimize(package);
    benchmark::DoNotOptimize(res);

    pick_randomly();
  }


}
BENCHMARK(GetIf);

static void HoldsAlternative(benchmark::State& state) {
    One one;
    Two two;
    Three three;
    Four four;
  using type = std::variant<One, Two, Three, Four>;
  type* package = nullptr;

  auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
        switch(r) {
            case 0: return one;
            case 1: return two;
            case 2: return three;
            case 3: return four;
            default: return three;
        }
    });
  int r = 0;

  auto pick_randomly = [&] () {
    package = &packages[r++ % packages.size()];
  };

  pick_randomly();

  for (auto _ : state) {

    int res;
    if (std::holds_alternative<One>(*package)) {
      res = std::get<One>(*package).get();
    } else if (std::holds_alternative<Two>(*package)) {
      res = std::get<Two>(*package).get();
    } else if (std::holds_alternative<Three>(*package)) {
      res = std::get<Three>(*package).get();
    } else if (std::holds_alternative<Four>(*package)) {
      res = std::get<Four>(*package).get();
    }

    benchmark::DoNotOptimize(package);
    benchmark::DoNotOptimize(res);

    pick_randomly();
  }

}
BENCHMARK(HoldsAlternative);


static void ConstexprVisitor(benchmark::State& state) {

    One one;
    Two two;
    Three three;
    Four four;
  using type = std::variant<One, Two, Three, Four>;
  type* package = nullptr;

  auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
        switch(r) {
            case 0: return one;
            case 1: return two;
            case 2: return three;
            case 3: return four;
            default: return three;
        }
    });
  int r = 0;

  auto pick_randomly = [&] () {
    package = &packages[r++ % packages.size()];
  };

  pick_randomly();

  auto func = [] (auto const& ref) {
        using type = std::decay_t<decltype(ref)>;
        if constexpr (std::is_same<type, One>::value) {
            return ref.get();
        } else if constexpr (std::is_same<type, Two>::value) {
            return ref.get();
        } else if constexpr (std::is_same<type, Three>::value)  {
          return ref.get();
        } else if constexpr (std::is_same<type, Four>::value) {
            return ref.get();
        } else {
          return 0;
        }
    };

  for (auto _ : state) {

    auto res = std::visit(func, *package);

    benchmark::DoNotOptimize(package);
    benchmark::DoNotOptimize(res);

    pick_randomly();
  }

}
BENCHMARK(ConstexprVisitor);

static void StructVisitor(benchmark::State& state) {



  struct VisitPackage
  {
      auto operator()(One const& r) { return r.get(); }
      auto operator()(Two const& r) { return r.get(); }
      auto operator()(Three const& r) { return r.get(); }
      auto operator()(Four const& r) { return r.get(); }
  };

    One one;
    Two two;
    Three three;
    Four four;
  using type = std::variant<One, Two, Three, Four>;
  type* package = nullptr;

  auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
        switch(r) {
            case 0: return one;
            case 1: return two;
            case 2: return three;
            case 3: return four;
            default: return three;
        }
    });
  int r = 0;

  auto pick_randomly = [&] () {
    package = &packages[r++ % packages.size()];
  };

  pick_randomly();

  auto vs = VisitPackage();

  for (auto _ : state) {

    auto res = std::visit(vs, *package);

    benchmark::DoNotOptimize(package);
    benchmark::DoNotOptimize(res);

    pick_randomly();
  }

}
BENCHMARK(StructVisitor);


static void Overload(benchmark::State& state) {


    One one;
    Two two;
    Three three;
    Four four;
  using type = std::variant<One, Two, Three, Four>;
  type* package = nullptr;

  auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
        switch(r) {
            case 0: return one;
            case 1: return two;
            case 2: return three;
            case 3: return four;
            default: return three;
        }
    });
  int r = 0;

  auto pick_randomly = [&] () {
    package = &packages[r++ % packages.size()];
  };

  pick_randomly();

  auto ov = overload {
      [] (One const& r) { return r.get(); },
      [] (Two const& r) { return r.get(); },
      [] (Three const& r) { return r.get(); },
      [] (Four const& r) { return r.get(); }
    };

  for (auto _ : state) {

    auto res = std::visit(ov, *package);


    benchmark::DoNotOptimize(package);
    benchmark::DoNotOptimize(res);

    pick_randomly();
  }

}
BENCHMARK(Overload);


// BENCHMARK_MAIN();

Results for GCC compiler:

-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
TradeSpaceForPerformance       3.71 ns         3.61 ns    170515835
Virtual                       12.20 ns        12.10 ns     55911685
FunctionPointerList           13.00 ns        12.90 ns     50763964
Index                          7.40 ns         7.38 ns    136228156
GetIf                          4.04 ns         4.02 ns    205214632
HoldsAlternative               3.74 ns         3.73 ns    200278724
ConstexprVisitor              12.50 ns        12.40 ns     56373704
StructVisitor                 12.00 ns        12.00 ns     60866510
Overload                      13.20 ns        13.20 ns     56128558

Results for the clang compiler (which surprised me):

-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
TradeSpaceForPerformance       8.07 ns         7.99 ns     77530258
Virtual                        7.80 ns         7.77 ns     77301370
FunctionPointerList            12.1 ns         12.1 ns     56363372
Index                          11.1 ns         11.1 ns     69582297
GetIf                          10.4 ns         10.4 ns     80923874
HoldsAlternative               9.98 ns         9.96 ns     71313572
ConstexprVisitor               11.4 ns         11.3 ns     63267967
StructVisitor                  10.8 ns         10.7 ns     65477522
Overload                       11.4 ns         11.4 ns     64880956


Best benchmark so far (will be updated): http://quick-bench.com/LhdP-9y6CqwGxB-WtDlbG27o_5Y (also check out the GCC results)

Solution

std::visit still seems to lack some optimizations on some implementations. That being said, there is a central point that is not well represented in this lab-like setup: a std::variant based design is stack based, whereas the virtual (inheritance) pattern will naturally gravitate towards being heap based. In a real-world scenario this means the memory layout could very well be fragmented (perhaps over time, once objects leave the cache, etc.), unless that can somehow be avoided. The variant based design, by contrast, can be laid out in contiguous memory. I believe this is an extremely important point when performance is concerned, and one that should not be underestimated.

To illustrate this, consider the following:

std::vector<Base*> runtime_poly_; // risk of fragmentation

vs.

std::vector<my_var_type> cp_time_poly_; // no fragmentation (but padding 'risk')

This fragmentation is somewhat difficult to build into a benchmark test like this one. Whether this is (also) what Bjarne had in mind when he said it could potentially be faster is not clear to me (though I do believe the statement holds true).
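The contiguity claim can be checked directly. The following is a minimal sketch (Circle, Square, shape, and stored_inline are illustrative names, not part of the benchmark) that verifies the object held by each variant lives inside the vector's own buffer. This relies on std::variant not being permitted to allocate dynamic memory for the contained value.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>
#include <vector>

struct Circle { double radius; };
struct Square { double side; };
using shape = std::variant<Circle, Square>;

// True if the object held by every element lies inside the vector's buffer.
inline bool stored_inline(const std::vector<shape>& v) {
    const auto begin = reinterpret_cast<std::uintptr_t>(v.data());
    const auto end = begin + v.size() * sizeof(shape);
    for (const shape& s : v) {
        // Take the address of whichever alternative is currently held.
        const auto addr = std::visit(
            [](const auto& held) {
                return reinterpret_cast<std::uintptr_t>(&held);
            },
            s);
        if (addr < begin || addr >= end) return false;
    }
    return true;
}

inline void example_usage() {
    const std::vector<shape> shapes{Circle{1.0}, Square{2.0}, Circle{3.0}};
    assert(stored_inline(shapes));
}
```

A std::vector<Base*> gives no such guarantee: only the pointers are contiguous, and each pointee sits wherever its allocation happened to land.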

Another very important thing to remember about a std::variant based design is that every element occupies at least the size of the largest possible alternative. Therefore, if the objects do not have roughly the same size, this has to be considered carefully, since it may have a bad impact on the cache.
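The size cost is easy to see with sizeof. In this sketch the types Small and Large and the 256-byte payload are arbitrary choices for illustration; the exact size of the variant is implementation-defined, so the checks only state lower bounds.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <variant>

struct Small { char tag; };
struct Large { std::array<char, 256> payload; };

using mixed = std::variant<Small, Large>;

// Every element must be able to hold the largest alternative.
static_assert(sizeof(mixed) >= sizeof(Large));

// Bytes per element that go unused while the element holds a Small.
inline constexpr std::size_t unused_bytes_when_small =
    sizeof(mixed) - sizeof(Small);

inline void example_usage() {
    const mixed m = Small{'x'};
    assert(std::holds_alternative<Small>(m));
    // Even though only one byte is meaningful, the element is still large.
    assert(sizeof(m) >= 256);
    assert(unused_bytes_when_small >= 255);
}
```

If most elements are Small, a container of mixed spends most of each cache line on unused bytes, which is the padding risk mentioned above.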

Considering these points together, it's hard to say which is best to use in the general case. However, it should be clear enough that if the set of types is a closed, 'smallish' one whose members are roughly the same size, then the variant style shows great potential for being faster (as Bjarne notes).

We have only considered performance here, and there are indeed other reasons for choosing one pattern or the other. In the end, you just have to leave the comfort of the 'lab' and design and benchmark your real-world use cases.
