boost multi_index_container and slow operator++


Problem Description


This is a follow-up question to this MIC question. When adding items to the vector of reference wrappers, I spend about 80% of the time inside the ++ operator, whatever iterating approach I choose.
The query works as follows:

VersionView getVersionData(int subdeliveryGroupId, int retargetingId,
                           const std::wstring &flightName) const {
  VersionView versions;
  for (auto i = 0; i < 3; ++i) {
    for (auto j = 0; j < 3; ++j) {
      versions.insert(m_data.get<mvKey>().equal_range(boost::make_tuple(
          subdeliveryGroupId + i, retargetingId + j, flightName)));
    }
  }
  return versions;
}

I've tried the following ways to fill the reference wrapper:

template <typename InputRange> void insert(const InputRange &rng) {
  // 1) base::insert(end(), rng.first, rng.second); // 12ms
  // 2) std::copy(rng.first, rng.second, std::back_inserter(*this)); // 6ms
  /* 3) // 12ms
     size_t start = size();
     auto tmp = std::reference_wrapper<const VersionData>(VersionData(0, 0, L""));
     resize(start + boost::size(rng), tmp);
     auto beg = rng.first;
     for (; beg != rng.second; ++beg, ++start) {
       this->operator[](start) = std::reference_wrapper<const VersionData>(*beg);
     }
  */
  std::copy(rng.first, rng.second, std::back_inserter(*this));
}

Whatever I do, I pay for operator++ or for the size method, which just increments the iterator; either way I'm still stuck in ++. So the question is whether there is a way to iterate over the result ranges faster. If there is no such way, is it worth going down into the implementation of equal_range and adding a new argument that holds a reference to the container of reference_wrappers to be filled with results, instead of creating a range?

EDIT 1: sample code http://coliru.stacked-crooked.com/a/8b82857d302e4a06/
Due to this bug it will not compile on Coliru
EDIT 2: Call tree, with time spent in operator ++
EDIT 3: Some concrete stuff. First of all, I didn't start this thread just because operator++ takes too much of the overall execution time and I don't like it "just because"; at this very moment it is the major bottleneck in our performance tests. Each request is usually processed in hundreds of microseconds; requests similar to this one (they are somewhat more complex) are processed in ~1000-1500 microseconds, and that is still acceptable. The original problem was that once the number of items in the data structure grows to hundreds of thousands, performance deteriorates to something like 20 milliseconds per request. Now, after switching to MIC (which drastically improved code readability, maintainability and overall elegance), I can reach something like 13 milliseconds per request, of which 80%-90% is spent in operator++. So the question is whether this can be improved somehow, or should I go looking for some tar and feathers for myself? :)

Solution

I think that you're measuring the wrong things entirely. When I scale up from 3x3x11111 to 10x10x111111 (so 111x as many items in the index), it still runs in 290ms.

And populating the stuff takes orders of magnitude more time. Even deallocating the container appears to take more time.

What Doesn't Matter?

I've contributed a version with some trade-offs, which mainly shows that there's no sense in tweaking things: View On Coliru

  • there's a switch to avoid the any_range (using it makes no sense if you care about performance)
  • there's a switch to tweak the flyweight:

    #define USE_FLYWEIGHT 0 // 0: none 1: full 2: no tracking 3: no tracking no locking
    

    again, it merely shows you could easily do without, and should consider doing so unless you need the memory optimization for the string (?). If so, consider using the OPTIMIZE_ATOMS approach:

  • the OPTIMIZE_ATOMS basically applies the flyweight idea to wstring there. Since all the strings are repeated here, it will be mighty storage-efficient (although the implementation is quick and dirty and should be improved). The idea applies much better here: How to improve performance of boost interval_map lookups

Here's some rudimentary timings:

As you can see, basically nothing actually matters for query/iteration performance

Any Iterators: Do They Matter?

It might be the culprit on your compiler. On my compiler (gcc 4.8.2) it wasn't anything big, but see the disassembly of the accumulate loop without the any iterator:

As you can see from the sections I've highlighted, there doesn't seem to be much fat from the algorithm, the lambda, nor from the iterator traversal. Now with the any_iterator the situation is much less clear, and if your compiler optimizes less well, I can imagine it failing to inline the elementary operations, making iteration slow. (Just guessing a little now)
