如何提高增压interval_map查找的性能, [英] How to improve performance of boost interval_map lookups

查看:121
本文介绍了如何提高增压interval_map查找的性能,的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用的是的boost :: ICL :: interval_map 来映射字节范围为一组字符串。该地图是从(排序)磁盘文件加载,然后我用下面的code查找。

问题是查找是很慢的。

在我的测试中,我插入66425范围到地图中。我异形code,基本上>的99.9%的时间是在不同的升压功能的花费(有没有说是缓慢的特定功能,还有超多的功能很多时间s $ P $垫)。

可以做些什么,使这个速度更快?

是我的树不平衡(我怎么知道的?我怎么能平衡呢?)

时使用一套<串GT;有问题吗?

时计算所述地图和所述窗口的一个问题的交叉点? (虽然这是我需要什么,所以我不能看怎么回事做到这一点)。

 使用命名空间std;
typedef的设置<串GT;物体;
TYPEDEF提振:: ICL :: interval_map< uint64_t中,对象>范围;
空虚
find_range(常量范围*地图,uint64_t中开始,uint64_t中结束,
            无效(* F)(uint64_t中开始,uint64_t中结束,为const char *的对象,
                       无效*不透明),
            无效*不透明)
{
  提高:: ICL ::区间< uint64_t中> ::式窗口;
  窗口=提振:: ICL ::区间< uint64_t中> :: right_open(开始,结束);  范围R = *地图&安培;窗口;  范围:迭代ITER = r.begin();
  而(ITER!= r.end()){
    提高:: ICL ::区间< uint64_t中> ::类型范围= iter->首先,
    uint64_t中开始= range.lower();
    uint64_t中结束= range.upper();    对象obj_set = iter->第二个;
    对象:迭代iter2 = obj_set.begin();
    而(iter2!= obj_set.end()){
      F(开始,结束,iter2-> c_str(),不透明的);
      iter2 ++;
    }
    ITER ++;
  }
}

最初的几个配置文件条目:

 %的累积自我自我总
 时间秒秒召唤我们/来电垂询/调用名称
  9.77 0.13 0.13 0.01 21866814的boost :: ICL :: interval_bounds :: interval_bounds(无符号字符)
  6.02 0.21 0.08 0.01 9132134的boost :: ICL :: interval_traits<提高:: ICL :: discrete_interval<无符号长,性病::以下> > ::降低(提高:: ICL :: discrete_interval<无符号长,性病::以下>&const的放大器;)
  6.02 0.29 0.08 0.01 6004967的boost :: enable_if<提高:: ICL :: is_discrete_interval<提高:: ICL :: discrete_interval<无符号长,性病::以下> >中布尔> ::类型boost :: ICL :: is_empty<提高:: ICL :: discrete_interval<无符号长,性病::以下> >(升压:: ICL :: discrete_interval<无符号长,性病::以下>&const的放大器;)
  5.26 0.36 0.07 0.00 21210093的boost :: ICL :: discrete_interval<无符号长,性病:: GT少&; ::界限()const的
  5.26 0.43 0.07 0.01 11964109的std ::以下<无符号长> ::运算符()(无符号常量长和放大器;,无符号长常量和放大器;)常量
  4.51 0.49 0.06 0.00 35761849的std :: _ Rb_tree<的std :: basic_string的<焦炭,的std :: char_traits<焦炭>中的std ::分配器<焦炭> >中的std :: basic_string的<焦炭,的std :: char_traits<焦炭>中的std ::分配器<&烧焦GT; >中的std :: _身份<的std :: basic_string的<焦炭,的std :: char_traits<焦炭>中的std ::分配器<&烧焦GT; > >中的std ::以下<的std :: basic_string的<焦炭,的std :: char_traits<焦炭>中的std ::分配器<&烧焦GT; > >中的std ::分配器<的std :: basic_string的<焦炭,的std :: char_traits<焦炭>中的std ::分配器<焦炭> > > > :: _ S_left(的std :: _ Rb_tree_node_base常量*)
  4.51 0.55 0.06 0.00 12009934的boost :: ICL :: ==操作符(::提高:: ICL interval_bounds,提振:: ICL :: interval_bounds)
  3.76 0.60 0.05 0.00 12078493的boost :: ICL :: discrete_interval<无符号长,性病:: GT少&; ::上()const的
  3.76 0.65 0.05 0.00 12077959的boost :: enable_if<提高:: ICL :: is_interval<提高:: ICL :: discrete_interval<无符号长,性病::以下> >中的boost :: ICL :: interval_traits<提高:: ICL :: discrete_interval<无符号长,性病::以下> > :: domain_type> ::类型boost :: ICL ::上<提高:: ICL :: discrete_interval<无符号长,性病::以下> >(升压:: ICL :: discrete_interval<无符号长,性病::以下>&const的放大器;)
  3.76 0.70 0.05 0.01 8837475的boost :: ICL :: interval_bounds ::位()const的
  3.76 0.75 0.05 0.01 6004967的boost :: enable_if<提高:: ICL :: is_interval<提高:: ICL :: discrete_interval<无符号长,性病::以下> >中布尔> ::类型boost :: ICL :: domain_less_equal<提高:: ICL :: discrete_interval<无符号长,性病::以下> >(升压:: ICL :: interval_traits<提高:: ICL :: discrete_interval<无符号长,性病::以下>> :: domain_type常量和放大器;,提振:: ICL :: interval_traits<提高:: ICL :: discrete_interval&LT ;无符号长,性病:: GT少&;> :: domain_type常量和放大器;)
  3.01 0.79 0.04 0.01 5891650的boost :: ICL :: is_right_closed(升压:: ICL :: interval_bounds)

数据集: http://oirase.annexia.org/tmp/bmap.txt结果
全code:
http://git.annexia.org/ p =的virt-bmap.git; A =树


解决方案

在这个答案我present 3优化:


  1. 替换的对象的std ::设置的boost ::容器:: flat_set 为改善地方(和可能的重新分配的成本,因为最物体集合是4;元素)


      

    更新在我下面的最终版本,只需更换的boost ::容器:: flat_map 回来性病::设置的一个因素降解只是 find_range 业绩〜2倍和 find_range_ex 按〜4倍的我的测试系统上的因素



  2. 替换对象ID 的std ::字符串 string_atom (这在技术上是一个字符常量* 但在逻辑上唯一的)。这种技术类似于在各种VM实现实习字符串(如Java / .NET)。


      

    注意:目前实施 make_atom 的是非常简单的,从不释放原子。你可能会想要备份的字符串一个deque,使用升压飞铁,游泳池分配器或它们的某种组合,使其更聪明。然而,这是否需要取决于你的使用案例



  3. 替换了 equal_range 调用,它通过避免拷贝(大量的)数据保存大量的时间在地图路口


      

    _ 更新:当申请的只是的这种优化孤立的加快已经 26〜30倍。请注意,内存使用量在〜20MiB相比,包括所有三个优化时,大约是两倍_



容量和数据效率

我开始之前,我想了解一下数据的模样。所以,写一些code解析该 bmap.txt 输入,我们可以得到:

<大骨节病> 源在Coliru

 解析的确定
的66425输入线直方图
D:3367
F:20613
电话号码:21222
五:21223
范围大小:6442450944
迭代范围大小:21223
闵对象集:1.000000
最大对象集:234.000000
平均对象集:3.129859
闵区间宽度:1024.000000
最大区间宽度:2526265344.000000
平均区间宽度:296.445177k
第一:[0,1048576)
最后:3916185600,6442450944)
字符串原子:23904独一无二的66425总
原子效率:35.986451%

正如你所看到的组通常〜3项,和许多被复制。

从使用的 make_atom 对象命​​名法的boost :: flat_set 减少内存分配=htt​​p://paste.ubuntu.com/9256518/>〜15GiB 〜 10Gib

此优化也降低了字符串比较为集插入指针比较和该组合策略 interval_map ,所以对于更大的数据集这有有很多的潜力加速。

查询效率

查询性能的确是严重的输入部分副本残废。

不要复制,而是查看重叠范围,只需通过更换:

 常量范围R = *地图&安培;窗口;
  范围::为const_iterator ITER = r.begin();
  而(ITER!= r.end()){

 汽车R = MAP-GT&; equal_range(窗口);
  范围::为const_iterator ITER = r.first;
  而(ITER!= r.second){

在运行这两个版本的结果相同的10000查询随机在一个加速我的系统32X

  10000随机OLD查找导致29148ms 156729884回调
10000随机NEW查找导致897ms 156729884回调真正0m31.715s
用户0m31.664s
SYS 0m0.012s

运行时间由现在的解析/统计为主。全基准code是在这里: 在Coliru

 的#define BOOST_RESULT_OF_USE_DECTYPE
#定义BOOST_SPIRIT_USE_PHOENIX_V3/ *的virt-BMAP考官插件
 *版权所有(C)2014年红帽公司
 *
 *本程序是自由软件;您可以重新分配和/或修改
 *它在GNU通用公共许可证的条款被公布
 *自由软件基金会;或者用许可证的第二版,或
 *(由你选择)任何更新的版本。
 *
 *这个程序是分布在希望这将是有益的,
 *但没有任何担保;甚至没有的隐含担保
 *适销性或适用于特定用途。查看
 * GNU通用公共许可证的更多细节。
 *
 *您应该已经收到了GNU通用公共许可证的副本
 *程序一起;如果没有,请写信给自由软件
 *基金公司,51街富兰克林五楼,波士顿,MA 0​​2110-1301 USA。
 * /#包括LT&;&stdio.h中GT;
#包括LT&;&stdlib.h中GT;
#包括LT&;&stdint.h GT;
#包括LT&;&inttypes.h GT;
#包括LT&;&ASSERT.H GT;#包括LT&;提升/ ICL / interval.hpp&GT;
#包括LT&;提升/ ICL / interval_set.hpp&GT;
#包括LT&;提升/ ICL / interval_map.hpp&GT;
#包括LT&;升压/集装箱/ flat_set.hpp&GT;使用命名空间std;/ *地图区间(uint64_t中,uint64_t中),以一组字符串,其中每个
 *字符串重新presents覆盖这个范围内的对象。
 * /静态为size_t atoms_requested = 0;
静态为size_t atoms_unique_created = 0;使用string_atom =为const char *;
string_atom make_atom(标准::字符串&放大器;&安培; S)
{
    静态的std ::设为&LT;标准::字符串&GT; s_atoms;
    atoms_requested + = 1;    汽车IT = s_atoms.find(S);
    如果(它!= s_atoms.end())
        返回IT-&GT; c_str();    atoms_unique_created + = 1;
    返回s_atoms.insert(性病::移动(S))第一代方式&gt; c_str();
}TYPEDEF提振::容器:: flat_set&LT; string_atom&GT;物体;
TYPEDEF提振:: ICL :: interval_map&LT; uint64_t中,对象&GT;范围;*范围
new_ranges(无效)
{
  返回新的范围();
}空虚
free_ranges(范围* mapv)
{
  量程*地图=(范围*)mapv;
  删除地图;
}为externC无效
insert_range(无效* mapv,uint64_t中开始,uint64_t中结束,为const char *的对象)
{
  量程*地图=(范围*)mapv;
  对象obj_set;
  obj_set.insert(obj_set.end(),对象);
  MAP-&gt;添加(的std :: make_pair(升压:: ICL ::区间&LT; uint64_t中&GT; :: right_open(开始,结束),// SEHE添加的std ::
                       obj_set));
}为externC无效
iter_range(无效* mapv,无效(* F)(uint64_t中开始,uint64_t中结束,为const char *的对象,无效*不透明),无效*不透明)
{
  量程*地图=(范围*)mapv;
  范围:迭代ITER = MAP-GT&;开始();
  而(ITER = MAP-GT&;!结束()){
    提高:: ICL ::区间&LT; uint64_t中&GT; ::类型范围= iter-&gt;首先,
    uint64_t中开始= range.lower();
    uint64_t中结束= range.upper();    对象obj_set = iter-&gt;第二个;
    对象:迭代iter2 = obj_set.begin();
    而(iter2!= obj_set.end()){
      F(开始,结束,* iter2 / * - &GT; c_str()* /,不透明的); // SEHE
      iter2 ++;
    }
    ITER ++;
  }
}为externC无效
find_range(无效常量* mapv,uint64_t中开始,uint64_t中结束,无效(* F)(uint64_t中开始,uint64_t中结束,为const char *的对象,无效*不透明),无效*不透明)
{
  常量范围*地图=(const的范围*)mapv;  提高:: ICL ::区间&LT; uint64_t中&GT; ::式窗口;
  窗口=提振:: ICL ::区间&LT; uint64_t中&GT; :: right_open(开始,结束);  常量范围R = *地图&安培;窗口;  范围::为const_iterator ITER = r.begin();
  而(ITER!= r.end()){
    提高:: ICL ::区间&LT; uint64_t中&GT; ::类型范围= iter-&gt;首先,
    uint64_t中开始= range.lower();
    uint64_t中结束= range.upper();    对象obj_set = iter-&gt;第二个;
    对象:迭代iter2 = obj_set.begin();
    而(iter2!= obj_set.end()){
      F(开始,结束,* iter2 / * - &GT; c_str()* /,不透明的); // SEHE
      iter2 ++;
    }
    ITER ++;
  }
}为externC无效
find_range_ex(无效常量* mapv,uint64_t中开始,uint64_t中结束,无效(* F)(uint64_t中开始,uint64_t中结束,为const char *的对象,无效*不透明),无效*不透明)
{
  常量范围*地图=(const的范围*)mapv;  提高:: ICL ::区间&LT; uint64_t中&GT; ::式窗口;
  窗口=提振:: ICL ::区间&LT; uint64_t中&GT; :: right_open(开始,结束);#如果0
  常量范围R = *地图&安培;窗口;
  范围::为const_iterator ITER = r.begin();
  而(ITER!= r.end()){
#其他
  汽车R = MAP-GT&; equal_range(窗口);
  范围::为const_iterator ITER = r.first;
  而(ITER!= r.second){
#万一    提高:: ICL ::区间&LT; uint64_t中&GT; ::类型范围= iter-&gt;首先,
    uint64_t中开始= range.lower();
    uint64_t中结束= range.upper();    对象obj_set = iter-&gt;第二个;
    对象:迭代iter2 = obj_set.begin();
    而(iter2!= obj_set.end()){
      F(开始,结束,* iter2 / * - &GT; c_str()* /,不透明的); // SEHE
      iter2 ++;
    }
    ITER ++;
  }
}#包括LT&;内存和GT;
#包括LT&;升压/精神/有/ qi.hpp&GT;
#包括LT&;升压/精神/有/ phoenix.hpp&GT;
#包括LT&;升压/累加器/ accumulators.hpp&GT;
#包括LT&;升压/累加器/ statistics.hpp&GT;
#包括LT&;&的fstream GT;
#包括LT&;&计时GT;性病::地图&LT;焦炭,为size_t&GT;组织相容;布尔insert_line_of_input(范围和放大器; bmap_data,uint64_t中B,uint64_t中即char类型,标准::字符串&放大器;对象){
    如果(!object.empty())
        组织相容【类型】++;
    //性病::法院LT&;&LT;性病::十六进制&LT;&LT; B&LT;&LT; &LT;&LT; E&LT;&LT; &LT;&LT;类型&lt;&LT; &LT;&LT;对象&LT;&LT; \\ n;#如果0
    object.insert(object.begin(),':');
    object.insert(object.begin(),类型);
#万一
    insert_range(安培; bmap_data,B,E,make_atom(性病::移动(对象)));
    返回true;
}的std ::矢量&lt;的std ::对&LT; uint64_t中,uint64_t中&GT; &GT; generate_test_queries(范围常量和放大器; bmap_data,为size_t N){
    的std ::矢量&lt;的std ::对&LT; uint64_t中,uint64_t中&GT; &GT;查询;
    queries.reserve(N);    用于(为size_t我= 0; I&LT; N ++ I)
    {
        自动启动=(的static_cast&LT; uint64_t中&GT;(RAND())* RAND())%bmap_data.size();
        自动结束=启动+兰特();        queries.emplace_back(开始,结束);
    }    返回查询;
}范围read_mapfile(为const char * FNAME){
    性病:: ifstream的IFS(FNAME);
    提振精神:: :: istream_iterator F(IFS&GT;&GT;的std :: noskipws),L;    范围bmap_data;    命名空间PHX =提振::凤;
    使用空间boost ::精神::补气;
    uint_parser&LT; uint64_t中,16取代;抵消;
    如果(!phrase_parse(F,L,
                (1&GT;&GT;抵消&GT;&GT;抵消&GT;&GT;烧焦_(PVDF)&GT;&GT; as_string [语义[+图]&GT;&GT; attr指示(/)GT;&GT;语义[*〜字符_(\\ r \\ n)])
                [_pass = PHX ::绑定(insert_line_of_input,PHX :: REF(bmap_data),_1,_2,_3,_4)%EOL&GT;&GT; * EOL,
                空白))
    {
        出口(255);
    }其他
    {
        性病::法院LT&;&LT; 经分析确定\\ N的;
    }    如果(F!= 1)
        性病::法院LT&;&LT; 警告:剩余的输入'&LT;&LT;标准::字符串(F,L)LT;&LT; \\ n;    返回bmap_data;
}无效report_statistics(范围常量和放大器; bmap_data){
    为size_t总= 0;
    为(自动E:组织相容)总+ = e.second;    性病::法院LT&;&LT; &LT直方图;&LT;总&LT;&LT; 输入行的\\ n;    为(自动E:组织相容)
        性病::法院LT&;&LT; e.first&LT;&LT; :&所述;&下; e.second&LT;&LT; \\ n;    命名空间BA =的boost ::蓄电池;
    巴:: accumulator_set&LT;双,巴::统计&LT; BA ::标签::平均值,BA ::标签::最大,BA ::标签::分钟&GT; &GT;
        object_sets,interval_widths;    为(自动常量和放大器; R:bmap_data)
    {
        汽车宽度= r.first.upper() - r.first.lower();
        断言(宽%1024 == 0);        interval_widths(宽);
        object_sets(r.second.size());
    }    性病::法院LT&;&LT;的std ::固定;
    性病::法院LT&;&LT; 范围大小:&LT;&LT; bmap_data.size()&所述;&下; \\ n;
    性病::法院LT&;&LT; 迭代的范围大小:&LT;&LT; bmap_data.iterative_size()&所述;&下; \\ n;    性病::法院LT&;&LT; 最小对象集:&LT;&LT;巴::分(object_sets)LT;&LT; \\ n;
    性病::法院LT&;&LT; 最大对象集:&LT;&LT;巴:: MAX(object_sets)LT;&LT; \\ n;
    性病::法院LT&;&LT; 平均对象集:&LT;&LT;巴::平均值(object_sets)LT;&LT; \\ n;
    性病::法院LT&;&LT; 闵区间宽度:&LT;&LT;巴::分(interval_widths)LT;&LT; \\ n;
    性病::法院LT&;&LT; 最大区间宽度:&LT;&LT;巴:: MAX(interval_widths)LT;&LT; \\ n;
    性病::法院LT&;&LT; 平均区间宽度:&LT;&LT;巴::平均值(interval_widths)/1024.0&LT;&LT; K \\ N的;
    性病::法院LT&;&LT; 第一:&LT;&LT; bmap_data.begin() - &gt;首先&LT;&LT; \\ n;
    性病::法院LT&;&LT; 最后的:&LT;&LT; bmap_data.rbegin() - &gt;首先&LT;&LT; \\ n;    性病::法院LT&;&LT; 字符串原子:&LT;&LT; atoms_unique_created&LT;&LT; &LT中独特的;&LT; atoms_requested&LT;&LT; 总\\ n;
    性病::法院LT&;&LT; 凌动效率:&LT;&LT; (atoms_unique_created * 100.0 / atoms_requested)LT;&LT; %\\ N的;
}无效perform_comparative_benchmarks(范围常量和放大器; bmap_data,为size_t number_of_queries){
    函数srand(42);
    汽车常量查询= generate_test_queries(bmap_data,number_of_queries);    使用HRC =的std ::时辰:: high_resolution_clock;
    {
        自动启动= HRC ::现在();
        为size_t回调= 0;        为(自动常量和放大器;问:查询)
        {
            find_range(安培; bmap_data,q.first,q.second,
                    [](uint64_t中开始,uint64_t中结束,为const char *的对象,无效*不透明){
                    ++(*的static_cast&LT;为size_t *&GT;(不透明));
                    }&安培;回调);
        }
        性病::法院LT&;&LT; number_of_queries&LT;&LT; '随机'老查找导致&LT;&LT;回调
                  &LT;&LT; &LT,在回调;&LT;的std ::时辰:: duration_cast&LT;的std ::时辰::毫秒&GT;((HRC ::现在() - 启动))计算()LT;&LT; MS \\ N的;
    }    {
        自动启动= HRC ::现在();
        为size_t回调= 0;        为(自动常量和放大器;问:查询)
        {
            find_range_ex(安培; bmap_data,q.first,q.second,
                    [](uint64_t中开始,uint64_t中结束,为const char *的对象,无效*不透明){
                    ++(*的static_cast&LT;为size_t *&GT;(不透明));
                    }&安培;回调);
        }
        性病::法院LT&;&LT; number_of_queries&LT;&LT; 随机NEW查找导致&LT;&LT;回调
                  &LT;&LT; &LT,在回调;&LT;的std ::时辰:: duration_cast&LT;的std ::时辰::毫秒&GT;((HRC ::现在() - 启动))计算()LT;&LT; MS \\ N的;
    }
}诠释主(){
    自动BMAP = read_mapfile(bmap.txt);    report_statistics(BMAP);    perform_comparative_benchmarks(BMAP,1000);#如果0 //倾倒范围安慰
    为(自动常量和放大器; R:BMAP)
    {
        性病::法院LT&;&LT; r.first&LT;&LT; \\ t的&LT;&LT; r.second.size()&所述;&下; \\ t的;
        性病::复制(r.second.begin(),r.second.end()的std :: ostream_iterator&LT;标准::字符串&GT;(STD ::法院,\\ t的));
        性病::法院LT&;&LT; \\ n;
    }
#万一
}

I'm using a boost::icl::interval_map to map byte ranges to a set of strings. The map is loaded from a (sorted) disk file, and then I do lookups using the code below.

The problem is the lookups are really slow.

In my test, I inserted 66425 ranges into the map. I profiled the code and basically > 99.9% of the time is spent in various Boost functions (there's not a particular function that is slow, there's a lot of time spread over many functions).

What can be done to make this faster?

Is my tree unbalanced (how do I find out? how can I rebalance it?)

Is using set<string> a problem?

Is calculating the intersection of the map and the window a problem? (Although it's what I need, so I can't see how else to do it).

using namespace std;
typedef set<string> objects;
typedef boost::icl::interval_map<uint64_t, objects> ranges;
void
find_range (const ranges *map, uint64_t start, uint64_t end,
            void (*f) (uint64_t start, uint64_t end, const char *object,
                       void *opaque),
            void *opaque)
{
  boost::icl::interval<uint64_t>::type window;
  window = boost::icl::interval<uint64_t>::right_open (start, end);

  ranges r = *map & window;

  ranges::iterator iter = r.begin ();
  while (iter != r.end ()) {
    boost::icl::interval<uint64_t>::type range = iter->first;
    uint64_t start = range.lower ();
    uint64_t end = range.upper ();

    objects obj_set = iter->second;
    objects::iterator iter2 = obj_set.begin ();
    while (iter2 != obj_set.end ()) {
      f (start, end, iter2->c_str (), opaque);
      iter2++;
    }
    iter++;
  }
}

The first few profile entries:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
  9.77      0.13     0.13 21866814     0.01           boost::icl::interval_bounds::interval_bounds(unsigned char)
  6.02      0.21     0.08  9132134     0.01           boost::icl::interval_traits<boost::icl::discrete_interval<unsigned long, std::less> >::lower(boost::icl::discrete_interval<unsigned long, std::less> const&)
  6.02      0.29     0.08  6004967     0.01           boost::enable_if<boost::icl::is_discrete_interval<boost::icl::discrete_interval<unsigned long, std::less> >, bool>::type boost::icl::is_empty<boost::icl::discrete_interval<unsigned long, std::less> >(boost::icl::discrete_interval<unsigned long, std::less> const&)
  5.26      0.36     0.07 21210093     0.00           boost::icl::discrete_interval<unsigned long, std::less>::bounds() const
  5.26      0.43     0.07 11964109     0.01           std::less<unsigned long>::operator()(unsigned long const&, unsigned long const&) const
  4.51      0.49     0.06 35761849     0.00           std::_Rb_tree<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::_Identity<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_S_left(std::_Rb_tree_node_base const*)
  4.51      0.55     0.06 12009934     0.00           boost::icl::operator==(boost::icl::interval_bounds, boost::icl::interval_bounds)
  3.76      0.60     0.05 12078493     0.00           boost::icl::discrete_interval<unsigned long, std::less>::upper() const
  3.76      0.65     0.05 12077959     0.00           boost::enable_if<boost::icl::is_interval<boost::icl::discrete_interval<unsigned long, std::less> >, boost::icl::interval_traits<boost::icl::discrete_interval<unsigned long, std::less> >::domain_type>::type boost::icl::upper<boost::icl::discrete_interval<unsigned long, std::less> >(boost::icl::discrete_interval<unsigned long, std::less> const&)
  3.76      0.70     0.05  8837475     0.01           boost::icl::interval_bounds::bits() const
  3.76      0.75     0.05  6004967     0.01           boost::enable_if<boost::icl::is_interval<boost::icl::discrete_interval<unsigned long, std::less> >, bool>::type boost::icl::domain_less_equal<boost::icl::discrete_interval<unsigned long, std::less> >(boost::icl::interval_traits<boost::icl::discrete_interval<unsigned long, std::less> >::domain_type const&, boost::icl::interval_traits<boost::icl::discrete_interval<unsigned long, std::less> >::domain_type const&)
  3.01      0.79     0.04  5891650     0.01           boost::icl::is_right_closed(boost::icl::interval_bounds)

Data set: http://oirase.annexia.org/tmp/bmap.txt
Full code: http://git.annexia.org/?p=virt-bmap.git;a=tree

解决方案

In this answer I present three optimizations:

  1. replacing the objects std::set by boost::container::flat_set for improved locality (and likely reallocation costs, since most object sets are <4 elements)

    UPDATE In my final version below, simply replacing boost::container::flat_map back with std::set degraded performance of just find_range by a factor ~2x and find_range_ex by a factor of ~4x on my test system

  2. replacing the object id std::string by string_atom (which is technically a char const* but logically unique). This technique is similar to interned strings in various VM implementations (like Java/.NET).

    NOTE: The current implementation of make_atom is exceedingly simplistic and never frees atoms. You would potentially want to back the strings in a deque, use Boost Flyweights, a pool allocator or some combination of those to make it smarter. However, whether this is required depends on your use cases

  3. replacing the map intersection with a equal_range call, which saves the bulk of time by avoiding copying (large amounts of) data

    _UPDATE When applying just this optimization in isolation the speed up is already 26~30x. Note that the memory usage is roughly double at ~20MiB compared to when including all three optimizations_

Volume and data efficiency

Before I start, I like to know what the data looks like. So, writing some code to parse that bmap.txt input, we get:

Source On Coliru

Parsed ok
Histogram of 66425 input lines
d: 3367
f: 20613
p: 21222
v: 21223
ranges size:            6442450944
ranges iterative size:  21223
Min object set:         1.000000
Max object set:         234.000000
Average object set:     3.129859
Min interval width:     1024.000000
Max interval width:     2526265344.000000
Average interval width: 296.445177k
First:                  [0,1048576)
Last:                   [3916185600,6442450944)
String atoms:           23904 unique in 66425 total
Atom efficiency:        35.986451%

As you can see the sets are usually ~3 items, and many are duplicated.

Using the make_atom object naming method with boost::flat_set reduced memory allocation from ~15GiB to ~10Gib.

This optimization also reduces string comparison to pointer comparison for set insertion and the Combiner strategy of the interval_map, so for larger data sets this has the potential to have a lot of speedup.

Query efficiency

Query performance is indeed severely crippled by the partial copy of the input.

Don't copy, instead view the overlapping range, simply by replacing:

  const ranges r = *map & window;
  ranges::const_iterator iter = r.begin ();
  while (iter != r.end ()) {

with

  auto r = map->equal_range(window);
  ranges::const_iterator iter = r.first;
  while (iter != r.second) {

On my system running 10000 identical randomized queries with both versions results in a speedup of 32x:

10000 'random' OLD lookups resulted in 156729884 callbacks in 29148ms
10000 'random' NEW lookups resulted in 156729884 callbacks in 897ms

real    0m31.715s
user    0m31.664s
sys 0m0.012s

The runtime is now dominated by the parsing/statistics. Full benchmark code is here: On Coliru

#define BOOST_RESULT_OF_USE_DECTYPE
#define BOOST_SPIRIT_USE_PHOENIX_V3

/* virt-bmap examiner plugin
 * Copyright (C) 2014 Red Hat Inc.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 */

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>
#include <assert.h>

#include <boost/icl/interval.hpp>
#include <boost/icl/interval_set.hpp>
#include <boost/icl/interval_map.hpp>
#include <boost/container/flat_set.hpp>

using namespace std;

/* Maps intervals (uint64_t, uint64_t) to a set of strings, where each
 * string represents an object that covers that range.
 */

static size_t atoms_requested = 0;
static size_t atoms_unique_created = 0;

using string_atom = const char*;
string_atom make_atom(std::string&& s)
{
    static std::set<std::string> s_atoms;
    atoms_requested += 1;

    auto it = s_atoms.find(s);
    if (it != s_atoms.end())
        return it->c_str();

    atoms_unique_created += 1;
    return s_atoms.insert(std::move(s)).first->c_str();
}

typedef boost::container::flat_set<string_atom> objects;
typedef boost::icl::interval_map<uint64_t, objects> ranges;

ranges*
new_ranges (void)
{
  return new ranges ();
}

void
free_ranges (ranges *mapv)
{
  ranges *map = (ranges *) mapv;
  delete map;
}

extern "C" void
insert_range (void *mapv, uint64_t start, uint64_t end, const char *object)
{
  ranges *map = (ranges *) mapv;
  objects obj_set;
  obj_set.insert (obj_set.end(), object);
  map->add (std::make_pair (boost::icl::interval<uint64_t>::right_open (start, end), // SEHE added std::
                       obj_set));
}

extern "C" void
iter_range (void *mapv, void (*f) (uint64_t start, uint64_t end, const char *object, void *opaque), void *opaque)
{
  ranges *map = (ranges *) mapv;
  ranges::iterator iter = map->begin ();
  while (iter != map->end ()) {
    boost::icl::interval<uint64_t>::type range = iter->first;
    uint64_t start = range.lower ();
    uint64_t end = range.upper ();

    objects obj_set = iter->second;
    objects::iterator iter2 = obj_set.begin ();
    while (iter2 != obj_set.end ()) {
      f (start, end, *iter2/*->c_str ()*/, opaque); // SEHE
      iter2++;
    }
    iter++;
  }
}

extern "C" void
find_range (void const *mapv, uint64_t start, uint64_t end, void (*f) (uint64_t start, uint64_t end, const char *object, void *opaque), void *opaque)
{
  const ranges *map = (const ranges *) mapv;

  boost::icl::interval<uint64_t>::type window;
  window = boost::icl::interval<uint64_t>::right_open (start, end);

  const ranges r = *map & window;

  ranges::const_iterator iter = r.begin ();
  while (iter != r.end ()) {
    boost::icl::interval<uint64_t>::type range = iter->first;
    uint64_t start = range.lower ();
    uint64_t end = range.upper ();

    objects obj_set = iter->second;
    objects::iterator iter2 = obj_set.begin ();
    while (iter2 != obj_set.end ()) {
      f (start, end, *iter2/*->c_str ()*/, opaque); // SEHE
      iter2++;
    }
    iter++;
  }
}

extern "C" void
find_range_ex (void const *mapv, uint64_t start, uint64_t end, void (*f) (uint64_t start, uint64_t end, const char *object, void *opaque), void *opaque)
{
  const ranges *map = (const ranges *) mapv;

  boost::icl::interval<uint64_t>::type window;
  window = boost::icl::interval<uint64_t>::right_open (start, end);

#if 0
  const ranges r = *map & window;
  ranges::const_iterator iter = r.begin ();
  while (iter != r.end ()) {
#else
  auto r = map->equal_range(window);
  ranges::const_iterator iter = r.first;
  while (iter != r.second) {
#endif

    boost::icl::interval<uint64_t>::type range = iter->first;
    uint64_t start = range.lower ();
    uint64_t end = range.upper ();

    objects obj_set = iter->second;
    objects::iterator iter2 = obj_set.begin ();
    while (iter2 != obj_set.end ()) {
      f (start, end, *iter2/*->c_str ()*/, opaque); // SEHE
      iter2++;
    }
    iter++;
  }
}

#include <memory>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>
#include <fstream>
#include <chrono>

std::map<char, size_t> histo;

bool insert_line_of_input(ranges& bmap_data, uint64_t b, uint64_t e, char type, std::string& object) {
    if (!object.empty())
        histo[type]++;
    //std::cout << std::hex << b << " " << e << " " << type << " " << object << "\n";

#if 0
    object.insert(object.begin(), ':');
    object.insert(object.begin(), type);
#endif
    insert_range(&bmap_data, b, e, make_atom(std::move(object)));
    return true;
}

std::vector<std::pair<uint64_t, uint64_t> > generate_test_queries(ranges const& bmap_data, size_t n) {
    std::vector<std::pair<uint64_t, uint64_t> > queries;
    queries.reserve(n);

    for (size_t i = 0; i < n; ++i)
    {
        auto start = (static_cast<uint64_t>(rand()) * rand()) % bmap_data.size();
        auto end   = start + rand();

        queries.emplace_back(start,end);
    }

    return queries;
}

ranges read_mapfile(const char* fname) {
    std::ifstream ifs(fname);
    boost::spirit::istream_iterator f(ifs >> std::noskipws), l;

    ranges bmap_data;

    namespace phx = boost::phoenix;
    using namespace boost::spirit::qi;
    uint_parser<uint64_t, 16> offset;
    if (!phrase_parse(f,l,
                ("1 " >> offset >> offset >> char_("pvdf") >> as_string[lexeme[+graph] >> attr('/') >> lexeme[*~char_("\r\n")]]) 
                [ _pass = phx::bind(insert_line_of_input, phx::ref(bmap_data), _1, _2, _3, _4) ] % eol >> *eol, 
                blank))
    {
        exit(255);
    } else
    {
        std::cout << "Parsed ok\n";
    }

    if (f!=l)
        std::cout << "Warning: remaining input '" << std::string(f,l) << "\n";

    return bmap_data;
}

void report_statistics(ranges const& bmap_data) {
    size_t total = 0;
    for (auto e : histo) total += e.second;

    std::cout << "Histogram of " << total << " input lines\n";

    for (auto e : histo)
        std::cout << e.first << ": " << e.second << "\n";

    namespace ba = boost::accumulators;
    ba::accumulator_set<double, ba::stats<ba::tag::mean, ba::tag::max, ba::tag::min> > 
        object_sets, interval_widths;

    for (auto const& r : bmap_data)
    {
        auto width = r.first.upper() - r.first.lower();
        assert(width % 1024 == 0);

        interval_widths(width);
        object_sets(r.second.size());
    }

    std::cout << std::fixed;
    std::cout << "ranges size:            " << bmap_data.size()                 << "\n";
    std::cout << "ranges iterative size:  " << bmap_data.iterative_size()       << "\n";

    std::cout << "Min object set:         " << ba::min(object_sets)             << "\n" ;
    std::cout << "Max object set:         " << ba::max(object_sets)             << "\n" ;
    std::cout << "Average object set:     " << ba::mean(object_sets)            << "\n" ;
    std::cout << "Min interval width:     " << ba::min(interval_widths)         << "\n" ;
    std::cout << "Max interval width:     " << ba::max(interval_widths)         << "\n" ;
    std::cout << "Average interval width: " << ba::mean(interval_widths)/1024.0 << "k\n" ;
    std::cout << "First:                  " << bmap_data.begin()->first         << "\n" ;
    std::cout << "Last:                   " << bmap_data.rbegin()->first        << "\n" ;

    std::cout << "String atoms:           " << atoms_unique_created << " unique in " << atoms_requested << " total\n";
    std::cout << "Atom efficiency:        " << (atoms_unique_created*100.0/atoms_requested) << "%\n";
}

void perform_comparative_benchmarks(ranges const& bmap_data, size_t number_of_queries) {
    srand(42);
    auto const queries = generate_test_queries(bmap_data, number_of_queries);

    using hrc = std::chrono::high_resolution_clock;
    {
        auto start = hrc::now();
        size_t callbacks = 0;

        for (auto const& q: queries)
        {
            find_range(&bmap_data, q.first, q.second, 
                    [](uint64_t start, uint64_t end, const char *object, void *opaque) {
                    ++(*static_cast<size_t*>(opaque));
                    }, &callbacks);
        }
        std::cout << number_of_queries << " 'random' OLD lookups resulted in " << callbacks 
                  << " callbacks in " << std::chrono::duration_cast<std::chrono::milliseconds>((hrc::now()-start)).count() << "ms\n";
    }

    {
        auto start = hrc::now();
        size_t callbacks = 0;

        for (auto const& q: queries)
        {
            find_range_ex(&bmap_data, q.first, q.second, 
                    [](uint64_t start, uint64_t end, const char *object, void *opaque) {
                    ++(*static_cast<size_t*>(opaque));
                    }, &callbacks);
        }
        std::cout << number_of_queries << " 'random' NEW lookups resulted in " << callbacks 
                  << " callbacks in " << std::chrono::duration_cast<std::chrono::milliseconds>((hrc::now()-start)).count() << "ms\n";
    }
}

int main() {
    auto bmap = read_mapfile("bmap.txt");

    report_statistics(bmap);

    perform_comparative_benchmarks(bmap, 1000);

#if 0 // to dump ranges to console
    for (auto const& r : bmap)
    {
        std::cout << r.first << "\t" << r.second.size() << "\t";
        std::copy(r.second.begin(), r.second.end(), std::ostream_iterator<std::string>(std::cout, "\t"));
        std::cout << "\n";
    }
#endif
}

这篇关于如何提高增压interval_map查找的性能,的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆