如何在key类型std :: string的地图查找中提高性能? [英] How can I increase the performance in a map lookup with key type std::string?

查看:271
本文介绍了如何在key类型std :: string的地图查找中提高性能?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用std :: map(VC ++实现),它通过地图的find方法查找有点慢。



键类型是std :: string。



我可以提高这个std ::地图查找通过自定义键比较覆盖的地图?例如,也许std :: string<比较在比较数据之前不考虑一个简单的字符串:: size()比较?



任何其他想法以加快比较?



在我的情况下,地图将始终包含< 15个元素,但它被查询非停止,性能至关重要。也许有一个更好的数据结构,我可以使用,会更快?



更新:地图包含文件路径。



Update2:地图的元素经常变化。

解决方案

首先,关闭所有的profiling和DEBUG开关。这些可以大大减慢STL。



如果这不是它,问题的一部分可能是你的字符串是相同的第一个80-90%的字符串。这对地图来说并不坏,但必须是字符串比较。如果是这种情况,您的搜索可能需要更长的时间。



例如,在这段代码中,find()可能会导致几个字符串比较,但每个都会在比较第一个字符直到david之后返回,然后将检查前三个字符。因此,每次调用最多只能检查5个字符。

  map< string,int&名称; 
names [larry] = 1;
names [david] = 2;
names [juanita] = 3;

map< string,int> :: iterator iter = names.find(daniel);另一方面,在下面的代码中,find()可能会检查135个以上的字符:

  map< string,int>名称; 
names [/ usr / local / lib / fancy-pants / share / etc / doc / foobar / longpath / yadda / yadda / wilma] = 1;
names [/ usr / local / lib / fancy-pants / share / etc / doc / foobar / longpath / yadda / yadda / fred] = 2;
names [/ usr / local / lib / fancy-pants / share / etc / doc / foobar / longpath / yadda / yadda / barney] = 3;

map< string,int> :: iterator iter = names.find(/ usr / local / lib / fancy-pants / share / etc / doc / foobar / longpath / yadda / yadda / betty );

这是因为字符串比较必须更深地搜索才能找到匹配,因为每个字符串的开头是



由于你的数据集太小,在比较中使用size()可以帮助你。 std :: map保持排序,所以它的元素可以用二分搜索进行搜索。每次调用find都会导致小于5的字符串比较,以及2次比较的平均值。但它确实取决于你的数据。如果大多数路径字符串的长度不同,那么像Motti这样的大小检查可以帮助很多。



当考虑替代算法时,需要考虑多少命中你得到。你的大多数find()调用返回end()还是命中?如果大多数find()返回end()(misses),那么你每次都在搜索整个地图(2logn字符串比较)。



Hash_map是一个好主意;它应该减少你的搜索时间大约一半的命中;更多的错过。



由于路径字符串的性质,可能会调用自定义算法,特别是如果您的数据集具有与上述代码中相同的祖先。



另一件要考虑的事情是如何获得搜索字符串。如果你重用它们,它可能有助于将它们编码为更容易比较的东西。如果你使用它们一次并丢弃它们,那么这个编码步骤可能太贵了。



我使用一个像霍夫曼编码树一样(很久以前)字符串搜索。这样的二进制字符串搜索树在某些情况下可能会更有效,但对于像你这样的小集合来说它相当昂贵。



最后,查看另一个std :: map实现。我听说过有关VC的stl代码性能的坏话。 DEBUG库特别不好在每次调用检查你。 StlPort 过去是一个很好的选择,但我几年没有尝试过。我也一直喜欢 Boost


I'm using an std::map (VC++ implementation) and it's a little slow for lookups via the map's find method.

The key type is an std::string.

Can I increase the performance of this std::map lookup via a custom key compare override for the map? For example, maybe std::string < compare doesn't take into consideration a simple string::size() compare before comparing its data?

Any other ideas to speed up the compare?

In my situation the map will always contain < 15 elements, but it is being queried non stop and performance is critical. Maybe there is a better data structure that I can use that would be faster?

Update: The map contains file paths.

Update2: The map's elements are changing often.

解决方案

First, turn off all the profiling and DEBUG switches. These can slow down STL immensely.

If that's not it, part of the problem may be that your strings are identical for the first 80-90% of the string. This isn't bad for map, necessarily, but it is for string comparisons. If this is the case, your search can take much longer.

For example, in this code find() will likely result in a couple of string compares, but each will return after comparing the first character until "david", and then the first three characters will be checked. So at most, 5 characters will be checked per call.

map<string,int> names;
names["larry"] = 1;
names["david"] = 2;
names["juanita"] = 3;

map<string,int>::iterator iter = names.find("daniel");

On the other hand, in the following code, find() will likely check 135+ characters:

map<string,int> names;
names["/usr/local/lib/fancy-pants/share/etc/doc/foobar/longpath/yadda/yadda/wilma"] = 1;
names["/usr/local/lib/fancy-pants/share/etc/doc/foobar/longpath/yadda/yadda/fred"] = 2;
names["/usr/local/lib/fancy-pants/share/etc/doc/foobar/longpath/yadda/yadda/barney"] = 3;

map<string,int>::iterator iter = names.find("/usr/local/lib/fancy-pants/share/etc/doc/foobar/longpath/yadda/yadda/betty");

That's because the string comparisons have to search deeper to find a match since the beginning of each string is the same.

Using size() in your comparison for equality won't help you much here since your data set is so small. A std::map is kept sorted so its elements can be searched with a binary search. Each call to find should result in less than 5 string comparisons for a miss, and an average of 2 comparisons for a hit. But it does depend on your data. If most of your path strings are of different lengths, then a size check like Motti describes could help a lot.

Something to consider when thinking of alternative algorithms is how many many "hits" you get. Are most of your find() calls returning end() or a hit? If most of your find()s return end() (misses) then you are searching the entire map every time (2logn string compares).

Hash_map is a good idea; it should cut your search time in about half for hits; more for misses.

A custom algorithm may be called for because of the nature of path strings, especially if your data set has common ancestry like in the above code.

Another thing to consider is how you get your search strings. If you are reusing them, it may help to encode them into something that is easier to compare. If you use them once and discard them, then this encoding step is probably too expensive.

I used something like a Huffman coding tree once (a long time ago) to optimize string searches. A binary string search tree like that may be more efficient in some cases, but its pretty expensive for small sets like yours.

Finally, look into alternative std::map implementations. I've heard bad things about some of VC's stl code performance. The DEBUG library in particular is bad about checking you on every call. StlPort used to be a good alternative, but I haven't tried it in a few years. I've always loved Boost too.

这篇关于如何在key类型std :: string的地图查找中提高性能?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆