什么时候使用std :: multimap有意义 [英] When does using a std::multimap make sense
问题描述
我目前正在试验一些stl数据结构的使用。但是我仍然不确定什么时候使用哪一个和什么时候使用某种组合。目前我想弄清楚,当使用 std :: multimap
确实有意义。就我所见,人们可以通过组合 std :: map
和 std :: vector $ c $来轻松构建自己的多重映射实现c>。所以我留下的问题,当这些数据结构中的每一个都应该使用。
I am currently experimenting on some usage of stl-datastructures. However I am still not sure when to use which one and when to use a certain combination. Currently I am trying to figure out, when using a std::multimap
does make sense. As far as I can see, one can easily build ones own multimap implementation by combining std::map
and std::vector
. So I am left with the question when each of these datastructures should be used.
- 简单:std :: multimap绝对更容易使用,因为不必处理额外的嵌套。但是,对一系列元素的访问可能需要将数据从迭代器复制到另一个数据结构(例如
std :: vector
)。 - 速度:向量的局部性最可能使得在等于元素的范围上的迭代更快,因为高速缓存使用被优化。但我猜测,
std :: multimaps
也有很多优化技巧背后,使迭代相等的元素尽可能快。也可以针对std :: multimaps
优化正确的元素范围。
- Simplicity: A std::multimap is definitely simpler to use, because one does not have to handle the additional nesting. However access to a range of elements as a bulk one might need to copy the data from the iterators to another datastructure (for example a
std::vector
). - Speed: The locality of the vector most likely makes iterating over the range of equal element much faster, because the cache usage is optimized. However I am guessing that
std::multimaps
also have a lot of optimization tricks behind the back to make iterating over equal elements as fast as possible. Also getting to the correct element-range might probably be optimized forstd::multimaps
.
为了尝试速度问题,我使用以下程序做了一些简单的比较:
In order to try out the speed issues I did some simple comparisons using the following program:
#include <stdint.h>
#include <iostream>
#include <map>
#include <vector>
#include <utility>
typedef std::map<uint32_t, std::vector<uint64_t> > my_mumap_t;
const uint32_t num_partitions = 100000;
const size_t num_elements = 500000;
int main() {
srand( 1337 );
std::vector<std::pair<uint32_t,uint64_t>> values;
for( size_t i = 0; i <= num_elements; ++i ) {
uint32_t key = rand() % num_partitions;
uint64_t value = rand();
values.push_back( std::make_pair( key, value ) );
}
clock_t start;
clock_t stop;
{
start = clock();
std::multimap< uint32_t, uint64_t > mumap;
for( auto iter = values.begin(); iter != values.end(); ++iter ) {
mumap.insert( *iter );
}
stop = clock();
std::cout << "Filling std::multimap: " << stop - start << " ticks" << std::endl;
std::vector<uint64_t> sums;
start = clock();
for( uint32_t i = 0; i <= num_partitions; ++i ) {
uint64_t sum = 0;
auto range = mumap.equal_range( i );
for( auto iter = range.first; iter != range.second; ++iter ) {
sum += iter->second;
}
sums.push_back( sum );
}
stop = clock();
std::cout << "Reading std::multimap: " << stop - start << " ticks" << std::endl;
}
{
start = clock();
my_mumap_t mumap;
for( auto iter = values.begin(); iter != values.end(); ++iter ) {
mumap[ iter->first ].push_back( iter->second );
}
stop = clock();
std::cout << "Filling my_mumap_t: " << stop - start << " ticks" << std::endl;
std::vector<uint64_t> sums;
start = clock();
for( uint32_t i = 0; i <= num_partitions; ++i ) {
uint64_t sum = 0;
auto range = std::make_pair( mumap[i].begin(), mumap[i].end() );
for( auto iter = range.first; iter != range.second; ++iter ) {
sum += *iter;
}
sums.push_back( sum );
}
stop = clock();
std::cout << "Reading my_mumap_t: " << stop - start << " ticks" << std::endl;
}
}
由于我怀疑这主要取决于 num_partitions
和 num_elements
,所以我还是在这里。下面是一些示例输出:
As I suspected it depends mainly on the ratio between num_partitions
and num_elements
, so I am still at a loss here. Here are some example outputs:
对于 num_partitions = 100000
和 num_elements = 1000000
For num_partitions = 100000
and num_elements = 1000000
Filling std::multimap: 1440000 ticks
Reading std::multimap: 230000 ticks
Filling my_mumap_t: 1500000 ticks
Reading my_mumap_t: 170000 ticks
$ c> num_partitions = 100000 和 num_elements = 500000
For num_partitions = 100000
and num_elements = 500000
Filling std::multimap: 580000 ticks
Reading std::multimap: 150000 ticks
Filling my_mumap_t: 770000 ticks
Reading my_mumap_t: 140000 ticks
对于 num_partitions = 100000
和 num_elements = 200000
For num_partitions = 100000
and num_elements = 200000
Filling std::multimap: 180000 ticks
Reading std::multimap: 90000 ticks
Filling my_mumap_t: 290000 ticks
Reading my_mumap_t: 130000 ticks
$ c> num_partitions = 1000 和 num_elements = 1000000
For num_partitions = 1000
and num_elements = 1000000
Filling std::multimap: 970000 ticks
Reading std::multimap: 150000 ticks
Filling my_mumap_t: 710000 ticks
Reading my_mumap_t: 10000 ticks
我不确定如何解释这些结果。你将如何决定正确的数据结构?
I am unsure about how to interpret these results. How would you go about deciding for the correct data structure? Are there any additional constraints for the decission, which I might have missed?
推荐答案
很难判断你的基准是做什么的正确的事,所以我不能评论的数字。
It's hard to tell whether your benchmark is doing the right thing, so I can't comment on the numbers. However, a few general points:
multimap
多重映射是你很少需要的那些事情之一,但是当你需要它们时,你真的需要它们。
How is it useful: Multimaps are one of those things that you need rarely, but when you need them, you really need them.
为什么不滚自己的解决方案? 正如我所说,我不确定这些基准,但即使如果,你可以做一些不比标准容器(我怀疑)更糟糕,那么你应该考虑使其正确,测试和维护它的整体负担。想象一下,对于你写的每一行代码,你都会被征税的世界(这是Stepanov的建议)。
Why not roll my own solution? As I said, I'm not sure about those benchmarks, but even if you could make something else that isn't worse than the standard container (which I question), then you should consider the overall burden of getting it right, testing it and maintaining it. Imagine a world in which you would be taxed for every line of code you wrote (that's Stepanov's suggestion). Re-use industry-standard components whenever possible.
最后,下面是迭代多重映射的典型方式:
Finally, here's the typical way you iterate a multimap:
for (auto it1 = m.cbegin(), it2 = it1, end = m.cend(); it1 != end; it1 = it2)
{
// unique key values at this level
for ( ; it2->first == it1->first; ++it2)
{
// equal key value (`== it1->first`) at this level
}
}
这篇关于什么时候使用std :: multimap有意义的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!