是gcc std :: unordered_map实现慢吗?如果是这样 - 为什么? [英] Is gcc std::unordered_map implementation slow? If so - why?

查看:475
本文介绍了是gcc std :: unordered_map实现慢吗?如果是这样 - 为什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们正在使用C ++开发高性能关键软件。在那里我们需要一个并发的哈希映射并实现一个。因此,我们写了一个基准测试,我们的并发哈希映射与 std :: unordered_map 比较有多慢。

We are developing a highly performance critical software in C++. There we need a concurrent hash map and implemented one. So we wrote a benchmark to figure out, how much slower our concurrent hash map is compared with std::unordered_map.

但是, std :: unordered_map 似乎是非常慢的...所以这是我们的微基准(对于并发映射我们产生了一个新的线程,以确保锁定没有得到优化,注意,我从来没有插入0,因为我也用 google :: dense_hash_map ,需要一个空值)基准:

But, std::unordered_map seems to be incredibly slow... So this is our micro-benchmark (for the concurrent map we spawned a new thread to make sure that locking does not get optimized away and note that I never inser 0 because I also benchmark with google::dense_hash_map, which needs a null value):

boost::random::mt19937 rng;
boost::random::uniform_int_distribution<> dist(std::numeric_limits<uint64_t>::min(), std::numeric_limits<uint64_t>::max());
std::vector<uint64_t> vec(SIZE);
for (int i = 0; i < SIZE; ++i) {
    uint64_t val = 0;
    while (val == 0) {
        val = dist(rng);
    }
    vec[i] = val;
}
std::unordered_map<int, long double> map;
auto begin = std::chrono::high_resolution_clock::now();
for (int i = 0; i < SIZE; ++i) {
    map[vec[i]] = 0.0;
}
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - begin);
std::cout << "inserts: " << elapsed.count() << std::endl;
std::random_shuffle(vec.begin(), vec.end());
begin = std::chrono::high_resolution_clock::now();
long double val;
for (int i = 0; i < SIZE; ++i) {
    val = map[vec[i]];
}
end = std::chrono::high_resolution_clock::now();
elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - begin);
std::cout << "get: " << elapsed.count() << std::endl;

(编辑:整个源代码可以在这里找到:http://pastebin.com/vPqf7eya

( the whole source code can be found here: http://pastebin.com/vPqf7eya)

std :: unordered_map 是:

inserts: 35126
get    : 2959

google :: dense_map

inserts: 3653
get    : 816

对于我们的手支持的并发映射(它锁定,虽然基准是单线程,但在一个单独的spawn线程):

For our hand backed concurrent map (which does locking, although the benchmark is single threaded - but in a separate spawn thread):

inserts: 5213
get    : 2594



如果我编译的基准程序没有pthread支持和运行主线程中的一切,我得到以下结果为我们的手支持并发映射:

If I compile the benchmark program without pthread support and run everything in the main thread, I get the following results for our hand backed concurrent map:

inserts: 4441
get    : 1180



使用以下命令编译:

I compile with the following command:

g++-4.7 -O3 -DNDEBUG -I/tmp/benchmap/sparsehash-2.0.2/src/ -std=c++11 -pthread main.cc

所以特别插入 std :: unordered_map 非常昂贵 - 35秒对其他地图的3-5秒。此外,查找时间似乎相当高。

So especially inserts on std::unordered_map seem to be extremely expensive - 35 seconds vs 3-5 seconds for other maps. Also the lookup time seems to be quite high.

我的问题:为什么?我读了另一个问题stackoverflow哪里有人问,为什么 std :: tr1 :: unordered_map 比他自己的实现慢。评分最高的答案表明, std :: tr1 :: unordered_map 需要实现一个更复杂的接口。但我不能看到这个参数:我们在我们的concurrent_map中使用桶方法, std :: unordered_map 也使用桶方法( google :: dense_hash_map 不会,但是 std :: unordered_map 应该至少像我们的手背并发安全版本一样快?除此之外,我不能看到任何东西强制一个功能,使得哈希映射执行不好...

My question: why is this? I read another question on stackoverflow where someone asks, why std::tr1::unordered_map is slower than his own implementation. There the highest rated answer states, that the std::tr1::unordered_map needs to implement a more complicated interface. But I can not see this argument: we use a bucket approach in our concurrent_map, std::unordered_map uses a bucket-approach too (google::dense_hash_map does not, but than std::unordered_map should be at least as fast than our hand backed concurrency-safe version?). Apart from that I can not see anything in the interface that force a feature which makes the hash map perform badly...

所以我的问题:是真的, std :: unordered_map 似乎很慢?如果没有:什么是错误?如果是的:是什么原因。

So my question: is it true that std::unordered_map seems to be very slow? If no: what is wrong? If yes: what is the reason for that.

我的主要问题:为什么插入一个值到 std :: unordered_map 这么可怕的昂贵(即使我们在开始时保留足够的空间,它不会表现得更好 - 所以rehashing似乎不是问题)。

And my main question: why is inserting a value into a std::unordered_map so terrible expensive (even if we reserve enough space at the beginning, it does not perform much better - so rehashing seems not to be the problem)?

首先:是的,提出的基准是不完美的 - 这是因为我们玩了很多,它只是一个黑客(例如 uint64 分配生成ints在实践中不是一个好主意,排除0在一个循环是一种愚蠢的等...)。

First of all: yes the presented benchmark is not flawless - this is because we played around a lot with it and it is just a hack (for example the uint64 distribution to generate ints would in practice not be a good idea, exclude 0 in a loop is kind of stupid etc...).

目前大多数的注释解释,我可以通过预分配足够的空间使unordered_map更快。在我们的应用程序中,这是不可能的:我们正在开发一个数据库管理系统,需要一个哈希映射在事务期间存储一些数据(例如锁定信息)。所以这个映射可以是从1(用户只进行一次插入和提交)到数十亿条目(如果发生全表扫描)。这是不可能预先分配足够的空间在这里(只是分配很多在开始将消耗太多的内存)。

At the moment most comments explain, that I can make the unordered_map faster by preallocating enough space for it. In our application this is just not possible: we are developing a database management system and need a hash map to store some data during a transaction (for example locking information). So this map can be everything from 1 (user just makes one insert and commits) to billions of entries (if full table scans happen). It is just impossible to preallocate enough space here (and just allocate a lot in the beginning will consume too much memory).

此外,我道歉,我没有状态我的问题很清楚:我不是真的有兴趣快速制作unordered_map(使用Google的密集哈希映射工作正常为我们),我只是不真正理解这个巨大的性能差异来自哪里。它不能只是预分配(即使有足够的预分配内存,密集映射比unordered_map快一个数量级,我们的手支持的并发映射以大小为64的数组开始 - 因此比unordered_map小一个)。

Furthermore, I apologize, that I did not state my question clear enough: I am not really interested in making unordered_map fast (using googles dense hash map works fine for us), I just don't really understand where this huge performance differences come from. It can not be just preallocation (even with enough preallocated memory, the dense map is an order of magnitude faster than unordered_map, our hand backed concurrent map starts with an array of size 64 - so a smaller one than unordered_map).

那么 std :: unordered_map 的这种不良性能的原因是什么?或者不同的问题:可以写一个实现 std :: unordered_map 接口,这是标准一致和(几乎)一样快googles密集哈希映射?

So what is the reason for this bad performance of std::unordered_map? Or differently asked: Could one write an implementation of the std::unordered_map interface which is standard conform and (nearly) as fast as googles dense hash map? Or is there something in the standard that enforces the implementer to chose an inefficient way to implement it?

通过分析,我发现大量的时间用于整数除法。 std :: unordered_map 对数组大小使用素数,而其他实现使用2的幂。为什么 std :: unordered_map 使用素数?如果哈希是坏的,执行更好?

By profiling I see that a lot of time is used for integer divions. std::unordered_map uses prime numbers for the array size, while the other implementations use powers of two. Why does std::unordered_map use prime-numbers? To perform better if the hash is bad? For good hashes it does imho make no difference.

这些是 std :: map

inserts: 16462
get    : 16978

Sooooooo:为什么要插入 std :: map 比插入到 std :: unordered_map 更快...我的意思是WAT? std :: map 有一个更糟糕的局部性(树对阵列),需要做更多的分配(每个插入vs每个rehash +加上〜1每个碰撞) :有另一个算法的复杂性(O(logn)vs O(1))!

Sooooooo: why are inserts into a std::map faster than inserts into a std::unordered_map... I mean WAT? std::map has a worse locality (tree vs array), needs to make more allocations (per insert vs per rehash + plus ~1 for each collision) and, most important: has another algorithmic complexity (O(logn) vs O(1))!

推荐答案

我发现原因: gcc-4.7的问题!

I found the reason: it is a Problem of gcc-4.7!!

gcc-4.7

inserts: 37728
get    : 2985

gcc-4.6

With gcc-4.6

inserts: 2531
get    : 1565

因此,gcc-4.7中的 std :: unordered_map 是在Ubuntu上安装gcc-4.7.0,另一个安装是在debian测试中使用gcc 4.7.1)。

So std::unordered_map in gcc-4.7 is broken (or my installation, which is an installation of gcc-4.7.0 on Ubuntu - and another installation which is gcc 4.7.1 on debian testing).

我将提交一个错误报告..然后:不要使用 std :: unordered_map 与gcc 4.7!

I will submit a bug report.. until then: DO NOT use std::unordered_map with gcc 4.7!

这篇关于是gcc std :: unordered_map实现慢吗?如果是这样 - 为什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆