sparse_hash_map is very slow for specific data
tl;dr: why does key lookup in sparse_hash_map become about 50x slower for specific data?
I am testing the speed of key lookups for sparse_hash_map from Google's sparsehash library using a very simple Cython wrapper I've written. The hashtable contains uint32_t keys and uint16_t values. For random keys, values and queries I am getting more than 1M lookups/sec. However, for the specific data I need, the performance barely exceeds 20k lookups/sec.
The wrapper is here. The table which runs slowly is here. The benchmarking code is:
benchmark.pyx:
from sparsehash cimport SparseHashMap
from libc.stdint cimport uint32_t
from libcpp.vector cimport vector
import time
import numpy as np
def fill_randomly(m, size):
    keys = np.random.random_integers(0, 0xFFFFFFFF, size)
    # 0 is a special domain-specific value
    values = np.random.random_integers(1, 0xFFFF, size)
    for j in range(size):
        m[keys[j]] = values[j]

def benchmark_get():
    cdef int dummy
    cdef uint32_t i, j, table_key
    cdef SparseHashMap m
    cdef vector[uint32_t] q_keys
    cdef int NUM_QUERIES = 1000000
    cdef uint32_t MAX_REQUEST = 7448 * 2**19 - 1  # this is domain-specific

    time_start = time.time()

    ### OPTION 1 ###
    m = SparseHashMap('17.shash')

    ### OPTION 2 ###
    # m = SparseHashMap(16130443)
    # fill_randomly(m, 16130443)

    q_keys = np.random.random_integers(0, MAX_REQUEST, NUM_QUERIES)

    print("Initialization: %.3f" % (time.time() - time_start))

    dummy = 0
    time_start = time.time()
    for i in range(NUM_QUERIES):
        table_key = q_keys[i]
        dummy += m.get(table_key)
        dummy %= 0xFFFFFF  # to prevent overflow error
    time_elapsed = time.time() - time_start

    if dummy == 42:
        # So that the unused variable is not optimized away
        print("Wow, lucky!")

    print("Table size: %d" % len(m))
    print("Total time: %.3f" % time_elapsed)
    print("Seconds per query: %.8f" % (time_elapsed / NUM_QUERIES))
    print("Queries per second: %.1f" % (NUM_QUERIES / time_elapsed))

def main():
    benchmark_get()
benchmark.pyxbld (because pyximport should compile in C++ mode):
def make_ext(modname, pyxfilename):
    from distutils.extension import Extension
    return Extension(
        name=modname,
        sources=[pyxfilename],
        language='c++'
    )
run.py:
import pyximport
pyximport.install()
import benchmark
benchmark.main()
The results for 17.shash are:
Initialization: 2.612
Table size: 16130443
Total time: 48.568
Seconds per query: 0.00004857
Queries per second: 20589.8
and for random data:
Initialization: 25.853
Table size: 16100260
Total time: 0.891
Seconds per query: 0.00000089
Queries per second: 1122356.3
The key distribution in 17.shash is this (plt.hist(np.fromiter(m.keys(), dtype=np.uint32, count=len(m)), bins=50)):
[histogram image not reproduced]
From the documentation on sparsehash and gcc it seems that trivial hashing is used here (that is, x hashes to x).
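To see why an identity hash can hurt, note that hash tables commonly pick the bucket from the low bits of the hash when the bucket count is a power of two, so with hash(x) == x, keys that share their low bits all collide. A minimal Python sketch of this masking effect (a hypothetical illustration, not sparsehash's actual internals):

```python
import random

def bucket(key, num_buckets):
    # identity hash: the bucket is chosen by the low bits of the key itself
    return key & (num_buckets - 1)

num_buckets = 1 << 20  # a power-of-two table size

# keys sharing their low 19 bits, e.g. multiples of 2**19
clustered = [i * 2**19 for i in range(100)]
print(len({bucket(k, num_buckets) for k in clustered}))  # only 2 distinct buckets

# uniformly random 32-bit keys spread across many buckets
random.seed(0)
scattered = [random.randrange(2**32) for _ in range(100)]
print(len({bucket(k, num_buckets) for k in scattered}))
```

With clustered keys almost every lookup walks a long collision chain, which would explain a slowdown of this magnitude.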
Is there anything obvious that could be causing this behavior besides hash collisions? From what I have found, it is non-trivial to integrate a custom hash function (i.e. overload std::hash<uint32_t>) in a Cython wrapper.
I have found a solution that works, but it ain't pretty.
sparsehash_wrapper.cpp:
#include "sparsehash/sparse_hash_map"
#include "stdint.h"
// syntax borrowed from
// https://stackoverflow.com/questions/14094104/google-sparse-hash-with-murmur-hash-function
struct UInt32Hasher {
    size_t operator()(const uint32_t& x) const {
        return (x ^ (x << 17) ^ (x >> 13) + 3238229671);
    }
};

template<class Key, class T>
class sparse_hash_map : public google::sparse_hash_map<Key, T, UInt32Hasher> {};
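As a sanity check, the mixing function can be modeled in Python. Note that in the C++ expression, + binds tighter than ^, so the constant is added to x >> 13 before the XORs; the sketch below reproduces that precedence and truncates intermediates to 32 bits, which is an assumption about how the unsigned arithmetic wraps on a typical platform:

```python
MASK = 0xFFFFFFFF  # model 32-bit unsigned wraparound

def uint32_hash(x):
    # mirrors the C++ expression above; '+' binds tighter than '^',
    # so the constant is added to (x >> 13) before the XORs
    return x ^ ((x << 17) & MASK) ^ (((x >> 13) + 3238229671) & MASK)

num_buckets = 1 << 20
clustered = [i * 2**19 for i in range(100)]  # keys sharing their low 19 bits

# identity hashing collapses these keys into just 2 buckets...
print(len({k & (num_buckets - 1) for k in clustered}))  # 2
# ...while the mixing function spreads them across many buckets
print(len({uint32_hash(k) & (num_buckets - 1) for k in clustered}))
```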
This is a custom hash function which I could integrate in the existing wrapper with minimal code changes: I only had to replace sparsehash/sparse_hash_map with the path to sparsehash_wrapper.cpp in the Cython .pxd file. So far, the only problem with it is that pyximport cannot find sparsehash_wrapper.cpp unless I specify the full absolute path in .pxd.
The problem was indeed with collisions: after recreating a hash map with the same contents as 17.shash from scratch (that is, creating an empty map and inserting every (key, value) pair from 17.shash into it), the performance went up to 1M+ req/sec.
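The rebuild amounts to copying every pair into a fresh, empty table so the entries are re-bucketed from scratch. A sketch with a plain dict standing in for the wrapped map (names are illustrative; the real code would go through the Cython wrapper):

```python
def rebuilt_copy(old_map, new_map_factory=dict):
    # insert each (key, value) pair into a fresh, empty map so the table
    # is laid out anew under whatever hash function the new map uses
    new_map = new_map_factory()
    for key, value in old_map.items():
        new_map[key] = value
    return new_map

m = {i * 2**19: (i % 0xFFFF) + 1 for i in range(5)}
print(rebuilt_copy(m) == m)  # True: same contents in a freshly built table
```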