sparse_hash_map is very slow for specific data


Problem description

tl;dr: why does key lookup in sparse_hash_map become about 50x slower for specific data?


I am testing the speed of key lookups for sparse_hash_map from Google's sparsehash library, using a very simple Cython wrapper I've written. The hashtable contains uint32_t keys and uint16_t values. For random keys, values, and queries I get more than 1M lookups/sec. However, for the specific data I need, the performance barely exceeds 20k lookups/sec.

The wrapper is here. The table which runs slowly is here. The benchmarking code is:

benchmark.pyx:

from sparsehash cimport SparseHashMap
from libc.stdint cimport uint32_t
from libcpp.vector cimport vector
import time
import numpy as np

def fill_randomly(m, size):
    keys = np.random.random_integers(0, 0xFFFFFFFF, size)
    # 0 is a special domain-specific value
    values = np.random.random_integers(1, 0xFFFF, size)
    for j in range(size):
        m[keys[j]] = values[j]

def benchmark_get():
    cdef int dummy
    cdef uint32_t i, j, table_key
    cdef SparseHashMap m
    cdef vector[uint32_t] q_keys
    cdef int NUM_QUERIES = 1000000
    cdef uint32_t MAX_REQUEST = 7448 * 2**19 - 1  # this is domain-specific

    time_start = time.time()

    ### OPTION 1 ###
    m = SparseHashMap('17.shash')

    ### OPTION 2 ###
    # m = SparseHashMap(16130443)
    # fill_randomly(m, 16130443)

    q_keys = np.random.random_integers(0, MAX_REQUEST, NUM_QUERIES)

    print("Initialization: %.3f" % (time.time() - time_start))

    dummy = 0

    time_start = time.time()

    for i in range(NUM_QUERIES):
        table_key = q_keys[i]
        dummy += m.get(table_key)
        dummy %= 0xFFFFFF  # to prevent overflow error

    time_elapsed = time.time() - time_start

    if dummy == 42:
        # So that the unused variable is not optimized away
        print("Wow, lucky!")

    print("Table size: %d" % len(m))
    print("Total time: %.3f" % time_elapsed)
    print("Seconds per query: %.8f" % (time_elapsed / NUM_QUERIES))
    print("Queries per second: %.1f" % (NUM_QUERIES / time_elapsed))

def main():
    benchmark_get()

benchmark.pyxbld (because pyximport should compile in C++ mode):

def make_ext(modname, pyxfilename):
    from distutils.extension import Extension
    return Extension(
        name=modname,
        sources=[pyxfilename],
        language='c++'
    )

run.py:

import pyximport
pyximport.install()

import benchmark
benchmark.main()

The results for 17.shash are:

Initialization: 2.612
Table size: 16130443
Total time: 48.568
Seconds per query: 0.00004857
Queries per second: 20589.8

and for random data:

Initialization: 25.853
Table size: 16100260
Total time: 0.891
Seconds per query: 0.00000089
Queries per second: 1122356.3

The key distribution in 17.shash can be plotted with plt.hist(np.fromiter(m.keys(), dtype=np.uint32, count=len(m)), bins=50); the resulting histogram is not reproduced here.

From the documentation on sparsehash and gcc it seems that trivial hashing is used here (that is, x hashes to x).
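
For intuition: sparsehash sizes its tables to a power of two and derives the bucket index by masking the hash value. With the identity hash, the bucket index is therefore just the key's low bits, so keys that differ only in their high bits all land in the same bucket. A minimal sketch (the 2**25 bucket count is an assumption; the real count depends on sparsehash's resize policy):

num_buckets = 1 << 25                  # assumed table size for ~16.1M entries
mask = num_buckets - 1                 # sparsehash-style power-of-two masking
keys = [12345 + (i << 25) for i in range(128)]  # uint32 keys differing only in bits 25-31
buckets = {k & mask for k in keys}     # bucket index under the identity hash
print(len(buckets))                    # 1: all 128 keys probe the same bucket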

Is there anything obvious that could be causing this behavior besides hash collisions? From what I have found, it is non-trivial to integrate a custom hash function (i.e. overload std::hash<uint32_t>) in a Cython wrapper.

Solution

I have found a solution that works, but it ain't pretty.

sparsehash_wrapper.cpp:

#include <sparsehash/sparse_hash_map>
#include <stdint.h>

// syntax borrowed from
// https://stackoverflow.com/questions/14094104/google-sparse-hash-with-murmur-hash-function

struct UInt32Hasher {
    size_t operator()(const uint32_t& x) const {
        // Note: `+` binds tighter than `^`, so this evaluates as
        // x ^ (x << 17) ^ ((x >> 13) + 3238229671); any mixing of the high
        // bits into the low bits breaks the clustering that identity hashing preserves.
        return (x ^ (x << 17) ^ (x >> 13) + 3238229671);
    }
};

template<class Key, class T>
class sparse_hash_map : public google::sparse_hash_map<Key, T, UInt32Hasher> {};

This is a custom hash function which I could integrate into the existing wrapper with minimal code changes: I only had to replace sparsehash/sparse_hash_map with the path to sparsehash_wrapper.cpp in the Cython .pxd file. So far, the only problem with it is that pyximport cannot find sparsehash_wrapper.cpp unless I specify the full absolute path in the .pxd.
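
For reference, a hedged sketch of what that .pxd change can look like; the author's actual wrapper declarations are not shown, so the class members below are assumptions, and only the quoted include path is the real edit:

# sparsehash.pxd -- hypothetical sketch; only the include path is the actual change.
# The path must point at sparsehash_wrapper.cpp (absolute, per the pyximport
# caveat above) instead of the sparsehash/sparse_hash_map library header.
from libc.stddef cimport size_t

cdef extern from "/absolute/path/to/sparsehash_wrapper.cpp":
    cdef cppclass sparse_hash_map[K, T]:
        sparse_hash_map() except +
        T& operator[](const K& k) except +
        size_t size()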

The problem was indeed with collisions: after recreating a hash map with the same contents as 17.shash from scratch (that is, creating an empty map and inserting every (key, value) pair from 17.shash into it), the performance went up to 1M+ req/sec.
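
The rebuild needs nothing beyond the wrapper methods already used in the question (construction from an expected size, keys(), get(), and item assignment); roughly:

old = SparseHashMap('17.shash')     # the slow table, loaded from disk
fresh = SparseHashMap(len(old))     # an empty map of the same expected size
for key in old.keys():              # keys() and get() appear in the benchmark above
    fresh[key] = old.get(key)
# lookups against `fresh` now reach 1M+ req/sec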
