整数散列函数与精度的字符串 [英] String to Integer Hashing Function with Precision

查看：97 发布时间：2018/6/1 18:36:46 c++ hash

本文介绍了整数散列函数与精度的字符串的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我想将一个char数组散列到一个int或一个long。结果值必须遵守给定的精度值。
我一直在使用的函数如下所示：

$ $ $ $ $ $ c $ int $ GetHash（const char * zKey，int iPrecision / * = 6 * /）
{
///// FROM：http://courses.cs.vt.edu/~cs2604/spring02/Projects/4/elfhash.cpp

unsigned long h = 0;
long M = pow（10，iPrecision）; （* zKey）
{
h =（h <4）+ * zKey ++;

;
unsigned long g = h& 0xF0000000L;
if（g）h ^ = g>> 24;
h& =〜〜g;
}

return（int）（h％M）;
}

要被散列的字符串与SAEUI1210.00000010_1类似。 p>

但是，在某些情况下会产生重复值。
是否有任何好的替代方法，不会为不同的字符串值重复相同的散列值。

的散列值是因为散列值范围小于散列数据的空间，所以它会为某些值产生重复值。理论上，一个32位散列有足够的范围来散列所有〜6个字符的字符串（AZ，az，0-9），而不会导致冲突。在实践中，哈希不是输入的完美排列。给定一个32位散列，由于生日悖论。

给定一组静态数据值，总是可以构造一个专门为它们设计的哈希函数，它永远不会与自身发生冲突（当然，它的输出大小至少是 log（| data set |）），但它需要你提前知道所有可能的数据值。被称为完美哈希。

据说，这里是一些应该让你开始的选择（他们是旨在尽量减少碰撞）

I want to hash a char array in to an int or a long. The resulting value has to adhere to a given precision value. The function I've been using is given below:

int GetHash(const char* zKey, int iPrecision /*= 6*/)
{
        /////FROM : http://courses.cs.vt.edu/~cs2604/spring02/Projects/4/elfhash.cpp

        unsigned long h = 0;
        long M = pow(10, iPrecision);

        while(*zKey)
        {
                h = (h << 4) + *zKey++;
                unsigned long g = h & 0xF0000000L;
                if (g) h ^= g >> 24;
                h &= ~g;
        }            

        return (int) (h % M);
}

The string to be hashed is similar to "SAEUI1210.00000010_1".

However, this produces duplicate values in some cases. Are there any good alternatives which wouldn't duplicate the same hash for different string values.

解决方案

The very definition of a hash is that it produces duplicate values for some values, due to hash value range being smaller than the space of the hashed data.

In theory, a 32-bit hash has enough range to hash all ~6 character strings (A-Z,a-z,0-9 only), without causing a collision. In practice, hashes are not a perfect permutation of the input. Given a 32-bit hash, you can expect to get hash collisions after hashing ~16 bit of random inputs, due to the birthday paradox.

Given a static set of data values, it's always possible to construct a hash function designed specifically for them, which will never collide with itself (of course, size of its output will be at least log(|data set|). However, it requires you to know all the possible data values ahead of time. This is called perfect hashing.

That being said, here are a few alternatives which should get you started (they are designed to minimize collisions)

这篇关于整数散列函数与精度的字符串的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

整数散列函数与精度的字符串 [英] String to Integer Hashing Function with Precision

问题描述

相关文章

C/C++开发最新文章

热门教程

热门工具

登录关闭

整数散列函数与精度的字符串 [英] String to Integer Hashing Function with Precision

问题描述

相关文章

C/C++开发最新文章

热门教程

热门工具

登录 关闭

登录关闭