Hash function for a string

Question

We are currently dealing with hash functions in my class. Our instructor asked us to find a hash function on the internet to compare with the two we have used in our code.

The first one:

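int HashTable::hash(string word)
// POST: the index of entry is returned
{
    unsigned long sum = 0;
    for (size_t k = 0; k < word.length(); k++)
        sum += static_cast<unsigned char>(word[k]);   // unsigned char keeps the sum, and so the index, non-negative
    return static_cast<int>(sum % SIZE);
}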

The second one:

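int HashTable::hash(string word)
{
    const unsigned long seed = 131;
    unsigned long hash = 0;   // unsigned arithmetic wraps on overflow, which is harmless here
    for (size_t i = 0; i < word.length(); i++)
    {
        hash = (hash * seed) + static_cast<unsigned char>(word[i]);
    }
    return static_cast<int>(hash % SIZE);
}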

Where SIZE is 501 (the size of the hash table) and the input comes from a text file of 20,000+ words.

I saw this question with a few code examples, but wasn't exactly sure what to look for in a hash function. If I understand correctly, in my case a hash takes an input (a string), does a calculation to assign the string a number, and inserts it in a table at that position. Is this done to increase the speed of searching the list?

If my logic is sound, does anyone have a good example or a resource showing a different hash function that involves a string? Or even the process of writing my own efficient hash function?

Recommended answer

First, it usually does not matter that much in practice. Most hash functions are "good enough".

But if you really care, you should know that it is a research subject in itself. There are thousands of papers about it. You can still get a PhD today by studying and designing hashing algorithms.

Your second hash function might be slightly better, because its result depends on character order, so it should separate the string "ab" from the string "ba", which the first one cannot do. On the other hand, it is probably a little slower than the first hash function. That may or may not be relevant for your application.
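To make the "ab" versus "ba" point concrete, here is a minimal sketch of both approaches as free functions. The names additive_hash, polynomial_hash and kTableSize are made up for this illustration; they are not part of the asker's HashTable class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed table size, matching the SIZE of 501 mentioned in the question.
constexpr std::size_t kTableSize = 501;

// Additive hash: a sum ignores character order, so anagrams always collide.
std::size_t additive_hash(const std::string& word, std::size_t buckets = kTableSize) {
    assert(buckets > 0);
    unsigned long sum = 0;
    for (unsigned char c : word) sum += c;
    return sum % buckets;
}

// Polynomial hash with multiplier 131: each step mixes in the position,
// so reordering the characters usually changes the result.
std::size_t polynomial_hash(const std::string& word, std::size_t buckets = kTableSize) {
    assert(buckets > 0);
    unsigned long h = 0;
    for (unsigned char c : word) h = h * 131 + c;
    return h % buckets;
}
```

With ASCII input, additive_hash gives 195 for both "ab" and "ba", while polynomial_hash gives 280 for "ab" and 410 for "ba".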

I'd guess that hash functions used for genome strings are quite different from those used to hash family names in telephone databases. Perhaps some string hash functions are even better suited to German words than to English or French ones.

Many software libraries give you good enough hash functions, e.g. Qt has qHash, C++11 has std::hash in <functional>, Glib has several hash functions in C, and POCO has some hash functions.
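As one illustration of the library route, the sketch below uses std::hash<std::string> to pick a bucket. The helper name bucket_for is hypothetical, and the actual hash values are implementation-defined, so they can differ between compilers and standard libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Map a word to a bucket index using the standard library's string hash.
// Only the range [0, buckets) is guaranteed, not any particular value.
std::size_t bucket_for(const std::string& word, std::size_t buckets) {
    assert(buckets > 0);
    return std::hash<std::string>{}(word) % buckets;
}
```

In many programs you would not even call it directly: std::unordered_set<std::string> and std::unordered_map use std::hash by default.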

I quite often write hash functions involving primes (see Bézout's identity) and xor, e.g.:

#define A 54059 /* a prime */
#define B 76963 /* another prime */
#define C 86969 /* yet another prime */
#define FIRSTH 37 /* also prime */
unsigned hash_str(const char* s)
{
   unsigned h = FIRSTH;
   while (*s) {
     h = (h * A) ^ ((unsigned char)s[0] * B); /* cast so the result does not depend on char signedness */
     s++;
   }
   return h; // or return h % C;
}

But I don't claim to be a hash expert. Of course, the values of A, B, C and FIRSTH should preferably be primes, but you could choose other prime numbers.
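If you want to compare candidate hash functions on your own word list, one rough approach is to count how the words spread over the buckets. The sketch below is only an assumption about how such a comparison could be done; BucketStats and measure are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Summary of how evenly a hash function spreads words over buckets.
struct BucketStats {
    std::size_t empty_buckets;   // buckets that received no word
    std::size_t largest_bucket;  // most words that landed in a single bucket
};

// HashFn is any callable taking const std::string& and returning an unsigned integer.
template <typename HashFn>
BucketStats measure(const std::vector<std::string>& words, std::size_t buckets, HashFn hash) {
    assert(buckets > 0);
    std::vector<std::size_t> counts(buckets, 0);
    for (const std::string& w : words) ++counts[hash(w) % buckets];

    BucketStats stats{};
    stats.empty_buckets = static_cast<std::size_t>(
        std::count(counts.begin(), counts.end(), std::size_t{0}));
    stats.largest_bucket = *std::max_element(counts.begin(), counts.end());
    return stats;
}
```

With 20,000 words in 501 buckets, a perfectly even spread would put about 40 words in each bucket, so a function that leaves many buckets empty or has a much larger largest_bucket is distributing that input poorly.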

Look at some MD5 implementation to get a feeling for what hash functions can be.

Most good books on algorithms have at least a whole chapter dedicated to hashing. Start with the Wikipedia pages on hash functions and hash tables.