生成相同的唯一的哈希code所有字谜 [英] Generate same unique hash code for all anagrams

查看:190
本文介绍了生成相同的唯一的哈希code所有字谜的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

最近,我参加面试,就面临哈希冲突一个很好的问题。

Recently, I attended an interview and faced a good question regarding hash collisions.

问:给定一个字符串列表,打印出字谜一起

Question : Given a list of strings, print out the anagrams together.

例:

I / P:              {行为,神,动物,狗,猫}

O / P:                行为,猫,狗,神

Example :

i/p :              {act, god, animal, dog, cat}

o/p :                 act, cat, dog, god


我想创建HashMap和把这个词作为键和值字谜列表


I want to create hashmap and put the word as key and value as list of anagrams

要避免冲突,我想生成唯一的哈希$ C $下字谜,而不是排序,并使用排序字作为重点。

To avoid collision, I want to generate unique hash code for anagrams instead of sorting and using the sorted word as key.

我要寻找的哈希算法,取碰撞比使用链接的其他照顾。我想算法生成相同的哈希code这两种行为,猫......,这样它会在下一个单词添加到值列表

I am looking for hash algorithm which take care of collision other than using chaining. I want algorithm to generate same hash code for both act and cat... so that it will add next word to the value list

任何人都可以提出一个很好的算法?

Can anyone suggest a good algorithm ?

推荐答案

与排序的字符串散列pretty的不错,我已经做到了可能,但它确实可以是缓慢和繁琐。这里还有一个想法,不知道它是否工作 - 挑选一组素数,小到你喜欢的,大小相同的字符集,并从字符到建立一个快速映射功能。然后对给定字,映射每个字符装入匹配素,和繁殖。最后,散列使用结果

Hashing with the sorted string is pretty nice, i'd have done that probably, but it could indeed be slow and cumbersome. Here's another thought, not sure if it works - pick a set of prime numbers, as small as you like, the same size as your character set, and build a fast mapping function from your chars to that. Then for a given word, map each character into the matching prime, and multiply. finally, hash using the result.

这是非常相似,Heuster建议,只有少碰撞(实际上,我相信不会有假的冲突,考虑到任何数量的素数分解的唯一性)。

This is very similar to what Heuster suggested, only with less collisions (actually, I believe there will be no false collisions, given the uniqueness of the prime decomposition of any number).

简单的如 -

int primes[] = {2, 3, 5, 7, ...} // can be auto generated with a simple code

inline int prime_map(char c) {
    // check c is in legal char set bounds
    return primes[c - first_char];
}

...
char* word = get_next_word();
char* ptr = word;
int key = 1;
while (*ptr != NULL) {
    key *= prime_map(*ptr);
    ptr++;
}
hash[key].add_to_list(word); 

有关唯一的几​​句话 - 任意整数数量有一个单一的故障,以素数的乘积,因此给出了哈希整数键,你实际上可以重建,将散列到它,只有这些词所有可能的字符串。刚刚闯入素数,P1 ^ N1 + P2 ^ N2 + ......,每个黄金转换为匹配的字符。炭为p1的似乎N1次,并依此类推。 你不能让你没有明确使用任何新的总理,是主要意味着你不能拿其他号码的任何线性组合。

A few words about the uniqueness - any integer number has a single breakdown to multiplications of primes, so given an integer key in the hash you can actually reconstruct all possible strings that would hash to it, and only these words. Just break into primes, p1^n1+p2^n2+... and convert each prime to the matching char. the char for p1 would appear n1 times, and so on. You can't get any new prime you didn't explicitly used, being prime means you can't get it by any linear combination of other numbers.

这带来了另一个可能的改进 - 如果你能构造字符串,你只需要标记你看到填充哈希时排列。由于排列可以通过字典顺序进行排序,可以取代每一个都带有一个数字。这将节省存储实际字符串的散列的空间,但需要更多的计算所以它不一定是一个好的设计最佳的选择。尽管如此,它使原来的问题一个很好的并发症采访:)

This brings another possible improvement - if you can construct the string, you just need to mark the permutations you saw when populating the hash. since the permutations can be ordered by lexicographic order, you can replace each one with a number. This would save the space of storing the actual strings in the hash, but would require more computations so it's not necessarily a good design choise. Still, it makes a nice complication of the original question for interviews :)

这篇关于生成相同的唯一的哈希code所有字谜的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆