Anagrams - Hashing with chaining and probing in C

Views: 232 Posted: 2016/8/19 0:19:24 Tags: c hash anagram


Problem description

My title got edited, so I want to make sure everyone knows this is homework. The assignment is only to optimize the program; the hashing is my own idea.

I'm working on optimizing a C program that groups together words that are anagrams of each other and then prints them out.

Currently the program is basically a linked list of linked lists. Each node in the outer list is a group of words that are anagrams of each other.

Profiling the program shows that by far the largest portion of execution time is spent in the function wordLookup. This is because it has to search every node, and with a possible 100k words read in from a file, this can take a very long time. For instance, here is the gprof output for reading in 40k words:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
100.31      1.48     1.48    40000    37.12    37.12  wordLookup
  0.00      1.48     0.00    78235     0.00     0.00  newnode
  0.00      1.48     0.00    40000     0.00     0.00  sort_string
  0.00      1.48     0.00    38235     0.00     0.00  wordInsert
  0.00      1.48     0.00     1996     0.00     0.00  swap_words
  0.00      1.48     0.00     1765     0.00     0.00  wordAppend
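
The cost pattern in this profile can be illustrated with a minimal sketch of a lookup over a list of groups. The type and function names here (group_node, push_group, lookup_group) are hypothetical, since the original program's definitions are not shown, and the steps counter exists only to make the linear cost visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical outer-list node: one node per anagram group. */
struct group_node {
    const char *key;            /* sorted letters shared by the group (not copied) */
    struct group_node *next;
};

/* Prepend a new group; returns the new head, or the old head if malloc fails. */
static struct group_node *push_group(struct group_node *head, const char *key)
{
    struct group_node *g = malloc(sizeof *g);
    if (g == NULL)
        return head;
    g->key = key;
    g->next = head;
    return g;
}

/* Linear scan: every lookup may visit every group in the outer list. */
static struct group_node *lookup_group(struct group_node *head,
                                       const char *key, size_t *steps)
{
    assert(key != NULL);
    for (struct group_node *g = head; g != NULL; g = g->next) {
        if (steps != NULL)
            ++*steps;           /* count visited nodes */
        if (strcmp(g->key, key) == 0)
            return g;
    }
    return NULL;
}
```

With n groups in the list, a lookup that misses visits all n nodes, so reading n words that are mostly distinct costs on the order of n squared comparisons in total.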

My idea for making this faster is to change the data structure to a hash table that chains all words that are anagrams of each other in the same slot.
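
One way this idea could look is sketched below, assuming the sorted letters of a word serve as the group key. Everything here is illustrative rather than taken from the original program: TABLE_SIZE, the 32-byte key limit, and the names group, sort_letters and table_add are assumptions, and the string hash is 32-bit FNV-1a, swapped in as a stand-in for whichever hash function is finally chosen.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 1024          /* assumed table size */

struct group {
    char key[32];                /* sorted letters; assumes words under 32 chars */
    size_t count;                /* number of words seen in this group */
    struct group *next;          /* next group chained in the same slot */
};

static int cmp_char(const void *a, const void *b)
{
    return *(const unsigned char *)a - *(const unsigned char *)b;
}

/* Copy word into out and sort its letters, so anagrams share one key. */
static void sort_letters(const char *word, char *out, size_t outsz)
{
    size_t n = strlen(word);
    assert(n < outsz);
    memcpy(out, word, n + 1);
    qsort(out, n, 1, cmp_char);
}

/* 32-bit FNV-1a, used here only as a placeholder string hash. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Find the word's group in its slot's chain, creating the group if needed. */
static struct group *table_add(struct group *table[], const char *word)
{
    char key[32];
    sort_letters(word, key, sizeof key);
    size_t slot = fnv1a(key) % TABLE_SIZE;
    for (struct group *g = table[slot]; g != NULL; g = g->next) {
        if (strcmp(g->key, key) == 0) {
            g->count++;
            return g;
        }
    }
    struct group *g = malloc(sizeof *g);
    if (g == NULL)
        return NULL;
    strcpy(g->key, key);
    g->count = 1;
    g->next = table[slot];
    table[slot] = g;
    return g;
}
```

Because different groups can still land in the same slot, each chain holds groups rather than words, and the key comparison is what separates them. A real version would hang the word list off each group instead of keeping only a count.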

Based on things my professor has said and things that I've read here, I'm thinking of something like this for my hash function. (Note: the primes are assigned so that the most frequently used letters get the smallest primes and the least used letters get the largest.)

sort(string)

array alpha_primes = 5,71,37,29,2,53,59,19,11,83,79,31,43,13,7,67,97,23,17,3,41,73,47,89,61,101
hash(String) {
  hash = 1
  for (char in String) {
    hash *= alpha_primes[char-'a'];
  }
  return hash % tablesize
}
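
The pseudocode above could be written in C roughly as follows. This is a sketch under stated assumptions: the input contains only the letters 'a' to 'z', tablesize is small enough that (tablesize - 1) * 101 fits in an unsigned long long, and the function name anagram_hash is hypothetical. It reduces modulo tablesize after every multiplication, because the raw product of primes overflows a 64-bit integer for long words.

```c
#include <assert.h>
#include <stddef.h>

/* Primes from the question, indexed by letter ('a' -> 5, 'b' -> 71, ...). */
static const unsigned alpha_primes[26] = {
    5, 71, 37, 29, 2, 53, 59, 19, 11, 83, 79, 31, 43,
    13, 7, 67, 97, 23, 17, 3, 41, 73, 47, 89, 61, 101
};

/* Product of the letters' primes, reduced modulo tablesize at each step.
 * Assumes word holds only 'a'..'z'. */
static size_t anagram_hash(const char *word, size_t tablesize)
{
    assert(tablesize > 0);
    unsigned long long h = 1 % tablesize;
    for (; *word != '\0'; ++word) {
        assert(*word >= 'a' && *word <= 'z');
        h = (h * alpha_primes[*word - 'a']) % tablesize;
    }
    return (size_t)h;
}
```

Since multiplication is commutative, anagrams produce the same value whether or not the string is sorted first; the sorted form is still useful as a key for comparing groups. Note that if tablesize is a multiple of one of the primes, every word containing that letter hashes to a multiple of that prime, so a table size that shares no factor with these primes is probably a safer choice.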

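Is there a hash table size for this problem that would distribute the values properly, so that each group of anagrams gets a unique index in the table?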

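If that is not possible, then should I:

Chain the groups together in the table (a list of lists)

Use a probing (linear or quadratic) solution

For either approach, what are the advantages and disadvantages compared with the other?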
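
For comparison with chaining, here is a minimal sketch of the linear probing option. The names (slot, probe_insert), the tiny SLOTS value, and passing the starting index in directly instead of computing a hash are all assumptions made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOTS 8                  /* deliberately tiny, assumed size */

struct slot {
    const char *key;             /* sorted letters of the group; NULL = empty */
    size_t count;                /* words seen in this group */
};

/* Insert key starting at index start (an already reduced hash value).
 * Returns the slot index used, or -1 if the table is full. */
static long probe_insert(struct slot table[], size_t start, const char *key)
{
    assert(key != NULL);
    for (size_t i = 0; i < SLOTS; ++i) {
        size_t idx = (start + i) % SLOTS;   /* linear probe: step by one */
        if (table[idx].key == NULL) {       /* empty slot: new group */
            table[idx].key = key;
            table[idx].count = 1;
            return (long)idx;
        }
        if (strcmp(table[idx].key, key) == 0) {  /* existing group */
            table[idx].count++;
            return (long)idx;
        }
    }
    return -1;
}
```

In this form the table must have at least as many slots as there are anagram groups, and every probe still needs a full key comparison, since two different groups can start at the same index. Chaining has no such capacity limit, but it pays for a node allocation per group and for pointer chasing along each chain.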
