Best data structure for implementing a dictionary?


Question

What would be the best data structure to store all the words of a dictionary? The best I could think of was to use a HashMap, which will map to a HashTable. Basically, depending upon the first character, we will get the associated HashTable and then using this, we can add the words starting from that character. We'll then pick a good hash function based on the string.

Is there a better approach?

Solution

Depending on what you want to do, there are many good data structures.

If you just want to store the words and ask "is this word here or not?", a standard hash table with no other fancy machinery is a reasonable approach. If the word list is fixed in advance, consider using a perfect hash table to get excellent performance and space usage.
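As a minimal sketch of the plain hash table approach, Python's built-in `set` can stand in for the hash table. The word list and the `contains` helper are made up for illustration, and this is an ordinary hash table, not a perfect hash table.

```python
# Hypothetical word list; a real dictionary would be loaded from a file.
words = ["apple", "banana", "cherry"]

# Python's built-in set is a hash table, so one flat set is enough.
# A second level keyed by the first character is not needed.
dictionary = set(words)


def contains(word: str) -> bool:
    """Answer "is this word here or not?" with a single hash lookup."""
    return word in dictionary


print(contains("banana"))  # True
print(contains("grape"))   # False
```

A good string hash already spreads words evenly across buckets, so bucketing by first character first should not buy anything extra.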

If you want to be able to check whether a given prefix exists while still supporting fast lookups, a trie is a good option, though it can be a bit space-inefficient. It also supports fast insertions and deletions, and it allows iteration in alphabetical order, which hashing doesn't offer. This is essentially the structure you've described in your question, but depending on the use case other representations of tries might be better.
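To make the trie concrete, here is a small sketch that stores each node's children in a Python dict. The class and method names (`Trie`, `has_prefix`, and so on) are my own choices for illustration, and deletion is left out to keep it short.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if the path from the root spells a stored word


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _find(self, prefix):
        """Return the node reached by following prefix, or None if there is none."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._find(prefix) is not None

    def __iter__(self):
        # Depth-first walk; pushing children in reverse sorted order means
        # the smallest character is popped first, so words come out sorted.
        stack = [("", self.root)]
        while stack:
            prefix, node = stack.pop()
            if node.is_word:
                yield prefix
            for ch in sorted(node.children, reverse=True):
                stack.append((prefix + ch, node.children[ch]))


trie = Trie()
for w in ["card", "dog", "car", "cat"]:
    trie.insert(w)

print(trie.contains("car"))    # True
print(trie.contains("ca"))     # False: "ca" is only a prefix
print(trie.has_prefix("ca"))   # True
print(list(trie))              # ['car', 'card', 'cat', 'dog']
```

Every operation costs time proportional to the length of the word, independent of how many words are stored. The per-node dict is where the space overhead comes from.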

If in addition to the above, you know for a fact that the word list is fixed, consider using a DAWG (directed acyclic word graph), which is essentially a minimum-state DFA for the language. It's substantially more compact than the trie, but supports many of the same operations.
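One simple way to see where the DAWG's savings come from is to build a trie and then merge identical subtrees bottom-up. The sketch below does that for a fixed word list. It is an offline construction chosen for brevity, not the incremental algorithms used by production libraries, and all names (`minimize`, `registry`, `count_nodes`) are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}    # character -> Node
        self.is_word = False


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_word = True
    return root


def minimize(node, registry):
    """Merge equivalent subtrees bottom-up and return the canonical node."""
    for ch, child in list(node.children.items()):
        node.children[ch] = minimize(child, registry)
    # Two nodes are equivalent if they agree on is_word and reach the same
    # (already canonical) children through the same characters.
    key = (node.is_word,
           tuple(sorted((ch, id(child)) for ch, child in node.children.items())))
    return registry.setdefault(key, node)


def count_nodes(root):
    """Count distinct nodes reachable from root."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.children.values())
    return len(seen)


def contains(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word


words = ["tap", "taps", "top", "tops"]
root = build_trie(words)
print(count_nodes(root))            # 8 nodes in the plain trie
dawg = minimize(root, {})
print(count_nodes(dawg))            # 5 nodes after merging shared suffixes
print(contains(dawg, "tops"))       # True
print(contains(dawg, "ta"))         # False
```

Lookups work exactly as in the trie. Adding a word afterwards is the hard part, because a shared node may belong to many words, which is why this fits best when the word list is fixed.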

If you want trie-like behavior but don't want to pay a huge space penalty, the ternary search tree is another viable option, as is the radix tree. These are very different structures, but can be much better than the trie in different circumstances.
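For comparison with the trie, here is a rough sketch of a ternary search tree. Each node holds one character and three links, so there is no per-node map of children. The names (`TSTNode`, `lo`, `eq`, `hi`) are illustrative, the tree is not balanced, and empty words are not supported in this version.

```python
class TSTNode:
    def __init__(self, ch):
        self.ch = ch
        self.lo = None        # subtree for characters smaller than ch
        self.eq = None        # subtree for the next character of the word
        self.hi = None        # subtree for characters greater than ch
        self.is_word = False


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word):
        if not word:
            raise ValueError("empty words are not supported in this sketch")
        self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = TSTNode(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.is_word = True
        return node

    def _find(self, key):
        """Return the node matching the last character of key, or None."""
        node, i = self.root, 0
        while node is not None:
            ch = key[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(key):
                node, i = node.eq, i + 1
            else:
                return node
        return None

    def contains(self, word):
        if not word:
            return False
        node = self._find(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        if not prefix:
            return self.root is not None
        return self._find(prefix) is not None


tst = TernarySearchTree()
for w in ["cat", "car", "card", "dog"]:
    tst.insert(w)

print(tst.contains("card"))    # True
print(tst.contains("ca"))      # False
print(tst.has_prefix("ca"))    # True
print(tst.has_prefix("cb"))    # False
```

The trade-off is that a lookup may take several `lo`/`hi` steps per character instead of one map access, and the shape of the tree depends on insertion order.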

If space is a concern but you want a trie, look into the succinct trie representation, which has slower lookups but just about theoretically optimal space usage. It has been used in JavaScript as an easy way to transmit a huge amount of data. An alternative compact representation is the double-array trie, though admittedly I know very little about it.

If you want to use the dictionary for operations like spell-checking where you need to find words similar to other words, the BK-tree is an excellent data structure to consider.
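Here is a small sketch of a BK-tree for the spell-checking case, assuming Levenshtein edit distance as the metric. The node layout (a tuple of word and a dict of children keyed by distance) and the names `BKTree`, `add`, and `search` are my own choices for illustration.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


class BKTree:
    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None  # each node is (word, {distance: child_node})

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, max_dist):
        """Return sorted (distance, word) pairs within max_dist of query."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            # Triangle inequality: matches can only sit under edges whose
            # label lies in [d - max_dist, d + max_dist].
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]:
    tree.add(w)

print(tree.search("caqe", 1))  # [(1, 'cake'), (1, 'cape')]
```

The pruning step is what makes this better than comparing the query against every word: whole subtrees are skipped whenever their edge label falls outside the allowed range.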

Hope this helps!
