Best data structure for dictionary implementation


Question

I was wondering what the best data structure would be for storing all the words of a dictionary. The best I could think of was this:

A HashMap that maps to a HashTable. Basically, depending on the first character, we get the associated HashTable and then use it to add the words starting with that character. The hashing function can be smartly based on the string.

Can there be any better approach?

Recommended answer

Depending on what you want to do, there are many good data structures.

If you just want to store the words and ask "is this word here or not?", a standard hash table with no other fancy machinery is a reasonable approach. If the word list is fixed in advance, consider using a perfect hash table to get excellent performance and space usage.
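As a rough sketch of the plain hash table approach, assuming Python and a small made-up word list: the built-in `set` is hash-based, so it answers the membership question directly. The names `WORDS`, `dictionary` and `contains` are hypothetical. Note that this is an ordinary hash table, not a perfect hash table; building one of those needs a dedicated construction that is not shown here.

```python
# Hypothetical word list, for illustration only.
WORDS = ["apple", "apply", "banana", "band"]

# A Python set is backed by a hash table, so membership tests take
# constant time on average.
dictionary = set(WORDS)


def contains(word: str) -> bool:
    """Return True if exactly this word is stored."""
    return word in dictionary
```

This only supports exact lookups: asking whether any stored word starts with "app" would require scanning every entry.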

If you want to be able to check whether a given prefix exists while supporting fast lookups, a trie is a good option, though it can be a bit space-inefficient. It also supports fast insertions and deletions, and it allows iteration in alphabetical order, which hashing doesn't offer. This is essentially the structure you've described in your question, but depending on the use case other representations of tries might be better.
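A minimal trie sketch in Python may make the prefix check and the alphabetical iteration more concrete. It assumes one dictionary of children per node, which is only one of several possible representations, and the class and method names are illustrative. Deletion is left out to keep it short.

```python
from typing import Dict, Iterator, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            # Create the child for this character if it does not exist yet.
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _find(self, text: str) -> Optional[TrieNode]:
        """Follow text from the root; return the node reached, or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word: str) -> bool:
        node = self._find(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix: str) -> bool:
        return self._find(prefix) is not None

    def words(self, prefix: str = "") -> Iterator[str]:
        """Yield stored words starting with prefix, in alphabetical order."""
        start = self._find(prefix)
        if start is None:
            return
        stack = [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_word:
                yield text
            # Push children in reverse order so the smallest is popped first.
            for ch in sorted(node.children, reverse=True):
                stack.append((node.children[ch], text + ch))
```

Each lookup costs time proportional to the length of the query rather than the number of stored words. The per-node dictionary is where the space overhead mentioned above comes from.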

If in addition to the above, you know for a fact that the word list is fixed, consider using a DAWG (directed acyclic word graph), which is essentially a minimum-state DFA for the language. It's substantially more compact than the trie, but supports many of the same operations.
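To illustrate why a DAWG is smaller than a trie, here is a small Python sketch that builds an ordinary trie and then merges identical subtrees bottom-up. This is a simple after-the-fact minimization chosen for brevity, not the incremental construction algorithms typically used for large word lists, and all names (`Node`, `build_trie`, `minimize`, `count_nodes`) are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_word = True
    return root


def minimize(node, registry):
    """Return the canonical node for this subtree, merging equal subtrees.

    registry maps a subtree signature to the one node kept for it.
    """
    for ch, child in list(node.children.items()):
        node.children[ch] = minimize(child, registry)
    # Children are already canonical, so their identities describe the
    # subtree completely.
    signature = (
        node.is_word,
        tuple(sorted((ch, id(child)) for ch, child in node.children.items())),
    )
    return registry.setdefault(signature, node)


def contains(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word


def count_nodes(root):
    """Count distinct nodes reachable from root."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.children.values())
    return len(seen)
```

For the words "tap", "taps", "top" and "tops", the trie has 8 nodes while the minimized graph has 5, because the shared suffixes "p" and "ps" are stored once. Lookups work exactly as in the trie; what is lost is the ability to insert new words cheaply, which is why this suits a fixed word list.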

If you want trie-like behavior but don't want to pay a huge space penalty, the ternary search tree is another viable option, as is the radix tree. These are very different structures, but can be much better than the trie in different circumstances.
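As one possible illustration, a ternary search tree can be sketched in Python as follows. Each node holds a single character and three links (smaller, equal, larger), so it avoids a per-node child table. The names are illustrative, the empty string is not stored in this sketch, and no balancing is attempted.

```python
class TSTNode:
    __slots__ = ("ch", "lo", "eq", "hi", "is_word")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.is_word = False


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word):
        if not word:
            return  # the empty string is not stored in this sketch
        self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = TSTNode(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i)
        elif i + 1 < len(word):
            # Character matched: move on to the next character.
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.is_word = True
        return node

    def _find(self, text):
        """Return the node matching the last character of text, or None."""
        node, i = self.root, 0
        while node is not None and text:
            ch = text[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(text):
                node = node.eq
                i += 1
            else:
                return node
        return None

    def contains(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._find(prefix) is not None
```

The trade-off is that each character may take several comparisons to locate, in exchange for a fixed, small node size.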

If space is a concern but you want a trie, look into the succinct trie representation, which has slower lookups but just about theoretically optimal space usage. The link discusses how it's being used in JavaScript as an easy way to transmit a huge amount of data. An alternative compact representation is the double-array trie, though admittedly I know very little about it.

If you want to use the dictionary for operations like spell-checking where you need to find words similar to other words, the BK-tree is an excellent data structure to consider.
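A small Python sketch of a BK-tree, assuming Levenshtein edit distance as the metric (the answer does not name one, but a BK-tree needs a distance that satisfies the triangle inequality). The node layout and the names `levenshtein`, `BKTree`, `add` and `search` are illustrative choices.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


class BKTree:
    def __init__(self):
        # Each node is a pair: (word, {distance: child node}).
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = levenshtein(word, node[0])
            if d == 0:
                return  # word is already present
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, max_dist):
        """Return sorted (distance, word) pairs within max_dist of query."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = levenshtein(query, word)
            if d <= max_dist:
                results.append((d, word))
            # By the triangle inequality, matches can only be in subtrees
            # whose edge label lies in [d - max_dist, d + max_dist].
            for k, child in children.items():
                if d - max_dist <= k <= d + max_dist:
                    stack.append(child)
        return sorted(results)
```

For a spell checker, one would add every dictionary word once and then call `search(misspelled_word, 1)` or `search(misspelled_word, 2)` to collect candidate corrections without comparing against the whole word list.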

Hope this helps!
