Finding the number of occurrences of a word in a pattern stored in a trie


Problem description

Suppose a sentence is stored in a trie and we want to find a pattern in it. Assume I am storing the text as an array. What if the sentence contains repeated words? What is the correct way to store them?
Should I keep a frequency count at each leaf node?
What is the best way to find the number of occurrences of a word in a text?

Solution

Please see my comment to the question.

A solution with good performance is possible with a hash table: http://en.wikipedia.org/wiki/Hash_table.

This structure stores data as key-value pairs, keeps keys unique, and lets you find a value quickly by its key. The time complexity of a lookup is O(1) on average, which becomes highly beneficial once the volume of data is considerable. (Please see http://en.wikipedia.org/wiki/Big_O_notation.)

You would need keys of the string type, each representing a word, and a value representing the number of times that word has been found so far (use a pointer to a number). First split your string into words (another problem to solve), then add the words one by one. If a value is found for the key, increment the word count through the pointer. If it is not found, add a new key-value pair with a value of 1 (a pointer to a value object). At the end, every distinct word is represented exactly once, together with its number of occurrences. Don't forget to deallocate all the memory.
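The steps above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the names `WordTable`, `table_add`, `table_count`, `table_add_text` and `table_free`, the fixed bucket count, the 63-character word limit, and the rule that a word is a run of letters folded to lower case are all assumptions made for the example. The count is also stored directly in each entry rather than behind a separate pointer, which is a simplification of what the answer describes.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 1024 /* assumed fixed table size; this sketch never resizes */

/* One entry per distinct word; colliding words are chained in a list. */
typedef struct Entry {
    char *word;
    unsigned count;
    struct Entry *next;
} Entry;

typedef struct {
    Entry *buckets[BUCKETS]; /* must start zero-initialized */
} WordTable;

/* djb2-style string hash, reduced to a bucket index. */
static unsigned long hash_word(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % BUCKETS;
}

/* Increment the count for word, inserting it with count 1 if absent.
   Returns 0 on success, -1 if memory allocation fails. */
int table_add(WordTable *t, const char *word) {
    unsigned long i = hash_word(word);
    for (Entry *e = t->buckets[i]; e; e = e->next) {
        if (strcmp(e->word, word) == 0) { e->count++; return 0; }
    }
    Entry *e = malloc(sizeof *e);
    if (!e) return -1;
    e->word = malloc(strlen(word) + 1);
    if (!e->word) { free(e); return -1; }
    strcpy(e->word, word);
    e->count = 1;
    e->next = t->buckets[i];
    t->buckets[i] = e;
    return 0;
}

/* Number of times word was added, or 0 if it was never seen. */
unsigned table_count(const WordTable *t, const char *word) {
    for (const Entry *e = t->buckets[hash_word(word)]; e; e = e->next)
        if (strcmp(e->word, word) == 0) return e->count;
    return 0;
}

/* Split text into lower-cased runs of letters and count each run. */
int table_add_text(WordTable *t, const char *text) {
    char buf[64]; /* assumed maximum word length; longer words are truncated */
    size_t n = 0;
    for (;; text++) {
        unsigned char c = (unsigned char)*text;
        if (isalpha(c)) {
            if (n < sizeof buf - 1) buf[n++] = (char)tolower(c);
        } else {
            if (n > 0) { /* a word just ended: record it */
                buf[n] = '\0';
                if (table_add(t, buf) != 0) return -1;
                n = 0;
            }
            if (c == '\0') break;
        }
    }
    return 0;
}

/* Release every entry and its copied word. */
void table_free(WordTable *t) {
    for (size_t i = 0; i < BUCKETS; i++) {
        Entry *e = t->buckets[i];
        while (e) {
            Entry *next = e->next;
            free(e->word);
            free(e);
            e = next;
        }
        t->buckets[i] = NULL;
    }
}
```

A caller would declare `WordTable t = {0};`, pass the sentence to `table_add_text(&t, sentence)`, query individual words with `table_count(&t, "word")`, and finish with `table_free(&t)`.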

Perhaps a much simpler structure with acceptable time complexity, O(log n) per lookup as long as the tree stays reasonably balanced, would be a binary tree used as a search tree: http://en.wikipedia.org/wiki/Binary_tree.

The idea of using a binary tree is pretty much the same.
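As a rough sketch of the tree variant, assuming an unbalanced binary search tree keyed on the word and the hypothetical names `Node`, `tree_add`, `tree_count` and `tree_free` (none of which come from the answer):

```c
#include <stdlib.h>
#include <string.h>

/* One node per distinct word, ordered by strcmp. */
typedef struct Node {
    char *word;
    unsigned count;
    struct Node *left, *right;
} Node;

/* Insert word with count 1, or increment its count if already present.
   Returns 0 on success, -1 if memory allocation fails. */
int tree_add(Node **root, const char *word) {
    while (*root) {
        int cmp = strcmp(word, (*root)->word);
        if (cmp == 0) { (*root)->count++; return 0; }
        root = cmp < 0 ? &(*root)->left : &(*root)->right;
    }
    Node *n = malloc(sizeof *n);
    if (!n) return -1;
    n->word = malloc(strlen(word) + 1);
    if (!n->word) { free(n); return -1; }
    strcpy(n->word, word);
    n->count = 1;
    n->left = n->right = NULL;
    *root = n; /* link the new node into the empty slot we stopped at */
    return 0;
}

/* Number of times word was added, or 0 if it was never seen. */
unsigned tree_count(const Node *root, const char *word) {
    while (root) {
        int cmp = strcmp(word, root->word);
        if (cmp == 0) return root->count;
        root = cmp < 0 ? root->left : root->right;
    }
    return 0;
}

/* Release every node and its copied word (post-order). */
void tree_free(Node *root) {
    if (!root) return;
    tree_free(root->left);
    tree_free(root->right);
    free(root->word);
    free(root);
}
```

Because this sketch does no rebalancing, words that arrive in sorted order degrade it to a linked list with O(n) lookups; a self-balancing tree would be needed to guarantee O(log n).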

With C, it's going to be a good amount of work, but it promises you an unforgettable and useful experience, especially in patience. :-)
You can help yourself by finding an existing C implementation of a tree or a hash table, which should be easy.

Good luck,
—SA

