Scrabble word finder: building a trie, storing a trie, using a trie?


Problem description



    What I’m trying to do:

    • Build a mobile web application where the user can get help finding words to play when playing Scrabble
    • Users get word suggestions by typing in any number of letters and 0 or more wildcards

    How I’m trying to do this:

    • Using a MySQL database with a dictionary containing over 400k words
    • Using ASP.NET with C# as the server-side programming language
    • Using HTML5, CSS and JavaScript

    My current plan:

    • Building a Trie with all the words from the database so I can do a fast and accurate search for words depending on user letter/wildcard input

    Having a plan is no good if you can't execute it. This is what I need help with:

    • How do I build a Trie from the database? (UPDATE: I want to generate a Trie using the words already in my database, after that's done I'm not going to use the database for word matching any more)
    • How do I store the Trie for fast and easy access? (UPDATE: So I can trash my database)
    • How do I use C# to search for words using the Trie depending on letters and wildcards?

    Finally:
    Any help is very much appreciated, I’m still a beginner with C# and MySQL so please be gentle

    Thank you a lot!

    Solution

    First off, let's look at the constraints on the problem. You want to store a word list for a game in a data structure that efficiently supports the "anagram" problem: given a "rack" of n letters, what are all the words of n or fewer letters in the word list that can be made from that rack? The word list will be about 400K words, so it is probably about one to ten megabytes of string data when uncompressed.

    A trie is the classic data structure used to solve this problem because it combines memory efficiency with search efficiency. With a word list of about 400K words of reasonable length you should be able to keep the trie in memory. (As opposed to going with a B-tree sort of solution where you keep most of the tree on disk because it is too big to fit in memory all at once.)

    A trie is basically nothing more than a 26-ary tree (assuming you're using the Roman alphabet) where every node holds a letter plus one additional bit that says whether that node is the end of a word.

    So let's sketch the data structure:

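    class TrieNode
    {
        public char Letter;              // the letter stored at this node
        public bool IsEndOfWord;         // true if the path from the root to here spells a word
        public List<TrieNode> Children;  // nodes for the letters that can follow this one

        public TrieNode(char letter, bool isEndOfWord, List<TrieNode> children)
        {
            Letter = letter; IsEndOfWord = isEndOfWord; Children = children;
        }
    }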
    

    This of course is just a sketch; you'd probably want to make these have proper property accessors and constructors and whatnot. Also, maybe a flat list is not the best data structure; maybe some sort of dictionary is better. My advice is to get it working first, and then measure its performance, and if it is unacceptable, then experiment with making changes to improve its performance.
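    As one possible variation on the sketch above, here is a node that uses properties, a constructor, and a dictionary of children keyed by letter. The `GetOrAddChild` helper is a name made up for this illustration, and whether a dictionary actually beats a flat list is something to measure rather than assume.

```csharp
using System.Collections.Generic;

// Illustrative trie node that keeps its children in a dictionary keyed by letter.
public class TrieNode
{
    public char Letter { get; }
    public bool IsEndOfWord { get; set; }
    public Dictionary<char, TrieNode> Children { get; } = new Dictionary<char, TrieNode>();

    public TrieNode(char letter, bool isEndOfWord = false)
    {
        Letter = letter;
        IsEndOfWord = isEndOfWord;
    }

    // Hypothetical helper: returns the child for 'letter', creating it if it is missing.
    public TrieNode GetOrAddChild(char letter)
    {
        if (!Children.TryGetValue(letter, out TrieNode child))
        {
            child = new TrieNode(letter);
            Children.Add(letter, child);
        }
        return child;
    }
}
```

    With this shape, `new TrieNode('^').GetOrAddChild('A')` returns the same node every time it is called on the same root, so callers never need to index into the child collection by position.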

    You can start with an empty trie:

    TrieNode root = new TrieNode('^', false, new List<TrieNode>());
    

    That is, this is the "root" trie node that represents the beginning of a word.

    How do you add the word "AA", the first word in the Scrabble dictionary? Well, first make a node for the first letter:

    root.Children.Add(new TrieNode('A', false, new List<TrieNode>()));
    

    OK, our trie is now

    ^
    |
    A
    

    Now add a node for the second letter:

    root.Children[0].Children.Add(new TrieNode('A', true, new List<TrieNode>()));
    

    Our trie is now

    ^
    |
    A
    |
    A$   -- we notate the end of word flag with $
    

    Great. Now suppose we want to add AB. We already have a node for "A", so add to it the "B$" node:

    root.Children[0].Children.Add(new TrieNode('B', true, new List<TrieNode>()));
    

    and now we have

        ^
        |
        A
       / \
      A$   B$
    

    Keep on going like that. Of course, rather than writing "root.Children[0]..." you'll write a loop that searches the trie to see if the node you want exists, and if not, create it.
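    The loop described above could look roughly like this. It is a minimal sketch that sticks to the list-based node; the `TrieBuilder` class and `AddWord` method are names invented for this example, and upper-casing the input is an assumption about how the word list is normalized.

```csharp
using System.Collections.Generic;

public class TrieNode
{
    public char Letter;
    public bool IsEndOfWord;
    public List<TrieNode> Children = new List<TrieNode>();

    public TrieNode(char letter, bool isEndOfWord)
    {
        Letter = letter;
        IsEndOfWord = isEndOfWord;
    }
}

public static class TrieBuilder
{
    // Adds one word to the trie rooted at 'root', creating nodes only where none exist.
    public static void AddWord(TrieNode root, string word)
    {
        if (string.IsNullOrEmpty(word)) return; // nothing to add

        TrieNode current = root;
        foreach (char letter in word.ToUpperInvariant())
        {
            // Look for an existing child that already holds this letter.
            TrieNode next = current.Children.Find(child => child.Letter == letter);
            if (next == null)
            {
                next = new TrieNode(letter, false);
                current.Children.Add(next);
            }
            current = next;
        }

        // The node reached by the last letter marks a complete word.
        current.IsEndOfWord = true;
    }
}
```

    Calling `AddWord(root, "AA")` and then `AddWord(root, "AB")` on a fresh `'^'` root produces the same shape as the diagram above: one "A" node under the root with "A$" and "B$" beneath it.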

    To store your trie on disk -- frankly, I would just store the word list as a plain text file and rebuild the trie when you need to. It shouldn't take more than 30 seconds or so, and then you can re-use the trie in memory. If you do want to store the trie in some format that is more like a trie, it shouldn't be hard to come up with a serialization format.
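    As a rough illustration of the plain-text approach, the sketch below rebuilds a trie from a file with one word per line. The `TrieLoader` class, its method names, and the one-word-per-line format are assumptions made for this example; the node is a compact dictionary-based variant so that the block stands on its own.

```csharp
using System.Collections.Generic;
using System.IO;

public class TrieNode
{
    public bool IsEndOfWord;
    public Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
}

public static class TrieLoader
{
    // Builds a trie from a reader that yields one word per line.
    public static TrieNode Load(TextReader reader)
    {
        var root = new TrieNode();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string word = line.Trim().ToUpperInvariant();
            if (word.Length == 0) continue; // skip blank lines

            TrieNode current = root;
            foreach (char letter in word)
            {
                if (!current.Children.TryGetValue(letter, out TrieNode next))
                {
                    next = new TrieNode();
                    current.Children.Add(letter, next);
                }
                current = next;
            }
            current.IsEndOfWord = true;
        }
        return root;
    }

    // Convenience wrapper; 'path' is wherever the exported word list happens to live.
    public static TrieNode LoadFromFile(string path)
    {
        using (var reader = new StreamReader(path))
        {
            return Load(reader);
        }
    }
}
```

    In a web application the result could be kept in a static field so that the trie is built once per process and then shared by all requests.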

    To search the trie for words that match a rack, the idea is to explore every part of the trie but prune the areas that the rack cannot possibly match. If you haven't got any "A"s on the rack, there is no need to go down any "A" node. I sketched out the search algorithm in your previous question.
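    One way such a pruned search might be written is shown below. This is a minimal sketch, not necessarily the algorithm from the earlier answer: the `RackSearch` class is hypothetical, `'?'` as the wildcard character is an assumption, and the node is a compact dictionary-based variant so that the block is self-contained.

```csharp
using System.Collections.Generic;

public class TrieNode
{
    public bool IsEndOfWord;
    public Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
}

public static class RackSearch
{
    // Returns every word in the trie that can be spelled with the rack's tiles.
    // A '?' in the rack is treated as a wildcard (an assumption of this sketch).
    public static List<string> FindWords(TrieNode root, string rack)
    {
        var counts = new Dictionary<char, int>();
        int wildcards = 0;
        foreach (char c in rack.ToUpperInvariant())
        {
            if (c == '?') { wildcards++; continue; }
            counts.TryGetValue(c, out int n); // n is 0 when the letter is new
            counts[c] = n + 1;
        }

        var results = new List<string>();
        Explore(root, counts, wildcards, "", results);
        return results;
    }

    private static void Explore(TrieNode node, Dictionary<char, int> counts,
                                int wildcards, string prefix, List<string> results)
    {
        if (node.IsEndOfWord) results.Add(prefix);

        foreach (KeyValuePair<char, TrieNode> edge in node.Children)
        {
            char letter = edge.Key;
            counts.TryGetValue(letter, out int available);
            if (available > 0)
            {
                counts[letter] = available - 1; // use a real tile
                Explore(edge.Value, counts, wildcards, prefix + letter, results);
                counts[letter] = available;     // put the tile back
            }
            else if (wildcards > 0)
            {
                // No such tile left, so spend a wildcard on this letter instead.
                Explore(edge.Value, counts, wildcards - 1, prefix + letter, results);
            }
            // Otherwise prune: nothing below this child can be spelled.
        }
    }
}
```

    A real tile is preferred over a wildcard whenever one is available, since the wildcard can still stand in for that letter later; this also keeps each word from being reported twice. For a trie holding AA, AB, CAB and CAT, the rack "ab" yields only AB, while "a?b" yields AA, AB and CAB.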

    I've got an implementation of a functional-style persistent trie that I've been meaning to blog about for a while but never got around to it. If I do eventually post that I'll update this question.
