拼字游戏字取景器使用通配符 [英] Scrabble word finder with wildcards

查看:120
本文介绍了拼字游戏字取景器使用通配符的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个问题,似乎有些我之前有过类似的问题,但我一直没能找到我的一个可行的解决方案。



我M目前正在建设中使用C#,MySQL和HTML5和Javascript的移动Web应用程序。该应用程序将被用于帮助用户查找可能的单词,而玩游戏,如拼字游戏玩



我有这个问题:
我如何获得?从包含来自用户的字母输入字典的MySQL数据库话语权



更多细节:
- 用户可以输入任意数量的字母,还可以使用通配符(代表任何字母)。
- 如果用户输入TEST的结果不能包含超过1 C字和S和文字有超过2 T,里面有测试的结果会很糟糕。
- 结果不能包含多个字母输入比字



更​​新:似乎特里是解决我的问题由埃里克利珀<一的建议HREF =htt​​p://stackoverflow.com/questions/7418910/scrabble-word-finder-with-wildcards/7419003#7419003>这里。

问题是我是初学者用C#和MySQL的,所以这里有一些跟进的问题:




  1. 如何从我的MySQL字典创建特里? (400K +字)

  2. 我如何保存特里快速和未来的访问?

  3. 我如何访问特里和提取的话,从它C#?



非常感谢您的帮助!


解决方案

我如何从MySQL数据库中包含来自用户的字母输入字典中的话语权?




您没有。关系数据库表是不是一样有效,因为你需要解决这个问题的一个合适的数据结构。



你做什么,而不是为你打造的特里 数据结构出词典(或者,如果你真的迷,你建立一个 <一个HREF =htt​​p://en.wikipedia.org/wiki/Directed_acyclic_word_graph>耶 - 有向无环图字 - 这是一种压缩的特里结构的)



一旦你有一个线索/耶它变得非常便宜的针对给定的机架字典测试的每个的字,因为可以修剪出整个庞大的分支那架不可能匹配。



让我们看一个小例子字典。假设你有字典OP,OPS,OPT,OPTS,锅,盆,SOP,标准操作程序,停止,停从你建立这个线索:(以$节点是那些被标记为字可以在这里结束

  ^ ^根
/ | \
OPS
| | / \
$ P OOT
/ \ | | | $ b $(b T)$ S $ T $ P $Ø
| | | |
小号$ S $ S $ P $
|
US $

和你有机架OPS - 你该怎么办?



首先,你说我可以下去将O分支吗?是的,可以。所以,现在的问题是匹配的PS反对将O分支。你可以去在P支行吗?是的。是否有尾字的标记吗?是的,所以OP匹配。现在的问题是匹配的S对OP分支。你可以下去的T分支?号你可以去的分公司吗?是的。现在你有空架,你必须与之相匹配的反对老年退休金计划的分支。它有结束的字标记?是!所以,OPS也匹配。现在回溯到根。



你能走下来在P分支?是。现在的问题是,以针对在P分支匹配操作系统。下井PO分支和匹配的S - 失败。原路返回到根。



和再次,你怎么看这个去。最后,我们往下走的SOP分支,找到一个最终的字上SOP,所以SOP这架匹配。因为我们没有一个T.我们不下井ST分支



我们已经尝试字典中的每一个可能的话,发现OP,OPS和SOP全部匹配。但是,我们从来没有进行调查OPTS,POTS,停止或停止,因为我们没有一个T.



您看到这个数据结构如何使得它非常有效的?一旦你已经确定你不必在机架上的字母做出的开始的一个词,你没有调查的任何的字典,与开始的单词开始。如果你有订单但没有T,你不必调查瓦片或土豆或碳酸钾,或者波特拉奇或饮用; 。所有那些昂贵和无效搜索走开很快



适应的系统,以处理野瓷砖是非常简单的;如果有OPS?然后只需运行搜索算法26次,对OPSA,OPSB,OPSC ......这应该是足够快,这样做26次是便宜(或做26×26倍,如果你有两个空白。 )



这是基本的算法,专业的拼字游戏的AI程序中使用,但当然他们也不得不面对像板位置,机架管理的事情等等,这复杂化算法一些。这个简单的算法的版本将是足够快足以产生机架上所有可能的话。



不要忘了,当然你只需要计算线索/耶的一次的如果字典不随时间改变。它可以是费时打造线索出来的字典,所以你可能想这样做的一次的再想出一些方法来线索存储在表单中的磁盘,服从重建它迅速从磁盘。



您可以通过建立一个DAWG出线索的优化了内存使用情况。请注意如何有很多重复的,因为在英语中,很多单词的结束的一样,就像很多话的开始的相同。该线索做的开头,但在年底分享他们糟糕的工作节点共享一个伟大的工作。例如,您可以注意到模式无子女US $是非常普遍的,把线索为:

  ^根^ 
/ | \
○P US $ B $ C | | / \
$ P○○T
/ \ | | |
T $ | ŧ$ P $Ø
| \ | | |
\ | / p $
\ | / |
\ | /
\ | /
\ | /
\ | /
| /
|
US $



保存节点的一大堆。然后你可能会注意到,这两个词现在OP $ -S $结束,两个词结束T $ -S $,这样你就可以进一步压缩它:

  ^ ^根
/ | \
○P US $ B $ C | | / \
$ P \ØT
/ \ | \ |
| | \ |
| | Ø
| T $ |
\ | P $
\ | /
\ | /
| /
| /
US $

和现在我们有最小DAWG为这本词典。



延伸阅读:



http://dl.acm.org/citation.cfm?id=42420



http://archive.msdn.microsoft.com/dawg1



http://www.gtoal.com/wordgames/scrabble.html


I’ve got a problem and it seems some before me have had similar problems but I haven’t been able to find a working solution for me.

I’m currently building a mobile web application using C#, MySQL, HTML5 and Javascript. The application will be used to help users to find possible words to play while playing games like Scrabble.

The problem I’ve got: How do I get the right words from a MySQL database containing a dictionary from user letter input?

More details: - Users can input any number of letters and also use wildcards (representing any letter) . - If the user inputs "TEST" the result can’t contain words with more than 1 E and S and words with more than 2 T, a result with "TESTER" in it would be bad. - The result can’t contain words with more letters than inputted.

UPDATE: Seems Trie is the solution to my problem as suggested by Eric Lippert here.
Problem is I'm a beginner with both C# and MySQL, so here are some follow up questions:

  1. How do I create a Trie from my MySQL dictionary? (400k+ words)
  2. How do I store the Trie for quick and future access?
  3. How do I access the Trie and extract words from it with C#?

Thank you very much for the help!

解决方案

How do I get the right words from a MySQL database containing a dictionary from user letter input?

You don't. A relational database table is not a suitable data structure for solving this problem as efficiently as you need to.

What you do instead is you build a trie data structure out of the dictionary (or, if you're really buff, you build a dawg -- a directed acyclic word graph -- which is a sort of compressed trie.)

Once you have a trie/dawg it becomes very inexpensive to test every word in the dictionary against a given rack, because you can "prune out" whole huge branches of the dictionary that the rack cannot possibly match.

Let's look at a small example. Suppose you have the dictionary "OP, OPS, OPT, OPTS, POT, POTS, SOP, SOPS, STOP, STOPS" From that you build this trie: (Nodes with a $ are those that are marked as "word can end here".

           ^root^
           /  |  \
         O    P    S
         |    |   / \
         P$   O  O   T   
        / \   |  |   |
       T$  S$ T$ P$  O
       |      |  |   |
       S$     S$ S$  P$
                     |
                     S$

and you have the rack "OPS" -- what do you do?

First you say "can I go down the O branch?" Yes, you can. So now the problem is matching "PS" against the O branch. Can you go down the P subbranch? Yes. Does it have an end-of-word marker? Yes, so OP is a match. Now the problem is matching "S" against the OP branch. Can you go down the T branch? No. Can you go down the S branch? Yes. Now you have the empty rack and you have to match it against the OPS branch. Does it have an end-of-word marker? Yes! So OPS matches also. Now backtrack up to the root.

Can you go down the P branch? Yes. Now the problem is to match OS against the P branch. Go down the PO branch and match S -- that fails. Backtrack to the root.

And again, you see how this goes. Eventually we go down the SOP branch and find an end-of-word on SOP, so "SOP" matches this rack. We don't go down the ST branch because we don't have a T.

We've tried every possible word in the dictionary and discovered that OP, OPS and SOP all match. But we never had to investigate OPTS, POTS, STOP or STOPS because we didn't have a T.

You see how this data structure makes it very efficient? Once you have determined that you do not have the letters on the rack to make the beginning of a word, you don't have to investigate any dictionary words that start with that beginning. If you have PO but no T, you don't have to investigate POTSHERD or POTATO or POTASH or POTLATCH or POTABLE; all those expensive and fruitless searches go away very quickly.

Adapting the system to deal with "wild" tiles is pretty straightforward; if you have OPS?, then just run the search algorithm 26 times, on OPSA, OPSB, OPSC... It should be fast enough that doing it 26 times is cheap (or doing it 26 x 26 times if you have two blanks.)

This is the basic algorithm that professional Scrabble AI programs use, though of course they also have to deal with things like board position, rack management and so on, which complicate the algorithms somewhat. This simple version of the algorithm will be plenty fast enough to generate all the possible words on a rack.

Don't forget that of course you only have to compute the trie/dawg once if the dictionary is not changing over time. It can be time-consuming to build the trie out of the dictionary, so you might want to do so once and then figure out some way to store the trie on disk in a form that is amenable to rebuilding it quickly from disk.

You can optimize the memory usage by building a DAWG out of the trie. Notice how there is a lot of repetition because in English, lots of words end the same, just as lots of words begin the same. The trie does a great job of sharing nodes at the beginning but a lousy job of sharing them at the end. You can notice for example that the "S$ with no children" pattern is extremely common, and turn the trie into:

           ^root^
          / |  \
        O   P    S
        |   |   / \
        P$  O  O   T   
       /  \ |  |   |
      T$  | T$ P$  O
      |    \ | |   |
       \    \| /   P$
        \    |/    |
         \   |    /
          \  |   /  
           \ |  /
            \| /  
             |/
             |       
             S$

Saving a whole pile of nodes. And then you might notice that two words now end in O-P$-S$, and two words end in T$-S$, so you can compress it further to:

           ^root^
           / | \
          O  P  S
          |  | / \
          P$ O \  T   
         /  \|  \ |
         |   |   \|
         |   |    O
         |   T$   |
          \  |    P$
           \ |   /
            \|  /  
             | /
             |/   
             S$

And now we have the minimal DAWG for this dictionary.

Further reading:

http://dl.acm.org/citation.cfm?id=42420

http://archive.msdn.microsoft.com/dawg1

http://www.gtoal.com/wordgames/scrabble.html

这篇关于拼字游戏字取景器使用通配符的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆