高效的单词打乱算法 [英] Efficient word scramble algorithm

查看:91
本文介绍了高效的单词打乱算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找一种有效的算法,用于将一组字母加扰成包含最大单词数的排列。

I'm looking for an efficient algorithm for scrambling a set of letters into a permutation containing the maximum number of words.

例如,假设我得到了字母列表:{e,e,h,r,s,t}。我需要对它们进行排序,以使其包含最大数量的单词。如果我将这些字母排序为 theres,则其中包含 the, there, her, here和 ere。因此该示例的得分为5,因为它包含5个单词。我想对字母进行排序,使其得分最高(包含最多单词)。

For example, say I am given the list of letters: {e, e, h, r, s, t}. I need to order them in such a way as to contain the maximum number of words. If I order those letters into "theres", it contain the words "the", "there", "her", "here", and "ere". So that example could have a score of 5, since it contains 5 words. I want to order the letters in such a way as to have the highest score (contain the most words).

一种幼稚的算法是尝试对每个排列进行评分。我相信这是O(n!),因此仅对上面的6个字母尝试720种不同的排列(包括一些重复项,因为该示例具有两次e)。当然,要获取更多字母,幼稚的解决方案很快就变得不可能。

A naive algorithm would be to try and score every permutation. I believe this is O(n!), so 720 different permutations would be tried for just the 6 letters above (including some duplicates, since the example has e twice). For more letters, the naive solution quickly becomes impossible, of course.

该算法不一定要产生最佳解决方案,但它应该找到一个好的解决方案在合理的时间内。对于我的应用程序,仅猜测( Monte Carlo )的排列效果很差,因此

The algorithm doesn't have to actually produce the very best solution, but it should find a good solution in a reasonable amount of time. For my application, simply guessing (Monte Carlo) at a few million permutations works quite poorly, so that's currently the mark to beat.

我当前正在使用 Aho-Corasick 算法可对排列进行评分。它只需要搜索文本中的一个单词即可搜索字典中的每个单词,因此我认为它非常有效。这也意味着我将所有单词存储在 trie 中,但是如果另一种算法需要不同的存储也可以我并不担心要设置字典,而只是担心实际排序和搜索的运行时间。如果需要,甚至可以使用模糊字典,例如布鲁姆过滤器

I am currently using the Aho-Corasick algorithm to score permutations. It searches for each word in the dictionary in just one pass through the text, so I believe it's quite efficient. This also means I have all the words stored in a trie, but if another algorithm requires different storage that's fine too. I am not worried about setting up the dictionary, just the run time of the actual ordering and searching. Even a fuzzy dictionary could be used if needed, like a Bloom Filter.

对于我的应用程序,给定的字母列表大约为100,并且词典包含超过100,000个条目。字典永远不会改变,但是需要订购几个不同的字母列表。

For my application, the list of letters given is about 100, and the dictionary contains over 100,000 entries. The dictionary never changes, but several different lists of letters need to be ordered.

我正在考虑尝试使用路径查找算法。我相信我可以从列表中的随机字母开始。然后,每个剩余的字母将用于创建路径。我认为这可以与Aho-Corasick评分算法配合使用,因为可以一次累积一个字母的分数。我还没有尝试过寻路;也许这不是一个好主意?我不知道哪种路径查找算法可能是最好的。

I am considering trying a path finding algorithm. I believe I could start with a random letter from the list as a starting point. Then each remaining letter would be used to create a "path." I think this would work well with the Aho-Corasick scoring algorithm, since scores could be built up one letter at a time. I haven't tried path finding yet though; maybe it's not a even a good idea? I don't know which path finding algorithm might be best.

我想到的另一个算法也以随机字母开头。然后将在字典trie中搜索包含剩余字母的丰富分支。包含不可用字母的字典分支将被删除。我对如何确切地使用它的细节有些迷惑,但是它可以完全消除得分排列。

Another algorithm I thought of also starts with a random letter. Then the dictionary trie would be searched for "rich" branches containing the remain letters. Dictionary branches containing unavailable letters would be pruned. I'm a bit foggy on the details of how this would work exactly, but it could completely eliminate scoring permutations.

推荐答案

您可以尝试模拟退火,该方法已成功用于许多领域中的复杂优化问题。基本上,您会进行随机爬山,同时逐渐降低随机性。由于您已经获得Aho-Corasick评分,因此您已经完成了大部分工作。您所需要的只是一种生成邻居排列的方法。因为像交换一对字母这样的简单操作应该可以正常工作。

You might try simulated annealing, which has been used successfully for complex optimization problems in a number of domains. Basically you do randomized hill-climbing while gradually reducing the randomness. Since you already have the Aho-Corasick scoring you've done most of the work already. All you need is a way to generate neighbor permutations; for that something simple like swapping a pair of letters should work fine.

这篇关于高效的单词打乱算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆