填字游戏搜索的最佳数据结构 [英] Best data structure for crossword puzzle search

查看:25
本文介绍了填字游戏搜索的最佳数据结构的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个用于解决填字游戏的大型数据库,其中包含一个单词和一个描述.我的应用程序允许搜索特定长度的单词和特定位置的字符(这是一种艰难的方式......遍历所有单词并检查每个单词).加上按描述搜索(如有必要)

I have a large database for solving crossword puzzles, consisting of a word and a description. My application allows searching for words of a specific length and characters on specific positions (this is done the hard way ... go through all words and check each). Plus a search by description (if necessary)

例如找到单词 _ _ A _ _ B(6 个字母的单词,第三个字符 A 和最后一个 B)

For instance find word _ _ A _ _ B (6 letter word, third character A and last B)

我想以搜索速度非常快的方式对单词进行索引.我的第一个想法是使用平衡的树结构,还有其他建议吗?

I would like to index the words in such way that the searching would be really fast. My first idea was to use a balanced tree structure, any other suggestion?

推荐答案

好吧,我要提出一些奇怪的东西,但是来自 C++ 我一直在使用 Boost很长一段时间,我来看看MultiIndex 库.

Okay, I am going to propose something weird, but coming from C++ I have been using Boost for a long time and I have come to see the MultiIndex library.

这个库的想法是创建一个集合,但有许多不同的查询方式.事实上,它可以建模一个数据库.

The idea of this library is to create one collection, but have many different ways to query it. It could model, in fact, a database.

所以,让我们把我们的话放在一个表格中,并把必要的索引放在适当的位置:

So, let's put our words in a table, and put the necessary indexes in place:

word                     |length|c0|c1|c2| ... |c26|
-------------------------|------|--|--|--| ... |---|
Singapour                |9     |S |i |n | ... |0  |

现在查询将如下所示:

Select word From table Where length=9 And c2='n' And c8='u';

很简单不是吗?

为了获得最大效率,表应该按长度进行分区,索引(每个 cX 列一个)应该是分区的本地索引.

For maximum efficiency, the table should be partitioned on the length, and the indexes (one per cX column) should be local to the partition.

对于内存中的解决方案,每个长度都有一个容器,包含与长度一样多的索引,每个索引都是一个指向排序列表的哈希表(更容易合并)

For an in-memory solution you would have one container per length, containing as many indexes as the length, each index being a hash table pointing to a sorted list (easier merge)

这是一个python描述:

Here is a python description:

class Dictionary:
  def __init__(self, length):
    self.length = length
    self.words = set([])
    self.indexes = collections.defaultdict(set)

  def add(self, word):
    if len(word) != self.length:
      raise RuntimeException(word + ' is not ' + `self.length` + ' characters long')

    if word in self.words:
      raise RuntimeException(word + ' is already in the dictionary')

    self.words.add(word)

    for i in range(0,length):
      self.indexes[(i,word[i])].add(word)

  def search(self, list):
    """list: list of tuples (position,character)
    """
    def compare(lhs,rhs): return cmp(len(lhs),len(rhs))

    sets = [self.indexes[elem] for elem in list]
    sets.sort(compare)
    return reduce(intersection, sets)

我自愿提供了 length 参数,以最小化散列的大小,从而使搜索更好.此外,集合按长度排序,以便更好地计算交集:)

I have voluntarily provided the length argument, to minimize the size of the hashes and thus make the search better. Also, the sets are sorted by length so that the computation of the intersection be better :)

如果您愿意,请继续针对其他解决方案进行测试:)

Go ahead and test it against other solutions if you wish :)

这篇关于填字游戏搜索的最佳数据结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆