使用搜索字符串。通配符 [英] Searching strings with . wildcard

查看:104
本文介绍了使用搜索字符串。通配符的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含太多字符串的数组,想要在其上搜索模式。
此模式可以有一些。匹配(每个)1个字符(任意)的通配符。

I have an array with so much strings and want to search for a pattern on it. This pattern can have some "." wildcard who matches (each) 1 character (any).

例如:

myset = {"bar", "foo", "cya", "test"}

find(myset, "f.o") -> returns true (matches with "foo") 
find(myset, "foo.") -> returns false 
find(myset, ".e.t") -> returns true (matches with "test")
find(myset, "cya") -> returns true (matches with "cya")

我试图找到一种快速实现此算法的方法,因为 myset 实际上是一个很大的数组,但是我的想法都没有令人满意的复杂性(例如 O(size_of(myset)* lenght(pattern)))

I tried to find a way to implement this algorithm fast because myset actually is a very big array, but none of my ideas has satisfactory complexity (for example O(size_of(myset) * lenght(pattern)))

编辑:

myset 是一个巨大的数组,其中的单词并不大。
我可以做一个缓慢的预处理。但是我会有很多 find()查询,所以 find()我要 find()尽可能快。

myset is an huge array, the words in it aren't big. I can do a slow preprocessing. But I'll have so much find() queries, so find() I want find() to be as fast as possible.

推荐答案

您可以构建集合中所有可能单词的语料库的后缀树(查看此链接
使用此数据结构,您的复杂性将包括O(n)花费一倍的时间来构建树,其中n是您所有单词的长度之和。

You could build a suffix tree of the corpus of all possible words in your set (see this link) Using this data structure your complexity would include a one time cost of O(n) to build the tree, where n is the sum of the lengths of all your words.

建立树后,查找字符串是否匹配应仅取O(n),其中n为长度的字符串。

Once the tree is built finding if a string matches should take just O(n) where n is length of the string.

这篇关于使用搜索字符串。通配符的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆