最快的数据结构,以找到一个字符串的超弦? [英] fastest data structure to find the superstring of a string?

查看:300
本文介绍了最快的数据结构,以找到一个字符串的超弦?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个巨大的词汇量,并希望找到在包含给定字符串的词汇表的所有单词。这意味着我想找间所有词汇一个给定的字符串的所有超弦。但是树的数据结构,有利于包括()查询,并存在一些算法来找到子,但我找不到任何算法来解决这个问题。

I have a huge vocabulary and want to find all words in the vocabulary that contain the given string. that means I want to find all superstring of a given string among whole vocabulary. However Tree data structure is good for contains() query, and there exist some algorithm to find substrings, but I couldn't find any algorithm to solve this.

我想使用这个算法(或数据结构)在Java中。

I want to use this algorithm ( or data structure ) in Java.

推荐答案

您正在寻找一个后缀树

的想法是,给定一个字符串,该字符串的一些后缀每preFIX,是字符串的一个子(每串是一些后缀的preFIX)。

The idea is, given a string, every prefix of some suffix of this string, is a substring of that string (and every substring is a prefix of some suffix).

这意味着,你可以创建词的后缀树,这里的叶子每个后缀点到原来的字符串。

This means, you can create a suffix tree of the words, where the "leaf" of each suffix points to the original string.

现在,寻找一个子串,您需要按照输入字符串的节点遍历树,然后执行一些树的遍历(像的 DFS ),找到所有可达树叶。每个这样的叶片会出现一些字符串,它的查询字符串的子串的后缀。

Now, searching for a substring, you need to traverse the tree by following the nodes of the input substring, and then do some kind of tree traversal (like DFS), to find all reachable leaves. Each such leaf will be a suffix of some string, which the query string is a substring of.

此溶液是pretty的便宜,它是线性的查询的输出和尺寸的大小。

This solution is pretty cheap, it is linear in the size of the output and size of query.

这篇关于最快的数据结构,以找到一个字符串的超弦?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆