Finding Sub-Strings of String Containing all the words in array
Question
I have a String and an array of words and I have to write code to find all substrings of the string that contain all the words in the array in any order. The string does not contain any special characters / digits and each word is separated by a space.
For example:
Given string:
aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc
Words in array:
aaaa
bbbb
cccc
Sample output:
aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb
aaaa aaaa aaaa aaaa cccc bbbb
aaaa cccc bbbb bbbb bbbb bbbb
cccc bbbb bbbb bbbb bbbb aaaa
aaaa cccc bbbb
I have implemented this using for loops, but this is very inefficient.
How can I do this more efficiently?
My code:
for(int i=0;i<str_arr.length;i++)
{
    if( (str_arr.length - i) >= words.length)
    {
        String res = check(i);
        if(!res.equals(""))
        {
            System.out.println(res);
            System.out.println("");
        }
        reset_all();
    }
    else
    {
        break;
    }
}

public static String check(int i)
{
    String res = "";
    num_words = 0;
    for(int j=i;j<str_arr.length;j++)
    {
        if(has_word(str_arr[j]))
        {
            t.put(str_arr[j].toLowerCase(), 1);
            h.put(str_arr[j].toLowerCase(), 1);
            res = res + str_arr[j]; //+ " ";
            if(all_complete())
            {
                return res;
            }
            res = res + " ";
        }
        else
        {
            res = res + str_arr[j] + " ";
        }
    }
    res = "";
    return res;
}
Recommended answer
My first approach would be something like the following pseudo-code
for each word in string {
    if word in array {
        for each stored potential substring {
            if word wasn't already found for it {
                remove word from its notAlreadyFoundList
                if notAlreadyFoundList is empty {
                    use starting pos and ending pos to save our substring
                }
            }
        }
        store position and array-word as a new potential substring
    }
}
This should have decent performance since you only traverse the string once, although each array word you encounter is still checked against every stored potential substring.
This is an implementation of my pseudo-code; try it out and see if it performs better or worse. It works under the assumption that a matching substring is found as soon as you find the last word. If you truly want all matches, change the lines marked //ALLMATCHES:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class SubStringFinder {
    String textString = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
    Set<String> words = new HashSet<String>(Arrays.asList("aaaa", "bbbb", "cccc"));

    public static void main(String[] args) {
        new SubStringFinder();
    }

    public SubStringFinder() {
        // Candidates that started at an earlier word and are still missing words.
        List<PotentialMatch> matches = new ArrayList<PotentialMatch>();
        for (String textPart : textString.split(" ")) {
            // Extend every open candidate with the current word, even when it is
            // not in the array, so that the reported substring stays contiguous.
            for (Iterator<PotentialMatch> matchIterator = matches.iterator(); matchIterator.hasNext();) {
                PotentialMatch match = matchIterator.next();
                String result = match.tryMatch(textPart);
                if (result != null) {
                    System.out.println("Match found: \"" + result + "\"");
                    matchIterator.remove(); //ALLMATCHES - remove this line
                }
            }
            if (words.contains(textPart)) {
                // Start a new candidate at this word; it still needs all other words.
                Set<String> unfound = new HashSet<String>(words);
                unfound.remove(textPart);
                matches.add(new PotentialMatch(unfound, textPart));
            }// ALLMATCHES add these lines
            // else {
            //     matches.add(new PotentialMatch(new HashSet<String>(words), textPart));
            // }
        }
    }

    class PotentialMatch {
        Set<String> unfoundWords;   // array words this candidate has not seen yet
        StringBuilder stringPart;   // text collected since the candidate's start

        public PotentialMatch(Set<String> unfoundWords, String part) {
            this.unfoundWords = unfoundWords;
            this.stringPart = new StringBuilder(part);
        }

        // Appends part and returns the collected text once no word is missing, else null.
        public String tryMatch(String part) {
            this.stringPart.append(' ').append(part);
            unfoundWords.remove(part);
            if (unfoundWords.isEmpty()) {
                return this.stringPart.toString();
            }
            return null;
        }
    }
}
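The pseudo-code talks about saving a starting and an ending position, so a position-based variant may also be worth trying. The following is only a minimal sketch, not the answer's own code: the names PositionSketch, Candidate and findSubstrings are hypothetical and chosen for illustration. It assumes words are separated by single spaces, that a candidate may only start at an array word, and that each candidate is reported once, at the first position where all array words have been seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class PositionSketch {
    // Hypothetical helper: one open candidate, its start index and the words it still lacks.
    static final class Candidate {
        final int start;
        final Set<String> missing;

        Candidate(int start, Set<String> missing) {
            this.start = start;
            this.missing = missing;
        }
    }

    // For each start position at an array word, collects the shortest
    // substring from that position that contains every array word.
    static List<String> findSubstrings(String text, String[] wordArray) {
        String[] parts = text.split(" ");
        Set<String> words = new HashSet<>(Arrays.asList(wordArray));
        List<Candidate> open = new ArrayList<>();
        List<String> found = new ArrayList<>();
        for (int pos = 0; pos < parts.length; pos++) {
            if (!words.contains(parts[pos])) {
                continue; // not an array word: it cannot start or complete a candidate
            }
            // Open the new candidate first, so a one-word array completes immediately.
            open.add(new Candidate(pos, new HashSet<>(words)));
            for (Iterator<Candidate> it = open.iterator(); it.hasNext();) {
                Candidate c = it.next();
                c.missing.remove(parts[pos]);
                if (c.missing.isEmpty()) {
                    // Start and end positions are enough to rebuild the substring,
                    // including any non-array words in between.
                    found.add(String.join(" ", Arrays.copyOfRange(parts, c.start, pos + 1)));
                    it.remove();
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
        for (String s : findSubstrings(text, new String[] {"aaaa", "bbbb", "cccc"})) {
            System.out.println(s);
        }
    }
}
```

Storing only a start index per candidate avoids growing one StringBuilder per candidate; the substring is built once, when the candidate completes.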