如何找到一个给定单词的所有排列在一个给定的文本? [英] How to find all permutations of a given word in a given text?

查看:106
本文介绍了如何找到一个给定单词的所有排列在一个给定的文本?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是一个面试问题(手机屏幕):写一个函数(Java)中发现,出现在一个给定的文本给定单词的所有排列。例如,对于字农行和文字 abcxyaxbcayxycab 函数应该返回 ABC,BCA,驾驶室

This is an interview question (phone screen): write a function (in Java) to find all permutations of a given word that appear in a given text. For example, for word abc and text abcxyaxbcayxycab the function should return abc, bca, cab.

我会回答这个问题如下:

I would answer this question as follows:

  • 很显然,我也可以遍历给定单词的所有排列,并使用标准的功能。然而,它可能很难(我现在)写code生成所有字排列。

  • Obviously I can loop over all permutations of the given word and use a standard substring function. However it might be difficult (for me right now) to write code to generate all word permutations.

这是比较容易遍历所有文本串字的大小,排序每个子,并将其与排序定单词进行比较。我可以在$ C C这样的立即功能$。

It is easier to loop over all text substrings of the word size, sort each substring and compare it with the "sorted" given word. I can code such a function immediately.

我也许可以修改一些字符串搜索算法,但我不记得这些算法吧。

I can probably modify some substring search algorithm but I do not remember these algorithms now.

你会如何回答这个问题?

How would you answer this question?

推荐答案

这可能不是最有效的解决算法,但它是从一个一流的设计点干净。该解决方案采用比较排序给出的单词的方法。

This is probably not the most efficient solution algorithmically, but it is clean from a class design point of view. This solution takes the approach of comparing "sorted" given words.

我们可以说,一个字是另一个的置换,如果它包含在相同数量的相同的字母。这意味着,你可以将这个词从字符串地图<性格,整数GT; 。这种转换会产生复杂度为O(n),其中n是字符串,假设在插入你的地图的长度实施成本O(1)。

We can say that a word is a permutation of another if it contains the same letters in the same number. This means that you can convert the word from a String to a Map<Character,Integer>. Such conversion will have complexity O(n) where n is the length of the String, assuming that insertions in your Map implementation cost O(1).

地图将包含作为键都在Word和认定的字符值字符的频率。

The Map will contain as keys all the characters found in the word and as values the frequencies of the characters.

示例 ABBC转换为 [A-&GT; 1,B&GT; 2,C&GT; 1]

BACB转换为 [A-&GT; 1,B&GT; 2,C&GT; 1]

所以,如果你要知道,如果两个词是一个其他的排列,你可以将它们转换成两种地图,然后调用 Map.equals

So if you have to know if two words are one the permutation of the other, you can convert them both into maps and then invoke Map.equals.

然后,你必须遍历文本字符串和转换适用于您所寻找的字相同长度的所有子串。

Then you have to iterate over the text string and apply the transformation to all the substrings of the same length of the words that you are looking for.

提出Inerdial改进

此方法可以通过更新地图的滚动的方式加以改进。

This approach can be improved by updating the Map in a "rolling" fashion.

即。如果你匹配在这个例子草堆指数 I = 3 的OP(子串 XYA ),该地图将 [A-&GT; 1,X-大于1,Y&GT; 1] 。当在草堆推进,递减字符计数草垛[I] ,并增加了计数草垛[1 + needle.length()]

I.e. if you're matching at index i=3 in the example haystack in the OP (the substring xya), the map will be [a->1, x->1, y->1]. When advancing in the haystack, decrement the character count for haystack[i], and increment the count for haystack[i+needle.length()].

(删除零,以确保 Map.equals()的作品,或者只是实现自定义比较。)

(Dropping zeroes to make sure Map.equals() works, or just implementing a custom comparison.)

提出的最大改进

如果我们还引入了 matchedCharactersCnt 变量?在草垛的开始将是 0 。每次你改变你的地图朝着所期望的价值 - 你将变量。每次你改变它离所需要的价值 - 你递减变量。每次迭代您检查变量等于针的长度。如果它是 - 你已经找到了一个匹配。这将是不是每次比较完整地图要快。

What if we also introduce matchedCharactersCnt variable? At the beginning of the haystack it will be 0. Every time you change your map towards the desired value - you increment the variable. Every time you change it away from the desired value - you decrement the variable. Each iteration you check if the variable is equal to the length of needle. If it is - you've found a match. It would be faster than comparing the full map every time.

拟由Max提供code:

Pseudocode provided by Max:

needle = "abbc"
text = "abbcbbabbcaabbca"

needleSize = needle.length()
//Map of needle character counts
targetMap = [a->1, b->2, c->1]

matchedLength = 0
curMap = [a->0, b->0, c->0]
//Initial map initialization
for (int i=0;i<needle.length();i++) {
    if (curMap.contains(haystack[i])) {
        matchedLength++
        curMap[haystack[i]]++
    }
}

if (matchedLength == needleSize) {
    System.out.println("Match found at: 0");
}

//Search itself
for (int i=0;i<haystack.length()-needle.length();i++) {
    int targetValue1 = targetMap[haystack[i]]; //Reading from hashmap, O(1)
    int curValue1 = curMap[haystack[i]]; //Another read
    //If we are removing beneficial character
    if (targetValue1 > 0 && curValue1 > 0 && curValue1 <= targetValue1) {       
        matchedLength--;
    }
    curMap[haystack[i]] = curValue1 + 1; //Write to hashmap, O(1)


    int targetValue2 = targetMap[haystack[i+needle.length()]] //Read
    int curValue2 = curMap[haystack[i+needle.length()]] //Read
    //We are adding a beneficial character
    if (targetValue2 > 0 && curValue2 < targetValue2) { //If we don't need this letter at all, the amount of matched letters decreases
        matchedLength++;
    }
    curMap[haystack[i+needle.length()]] = curValue2 + 1; //Write

    if (matchedLength == needleSize) {
        System.out.println("Match found at: "+(i+1));
    }
}

//Basically with 4 reads and 2 writes which are 
//independent of the size of the needle,
//we get to the maximal possible performance: O(n)

这篇关于如何找到一个给定单词的所有排列在一个给定的文本?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆