字符串2的字谜是字符串1的子字符串 [英] Anagram of String 2 is Substring of String 1

查看:118
本文介绍了字符串2的字谜是字符串1的子字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如何查找字符串1的任何字谜是字符串2的子字符串?

How to find that any anagram of String 1 is sub string of String 2?

例如:-

String 1 = rove

String 1 =rove

String 2 = stackoverflow

String 2=stackoverflow

因此它将返回true,因为走动的字谜是结束,它是字符串2的子字符串

So it will return true as anagram of "rove" is "over" which is sub-string of String 2

推荐答案

<编辑时:在最坏的情况下,我的第一个答案是二次方。我将其调整为严格线性的:

On edit: my first answer was quadratic in the worst case. I've tweaked it to be strictly linear:

这里是一种基于滑动窗口概念的方法:创建一个字典,该字典以第一个字典的字母为键,并带有字母对应的频率计数。可以将其视为目标字典,第二个字符串中必须连续匹配 m 个字母,其中 m 是第一个字符串的长度。

Here is an approach based on the notion of a sliding window: Create a dictionary keyed by the letters of the first dictionary with frequency counts of the letters for the corresponding values. Think of this as a dictionary of targets which need to be matched by m consecutive letters in the second string, where m is the length of the first string.

首先处理第二个字符串中的第一个 m 个字母。对于每个这样的字母,如果它在目标字典中作为键出现,则减少对应的值减1。目标是将所有目标值驱动为0。定义差异是处理 m 个字母的第一个窗口后的值的绝对值之和。

Start by processing the first m letters in the second string. For each such letter if it appears as a key in the target dictionary decrease the corresponding value by 1. The goal is to drive all target values to 0. Define discrepancy to be the sum of the absolute values of the values after processing the first window of m letters.

重复执行以下操作:检查差异== 0 ,然后返回 True 。否则-将字符 m 放在字母前面,并检查它是否为目标键,如果是,则将值增加1。在这种情况下,此值会增加或减少差异乘以1,相应地进行调整。然后获取第二个字符串的下一个字符并进行处理。检查它是否是字典中的键,如果合适,请适当调整值和差异。

Repeatedly do the following: check if discrepancy == 0 and return Trueif it does. Otherwise -- take the character m letters ago and check if it is a target key and if so -- increase the value by 1. In this case, this either increases or decreases the discrepancy by 1, adjust accordingly. Then get the next character of the second string and process it as well. Check if it is a key in the dictionary and if so adjust the value and the discrepancy as appropriate.

由于没有嵌套循环,并且每次通过主循环都只涉及几个字典查找,比较,加法和减法,因此整个算法是线性的。

Since there are no nested loop and each pass through the main loop involves just a few dictionary lookups, comparisons, addition and subtractions, the overall algorithm is linear.

Python 3实现(显示窗口如何滑动以及调整目标计数和差异的基本逻辑):

A Python 3 implementation (which shows the basic logic of how the window slides and the target counts and discrepancy are adjusted):

def subAnagram(s1,s2):
    m = len(s1)
    n = len(s2)
    if m > n: return false
    target = dict.fromkeys(s1,0)
    for c in s1: target[c] += 1

    #process initial window
    for i in range(m):
        c = s2[i]
        if c in target:
            target[c] -= 1
    discrepancy = sum(abs(target[c]) for c in target)

    #repeatedly check then slide:
    for i in range(m,n):
        if discrepancy == 0:
            return True
        else:
            #first process letter from m steps ago from s2
            c = s2[i-m]
            if c in target:
                target[c] += 1
                if target[c] > 0: #just made things worse
                    discrepancy +=1
                else:
                    discrepancy -=1
            #now process new letter:
            c = s2[i]
            if c in target:
                target[c] -= 1
                if target[c] < 0: #just made things worse
                    discrepancy += 1
                else:
                    discrepancy -=1
    #if you get to this stage:
    return discrepancy == 0

典型输出:

>>> subAnagram("rove", "stack overflow")
True
>>> subAnagram("rowe", "stack overflow")
False

要对其进行压力测试,我从Gutenberg项目下载了 Moby Dick 的全文。它有超过一百万个字符。书中提到福尔摩沙,因此摩尔人的字谜出现在白鲸的子串中。但是,毫不奇怪,在Moby Dick中没有出现 stackoverflow的字谜:

To stress-test it, I downloaded the complete text of Moby Dick from Project Gutenberg. This has over 1 million characters. "Formosa" is mentioned in the book, hence an anagram of "moors" appears as a substring of Moby Dick. But, not surprisingly, no anagram of "stackoverflow" appears in Moby Dick:

>>> f = open("moby dick.txt")
>>> md = f.read()
>>> f.close()
>>> len(md)
1235186
>>> subAnagram("moors",md)
True
>>> subAnagram("stackoverflow",md)
False

最后一次调用大约需要1秒钟处理Moby Dick的全文,并验证其中没有出现 stackoverflow字谜。

The last call takes roughly 1 second to process the complete text of Moby Dick and verify that no anagram of "stackoverflow" appears in it.

这篇关于字符串2的字谜是字符串1的子字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆