算法来确定i..j数组包含另一个数组B的所有元素的索引 [英] Algorithm to determine indices i..j of array A containing all the elements of another array B

查看:152
本文介绍了算法来确定i..j数组包含另一个数组B的所有元素的索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我碰到这个问题就在接受记者采访时的问题线索。这里有一个问题:

I came across this question on an interview questions thread. Here is the question:

给定两个整型数组A [1..1]和   B〔1..M],找到最小窗口   在一个包含所有元素   B.换句话说,找到一对&所述; I,J>   使得A [i..j]含有B [1..M。

Given two integer arrays A [1..n] and B[1..m], find the smallest window in A that contains all elements of B. In other words, find a pair < i , j > such that A[i..j] contains B[1..m].

如果A不包含的所有元素   B,那么I,J,可以返回-1。   在A中的整数不必以相同的顺序与它们在B.如果有一个以上的最小的窗口(不同,但具有相同的尺寸),那么其足以返回它们中的一个。

If A doesn't contain all the elements of B, then i,j can be returned as -1. The integers in A need not be in the same order as they are in B. If there are more than one smallest window (different, but have the same size), then its enough to return one of them.

例:A [1,2,5,11,2,6,8,24,101,17,8]和B [5,2,11,8,17。该算法应该返回和j = 9(17的A股指数)I = 2(在5索引)。

Example: A[1,2,5,11,2,6,8,24,101,17,8] and B[5,2,11,8,17]. The algorithm should return i = 2 (index of 5 in A) and j = 9 (index of 17 in A).

现在我能想到的两个版本。

Now I can think of two variations.

让我们假设B有重复。

  1. 此变化不考虑每个元素出现在B.它只检查所有发生在B中的独特的元素和找到最少相应窗口中甲满足上述问题的次数。举例来说,如果A [1,2,4,5,7]和B [2,2,5],这种变化不会理会有两岁2在B和只检查一种B中的唯一的整数,即2和5,因此返回I = 1,J = 3。

  1. This variation doesn't consider the number of times each element occurs in B. It just checks for all the unique elements that occur in B and finds the smallest corresponding window in A that satisfies the above problem. For example, if A[1,2,4,5,7] and B[2,2,5], this variation doesn't bother about there being two 2's in B and just checks A for the unique integers in B namely 2 and 5 and hence returns i=1, j=3.

这变化占了重复的B.如果有两个2在B,那么它希望看到至少2个是在一个为好。如果没有,则返回-1,-1。

This variation accounts for duplicates in B. If there are two 2's in B, then it expects to see at least two 2's in A as well. If not, it returns -1,-1.

当你回答,请让我知道其中的变化你回答。伪code应该做的。请注明时间和空间的复杂性,如果它是棘手的计算。何况,如果你的解决方案假设数组索引开始为1或0了。

When you answer, please do let me know which variation you are answering. Pseudocode should do. Please mention space and time complexity if it is tricky to calculate it. Mention if your solution assumes array indices to start at 1 or 0 too.

在此先感谢。

推荐答案

以下是可证明最佳至多一个对数因子。 (我相信日志因素不可摆脱的,所以它是最佳的。)

Complexity

Time: O((m+n)log m)

Space: O(m)

The following is provably optimal up to a logarithmic factor. (I believe the log factor cannot be got rid of, and so it's optimal.)

1变种变种是2只是一个特例所有的多重性为1,从B.删除重复因此,这足以应付后者的变体后,如果你想变体1,只需删除 0(M M)的重复的时间。下面,就让 M 表示在B中不同元素的数量,我们假设 M&LT; ñ,因为否则的话我们就可以恢复 1 ,在一定的时间。

Variant 1 is just a special case of variant 2 with all the multiplicities being 1, after removing duplicates from B. So it's enough to handle the latter variant; if you want variant 1, just remove duplicates in O(m log m) time. In the following, let m denote the number of distinct elements in B. We assume m < n, because otherwise we can just return -1, in constant time.

对于每个指数中,我们会找到最小的指数 S [I] ,使得 A [i..s [I]] 包含 B [1..M] ,用正确的多重性。关键的发现是, S [I] 的非减的,而这正是使我们能够做到这一点的摊销线性时间。

For each index i in A, we will find the smallest index s[i] such that A[i..s[i]] contains B[1..m], with the right multiplicities. The crucial observation is that s[i] is non-decreasing, and this is what allows us to do it in amortised linear time.

START I = J = 1 。我们将保持一个元组(C [1],C [2],... C [M]) B的每一个元素出现在的次数当前窗口 A [i..j] 。我们也将保持指数的一组取值(子集 1..M )该计数右(即 K 为其 C [K] = 1 的变体1或 C [K] =&lt;保证正确的数量和GT; 的变型2)

Start with i=j=1. We will keep a tuple (c[1], c[2], ... c[m]) of the number of times each element of B occurs, in the current window A[i..j]. We will also keep a set S of indices (a subset of 1..m) for which the count is "right" (i.e., k for which c[k]=1 in variant 1, or c[k] = <the right number> in variant 2).

因此​​,对于 I = 1 ,从 J = 1 ,增加每个 C [A [J]] (如 A [J] 是B的元素),检查是否 C [ A [J]] 现在是正确的,以及添加或删除Ĵ取值相应。当停止取值有大小 M 。现在,您已经发现了 S [1] ,在大多数 O(N日志米)的时间。 (有 O(N) Ĵ的,并且每一组操作了 O(登录M)的时间。)

So, for i=1, starting with j=1, increment each c[A[j]] (if A[j] was an element of B), check if c[A[j]] is now "right", and add or remove j from S accordingly. Stop when S has size m. You've now found s[1], in at most O(n log m) time. (There are O(n) j's, and each set operation took O(log m) time.)

现在用于计算连续 S [I] S,做到以下几点。增量,减量 C [A [1]] 更新取值因此,如有必要,增量Ĵ直到取值有大小 M 了。这就给了你 S [I] 每个。最后,报告(I,S [I])为其 S [I] -i 是最小

Now for computing successive s[i]s, do the following. Increment i, decrement c[A[i]], update S accordingly, and, if necessary, increment j until S has size m again. That gives you s[i] for each i. At the end, report the (i,s[i]) for which s[i]-i was smallest.

请注意,虽然它看起来,你可能要执行至 O(N)步骤(递增Ĵ)每个,第二个指针Ĵ只向右移动:所以总次数可以增加Ĵ至多 N 。 (这是摊销分析。)每次递增Ĵ,你可能会执行一系列的操作,需要 O(M)的的时间,所以总的时间是 O(N日志米)。所需的空间是用于保持计数的元组,该组B时,组S的元素,和其他变量的一些恒定数目,所以 O(米)在所有

Note that although it seems that you might be performing up to O(n) steps (incrementing j) for each i, the second pointer j only moves to the right: so the total number of times you can increment j is at most n. (This is amortised analysis.) Each time you increment j, you might perform a set operation that takes O(log m) time, so the total time is O(n log m). The space required was for keeping the tuple of counts, the set of elements of B, the set S, and some constant number of other variables, so O(m) in all.

有一个明显的 O(M + N)下限的,因为你需要检查所有元素。所以,唯一的问题是,我们是否可以证明日志因素是必要的;我相信这是。

There is an obvious O(m+n) lower bound, because you need to examine all the elements. So the only question is whether we can prove the log factor is necessary; I believe it is.

这篇关于算法来确定i..j数组包含另一个数组B的所有元素的索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆