找到的子串,与一些附加条件 [英] Finding a substring, with some additional conditions

查看:136
本文介绍了找到的子串,与一些附加条件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我给出一个字符串,它看起来像这样:

  1011010100
 

和我的任务是找到一子而空值的数量始终是与LT长度=的那些数字。这总是应该从右到左,从左到右发生,而扫描的子字符串。所以在这个例子中,答案是:

  10110101 => 8
 

我知道的复杂性应为O(n)的或为O(n log n)的,因为长度可达10 ^ 6

任何想法?

解决方案

O(N)解决方案很简单实际,通过建立高度阵,重presenting 1的相对于0的数量的数量。所以2的高度意味着有2个1比0。在我们遍历数组的高度,一旦执行一些极大性有一定的条件检查。

关键观察

请注意,一个子阵列满足的条件必须有其高度最低处开始,并最大在端(作为相对于子阵列,而不是整个阵列)。

高度阵列中的问题样品可绘制这样的,有明显的回答:

       v
   / \ / \ / \
/ \ / \
^

证明:

假设高度不最小值在开始,这意味着有子阵列所在的高度比开始低内的一个点。在这点上,为0的数量应该比1矛盾数多。

假设身高不是最大的结尾,这意味着在子阵列那里的高度比最终大一点,比如说在指数Ĵ。然后,在指数Ĵ到底有更多的0比1(因为高度下降),所以当我们扫描,从右边的子数组向左,我们会发现更多0比1的指数Ĵ。矛盾。

算法

现在的问题可以PTED作为寻找哪个具有最高高度在子阵列端部的最长子阵列,同时保持最小,以不超过该高度在开始时间$ P $。这非常类似于最大子阵列问题喜欢由klrmlr(阵列的连续子序列中提到的还好说是子阵)。而这个想法是不是保持一个 O(N)状态,而是保持最大到目前为止和最大值在这一点上。

此之后的算法,下面是伪code,遍历数组一次:

程序Balance_Left_Right

  1. 记录的最低和最高点至今
  2. 如果在高度在这一点上是比最低点时为止,此点之后改变起始点到索引
  3. 如果在高度在这一点上是高于或等于最高点为止,那么这是一个有效的子阵列,记录长度(以及开始和结束的索引,如果你喜欢)

不过我们很快就会看到(通过个人通信指由亚当·杰克逊)为这个测试案例的一个问题: 1100101 ,显示如下:

 / \
/ \ / \ /

正确答案是3(最后101),但在上述的算法将得到2(第11)。这是因为,我们的答案是显然的背后隐藏着一个山高(即,在答案的最低点比山不低,而在答案的最高点并不比山高)。

因此​​,我们需要确保当我们运行程序Balance_Left_Right(上图),没有山高隐藏的答案。因此,解决方案是从右侧遍历数组一次,尽量分割阵列为多个部分,其中在每个部分中的这个属性保存:1的数目总是> = 0的数目,从右侧横穿,并且还为每个部分中,它不能被延伸到左再

然后,在每个部分中,从左边穿过时,将具有最大高度的部分的结尾,这是最大的。它可以证明与此属性,方法balance_left_right会发现本节中的正确答案。所以,我们只是呼吁每一节我们balance_left_right方法,然后拿在那些最大的答案。

现在,你可能会问,为什么它是足够的,在每个章节运行Balance_Left_Right?这是因为,答案需要属性从左边和从右侧到保持,因此它必须内部的部分中的一个在于,由于每个人的财产的部分满足半

该算法仍然 O(N)因为我们只从右边从左边访问每个元素两次,一次和一次。

最后一个测试案例将被划分如下:

 / | \ |
/ | \ | / \ /
** ***

在这里只标有星号(*)的部分会被执行。

因此​​,新的算法如下:

程序Max_Balance_Left_Right

  1. 所在分区的1> =数0从右边数输入(右使用Balance_Left,或者可以称之为Balance_right)
  2. 运行Balance_Left_Right每个分区上
  3. 取最大值

这里的code在Python:

高清balance_left_right(ARR):     低= 0     上= -2 ** 32     lower_idx = 0#可选     upper_idx = -1#可选     结果=(0,0,0)     身高= 0     长度= 0     为IDX,NUM在历数(ARR):         长+ = 1         身高+ = 1如果num == 1,否则-1         如果高度<下:             低=身高#复位最低             上=身高#复位最高             lower_idx = IDX + 1#可选,记录的起点             长度= 0#复位答案         如果高度> =上:             上=身高             upper_idx = IDX#可选,记录终点             如果长度GT;导致[0]:#以最大长度                 结果=(长,lower_idx,upper_idx)     返回结果 高清max_balance_left_right(ARR):     all_partitions = []     启动= 0     结束= LEN(ARR)     right_partitions = balance_left(逆转(ARR [开始:结束]))     对于right_start,right_end在right_partitions:         all_partitions.append((最终right_end,最终right_start))     结果=(0,0,0)     对于开始,结束all_partitions:         候选= balance_left_right(ARR [开始:结束])         如果结果[0] LT;候选[0]:             结果=(候选[0],候选[1] +开始,候选[2] +开始)     返回结果 高清balance_left(ARR):     低= 0     start_idx = 0     end_idx = -1     身高= 0     结果= []     为IDX,NUM在历数(ARR):         身高+ = 1如果num == 1,否则-1         如果高度<降低:             如果end_idx!= -1:                 result.append((start_idx,end_idx))             低=身高             start_idx = IDX + 1             end_idx = -1         其他:             end_idx = IDX + 1     如果end_idx!= -1:         result.append((start_idx,end_idx))     返回结果 test_cases = [         [1,0,1,1,0,1,0,1,0,0]         [0,0,1,0,1,0,0,1,0,1,0,0,1,1,0,1,0,1,0,0,1]         [1,1,1,0,0,0,1,0,0,1,1,0,1,1,0,1,1,0]         [1,1,0,0,1,0,1,0,1,1,0,0,1,0,0]         [1,1,0,0,1,0,1]         [1,1,1,1,1,0,0,0,1,0,1,0,1,1,0,0,1,0,0,1,0,1,1]         ] 对于test_case在test_cases:     打印余额左权:     打印test_case     打印balance_left_right(test_case)     打印'最大的平衡左,右:     打印test_case     打印max_balance_left_right(test_case)     打印

这将打印:

平衡左,右:
[1,0,1,1,0,1,0,1,0,0]
(8,0,7)
最大的平衡左,右:
[1,0,1,1,0,1,0,1,0,0]
(8,0,7)

平衡左,右:
[0,0,1,0,1,0,0,1,0,1,0,0,1,1,0,1,0,1,0,0,1]
(6,12,17)
最大的平衡左,右:
[0,0,1,0,1,0,0,1,0,1,0,0,1,1,0,1,0,1,0,0,1]
(6,12,17)

平衡左,右:
[1,1,1,0,0,0,1,0,0,1,1,0,1,1,0,1,1,0]
(8,9,16)
最大的平衡左,右:
[1,1,1,0,0,0,1,0,0,1,1,0,1,1,0,1,1,0]
(8,9,16)

平衡左,右:
[1,1,0,0,1,0,1,0,1,1,0,0,1,0,0]
(10,0,9)
最大的平衡左,右:
[1,1,0,0,1,0,1,0,1,1,0,0,1,0,0]
(10,0,9)

平衡左,右:
[1,1,0,0,1,0,1]
(2,0,1)
最大的平衡左,右:
[1,1,0,0,1,0,1]
(3,4,6)

平衡左,右:
[1,1,1,1,1,0,0,0,1,0,1,0,1,1,0,0,1,0,0,1,0,1,1]
(5,0,4)
最大的平衡左,右:
[1,1,1,1,1,0,0,0,1,0,1,0,1,1,0,0,1,0,0,1,0,1,1]
(6,8,13)

有关你的眼睛享受,高度阵列的测试案例:

第一:
       v
   / \ / \ / \
/ \ / \
^

第二:
\
 \ / \ /符\ v
      \ / \ / \ / \ / \ / \
           \ / \ /
            ^

第三:
                v
  / \ / \
 / \ / \ /
/ \ / \ / \ /
        \ /
         ^

第四:
         v
 / \ / \
/ \ / \ / \ / \ / \
^ \

第五:
 /符\ v
/ \ / \ /
    ^

第六:
    /符\ v
   / \ / \
  / \ / \ / \ / \ / \ /
 / ^ \ / \ /
/


澄清就问题

由于有些读者都搞不清楚究竟是什么OP希望,尽管它已经说得很清楚了问题,让我通过一些例子说明这个问题。

首先,从讨论的任务:

  

和我的任务是找到一子而空值的数量始终是与LT长度=的那些数字。这总是应该从右到左,而发生扫描子从左至右

这是指类似加泰罗尼亚数选票问题或的可用变化问题。在维基,你可以勾选单调路径的问题,在这里你可以映射向右移动为1和动起来为0。

的问题是要找到原始数组的一个子,这样,当子阵列从遍历左到右,右到左,此属性包含:

  

0的数所见过到目前为止不应超过1数所见过这么远。<​​/ P>

例如,字符串 1010 持有物业的左到右,因为如果我们扫描左于─阵列没错,总会有更多的1比0。但属性不从持有从右到左,自右遇到的第一个字符是0,所以在一开始,我们有更多的0(有)比1的(有没有)。

例如由OP给出,我们可以看到,该字符串的答案 1011010100 是第一个八个字,即: 10110101 。为什么呢?

好了,所以当我们遍历从左至右子阵,我们看到,总有更多的1比0。让我们来检查的1和0的数字,因为我们遍历数组由左到右:

1:NUM(0)= 0,NUM(1)= 1
0:NUM(0)= 1,NUM(1)= 1
1:NUM(0)= 1,NUM(1)= 2
1:NUM(0)= 1,NUM(1)= 3
0:NUM(0)= 2,NUM(1)= 3
1:NUM(0)= 2,NUM(1)= 4
0:NUM(0)= 3,NUM(1)= 4
1:NUM(0)= 3,NUM(1)= 5

我们可以看到,在任何时间点的0的数目总是小于或等于1的数。这就是为什么财产左到右成立。和同一校验可以从右到左进行。

那么,为什么不 1011010100 并回答?

让我们看到,当我们遍历字符串从右至左:

0:NUM(0)= 1,NUM(1)= 0
0:NUM(0)= 2,NUM(1)= 0
1:NUM(0)= 2,NUM(1)= 1
...

我没有把完整的遍历,因为房地产已经从第一步侵犯,因为我们有 NUM(0)&GT; NUM(1)。这就是为什么字符串 1011010100 不符合问题的约束。

您还可以看到我的高度阵实际上是1的数和0,号码分别的区别: NUM(1) - NUM(0)。因此,为了拥有的财产,我们必须有[相关]高度积极的。也就是说,可通过具有高度不小于初始高度被可视化

I'm given a string which looks like this:

1011010100

And my task is to find the length of a substring which number of nulls is always <= number of ones. And this should always happen while 'scanning' substring from right to left and from left to right. So in this example, the answer would be:

10110101 => 8

I know that the complexity should be either O(n) or O(n log n), because length can reach up to 10^6

Any ideas?

解决方案

The O(n) solution is quite simple actually, by building the "height array", representing the number of 1's relative to number of 0's. So a height of 2 means there are 2 more 1's than 0's. The we iterate over the height array once performing some maximality checking with some conditions.

Crucial Observation

Note that a subarray fulfilling the conditions must have its height minimum at the beginning, and maximum at the end (as relative to the subarray, not the whole array).

The height array for the sample in the question can be plotted like this, with marked answer:

       v
   /\/\/\
/\/      \
^

Proof:

Suppose the height is not minimum at the beginning, that means there is a point inside the subarray where the height is lower than the beginning. At this point, the number of 0 should be larger than the number of 1. Contradiction.

Suppose the height is not maximum at the end, that means there is a point in the subarray where the height is larger than the end, say at index j. Then at index j to the end there are more 0 than 1 (since the height decreases), and so when we "scan" the subarray from right to left we will find more 0 than 1 at index j. Contradiction.

Algorithm

Now the problem can be interpreted as finding the longest subarray which ends with the highest height in the subarray, while keeping the minimum to not exceed the height at the beginning. This is very similar to maximum subarray problem like mentioned by klrmlr ("contiguous subsequence of an array" is better said as "subarray"). And the idea is not keeping an O(n) state, but rather keeping the "maximum so far" and "maximum at this point".

Following that algorithm, below is the pseudocode, traversing the array once:

Procedure Balance_Left_Right

  1. Record the lowest and highest point so far
  2. If the height at this point is lower than the lowest point so far, then change the starting point to the index after this point
  3. If the height at this point is higher or equal to the highest point so far, then this is a valid subarray, record the length (and start and end indices, if you like)

However we will soon see a problem (as pointed by Adam Jackson through personal communication) for this test case: 1100101, visualized as follows:

 /\
/  \/\/

The correct answer is 3 (the last 101), but the above algorithm will get 2 (the first 11). This is because our answer is apparently hidden behind a "high mountain" (i.e., the lowest point in the answer is not lower than the mountain, and the highest point in the answer is not higher than the mountain).

And so we need to ensure that when we run the Procedure Balance_Left_Right (above), there is no "high mountain" hiding the answer. And so the solution is to traverse the array once from the right, try to partition the array into multiple sections where in each section, this property holds: "the number of 1's is always >= the number of 0's, as traversed from the right", and also for each section, it can't be extended to the left anymore.

Then, in each section, when traversed from the left, will have the maximum height at the end of the section, and this is the maximum. And it can be proven that with this property, the method balance_left_right will find the correct answer for this section. So, we just call our balance_left_right method on each section, and then take the maximum answer among those.

Now, you may ask, why it's sufficient to run Balance_Left_Right on each section? This is because the answer requires the property to hold from the left and from the right, and so it must lies inside one of the sections, since each of the section satisfies half of the property.

The algorithm is still O(n) because we only visit each element twice, once from the right, and once from the left.

The last test case will be partitioned as follows:

 /|\ |
/ | \|/\/
**    ***

where only the sections marked with asterisk (*) are taken.

So the new algorithm is as follows:

Procedure Max_Balance_Left_Right

  1. Partition the input where the number of 1 >= number of 0 from the right (Using Balance_Left from the right, or can call it Balance_right)
  2. Run Balance_Left_Right on each partition
  3. Take the maximum

Here's the code in Python:

def balance_left_right(arr):
    lower = 0
    upper = -2**32
    lower_idx = 0 # Optional
    upper_idx = -1 # Optional
    result = (0,0,0)
    height = 0
    length = 0
    for idx, num in enumerate(arr):
        length += 1
        height += 1 if num==1 else -1
        if height<lower:
            lower = height # Reset the lowest
            upper = height # Reset the highest
            lower_idx = idx+1 # Optional, record the starting point
            length = 0 # Reset the answer
        if height>=upper:
            upper = height
            upper_idx = idx # Optional, record the end point
            if length > result[0]: # Take maximum length
                result = (length, lower_idx, upper_idx)
    return result

def max_balance_left_right(arr):
    all_partitions = []
    start = 0
    end = len(arr)
    right_partitions = balance_left(reversed(arr[start:end]))
    for right_start, right_end in right_partitions:
        all_partitions.append((end-right_end, end-right_start))
    result = (0,0,0)
    for start, end in all_partitions:
        candidate = balance_left_right(arr[start:end])
        if result[0] < candidate[0]:
            result = (candidate[0], candidate[1]+start, candidate[2]+start)
    return result

def balance_left(arr):
    lower = 0
    start_idx = 0
    end_idx = -1
    height = 0
    result = []
    for idx, num in enumerate(arr):
        height += 1 if num==1 else -1
        if height < lower:
            if end_idx != -1:
                result.append((start_idx,end_idx))
            lower = height
            start_idx = idx+1
            end_idx = -1
        else:
            end_idx = idx+1
    if end_idx != -1:
        result.append((start_idx, end_idx))
    return result

test_cases = [
        [1,0,1,1,0,1,0,1,0,0],
        [0,0,1,0,1,0,0,1,0,1,0,0,1,1,0,1,0,1,0,0,1],
        [1,1,1,0,0,0,1,0,0,1,1,0,1,1,0,1,1,0],
        [1,1,0,0,1,0,1,0,1,1,0,0,1,0,0],
        [1,1,0,0,1,0,1],
        [1,1,1,1,1,0,0,0,1,0,1,0,1,1,0,0,1,0,0,1,0,1,1]
        ]
for test_case in test_cases:
    print 'Balance left right:'
    print test_case
    print balance_left_right(test_case)
    print 'Max balance left right:'
    print test_case
    print max_balance_left_right(test_case)
    print

which will print:

Balance left right:
[1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
(8, 0, 7)
Max balance left right:
[1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
(8, 0, 7)

Balance left right:
[0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1]
(6, 12, 17)
Max balance left right:
[0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1]
(6, 12, 17)

Balance left right:
[1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
(8, 9, 16)
Max balance left right:
[1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
(8, 9, 16)

Balance left right:
[1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0]
(10, 0, 9)
Max balance left right:
[1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0]
(10, 0, 9)

Balance left right:
[1, 1, 0, 0, 1, 0, 1]
(2, 0, 1)
Max balance left right:
[1, 1, 0, 0, 1, 0, 1]
(3, 4, 6)

Balance left right:
[1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
(5, 0, 4)
Max balance left right:
[1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
(6, 8, 13)

For your eyes enjoyment, the height array for the test cases:

First:
       v
   /\/\/\
/\/      \
^

Second:
\
 \/\/\           v
      \/\/\  /\/\/\
           \/      \/
            ^

Third:
                v
  /\            /\
 /  \        /\/
/    \/\  /\/
        \/
         ^

Fourth:
         v
 /\      /\
/  \/\/\/  \/\
^             \

Fifth:
 /\   v
/  \/\/
    ^

Sixth:
    /\       v
   /  \      /\
  /    \/\/\/  \/\    /
 /      ^         \/\/
/


Clarification Regarding the Question

As some of the readers are confused on what exactly OP wants, although it's already stated clearly in the question, let me explain the question by some examples.

First, the task from the question:

And my task is to find the length of a substring which number of nulls is always <= number of ones. And this should always happen while 'scanning' substring from right to left and from left to right

This refers to something like "Catalan Number Ballot Problem" or "Available Change Problem". In the Wiki you can check the "monotonic path" problem, where you can map "move right" as "1" and "move up" as "0".

The problem is to find a subarray of the original array, such that, when the subarray is traversed from left-to-right and right-to-left, this property holds:

The number of 0's seen so far should not exceed the number of 1's seen so far.

For example, the string 1010 holds the property from left-to-right, because if we scan the array from left-to-right, there will always be more 1's than 0's. But the property doesn't hold from right-to-left, since the first character encountered from the right is 0, and so at the beginning we have more 0's (there is one) than 1's (there is none).

For example given by OP, we see that the answer for the string 1011010100 is the first eight characters, namely: 10110101. Why?

Ok, so when we traverse the subarray from left to right, we see that there is always more 1's than 0's. Let's check the number of 1's and 0's as we traverse the array from left-to-right:

1: num(0) = 0, num(1) = 1
0: num(0) = 1, num(1) = 1
1: num(0) = 1, num(1) = 2
1: num(0) = 1, num(1) = 3
0: num(0) = 2, num(1) = 3
1: num(0) = 2, num(1) = 4
0: num(0) = 3, num(1) = 4
1: num(0) = 3, num(1) = 5

We can see that at any point of time the number of 0's is always less than or equal to the number of 1's. That's why the property holds from left-to-right. And the same check can be done from right-to-left.

So why isn't 1011010100 and answer?

Let's see when we traverse the string right-to-left:

0: num(0) = 1, num(1) = 0
0: num(0) = 2, num(1) = 0
1: num(0) = 2, num(1) = 1
...

I didn't put the full traversal because the property has already been violated since the first step, since we have num(0) > num(1). That's why the string 1011010100 doesn't satisfy the constraints of the problem.

You can see also that my "height array" is actually the difference between the number of 1's and the number of 0's, namely: num(1) - num(0). So in order to have the property, we must have the [relative] height positive. That, can be visualized by having the height not less than the initial height.

这篇关于找到的子串,与一些附加条件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆