最长递增子序列 [英] Longest increasing subsequence

查看:28
本文介绍了最长递增子序列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给定一个输入序列,找到最长(不一定是连续的)递增子序列的最佳方法是什么

Given an input sequence, what is the best way to find the longest (not necessarily continuous) increasing subsequence

[0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]  # input

[1, 9, 13, 15]  # an example of an increasing subsequence (not the longest)

[0, 2, 6, 9, 13, 15]  # longest increasing subsequence (not a unique answer)
[0, 2, 6, 9, 11, 15]  # another possible solution

我正在寻找最好的算法.如果有代码,Python 就好了,但什么都好.

I'm looking for the best algorithm. If there is code, Python would be nice, but anything is alright.

推荐答案

我偶然发现了这个问题,并想出了这个 Python 3 实现:

I just stumbled in this problem, and came up with this Python 3 implementation:

def subsequence(seq):
    if not seq:
        return seq

    M = [None] * len(seq)    # offset by 1 (j -> j-1)
    P = [None] * len(seq)

    # Since we have at least one element in our list, we can start by 
    # knowing that the there's at least an increasing subsequence of length one:
    # the first element.
    L = 1
    M[0] = 0

    # Looping over the sequence starting from the second element
    for i in range(1, len(seq)):
        # Binary search: we want the largest j <= L
        #  such that seq[M[j]] < seq[i] (default j = 0),
        #  hence we want the lower bound at the end of the search process.
        lower = 0
        upper = L

        # Since the binary search will not look at the upper bound value,
        # we'll have to check that manually
        if seq[M[upper-1]] < seq[i]:
            j = upper

        else:
            # actual binary search loop
            while upper - lower > 1:
                mid = (upper + lower) // 2
                if seq[M[mid-1]] < seq[i]:
                    lower = mid
                else:
                    upper = mid

            j = lower    # this will also set the default value to 0

        P[i] = M[j-1]

        if j == L or seq[i] < seq[M[j]]:
            M[j] = i
            L = max(L, j+1)

    # Building the result: [seq[M[L-1]], seq[P[M[L-1]]], seq[P[P[M[L-1]]]], ...]
    result = []
    pos = M[L-1]
    for _ in range(L):
        result.append(seq[pos])
        pos = P[pos]

    return result[::-1]    # reversing

因为我花了一些时间来理解算法是如何工作的,所以我的评论有点冗长,我也会添加一个简短的解释:

Since it took me some time to understand how the algorithm works I was a little verbose with comments, and I'll also add a quick explanation:

  • seq 是输入序列.
  • L 是一个数字:它在遍历序列时得到更新,并标记到该时刻找到的最长递增子序列的长度.
  • M 是一个列表.M[j-1] 将指向 seq 的索引,该索引包含可用于(在最后)构建长度为 j.
  • P 是一个列表.P[i] 将指向M[j],其中iseq 的索引.简而言之,它告诉哪个是子序列的前一个元素.P 用于构建最后的结果.
  • seq is the input sequence.
  • L is a number: it gets updated while looping over the sequence and it marks the length of longest incresing subsequence found up to that moment.
  • M is a list. M[j-1] will point to an index of seq that holds the smallest value that could be used (at the end) to build an increasing subsequence of length j.
  • P is a list. P[i] will point to M[j], where i is the index of seq. In a few words, it tells which is the previous element of the subsequence. P is used to build the result at the end.

算法的工作原理:

  1. 处理空序列的特殊情况.
  2. 从 1 个元素的子序列开始.
  3. 使用索引 i 循环输入序列.
  4. 通过二分查找找到 j 使得 seq[M[j] 成为 < 而不是 seq[i].
  5. 更新PML.
  6. 追溯结果并反向返回.
  1. Handle the special case of an empty sequence.
  2. Start with a subsequence of 1 element.
  3. Loop over the input sequence with index i.
  4. With a binary search find the j that let seq[M[j] be < than seq[i].
  5. Update P, M and L.
  6. Traceback the result and return it reversed.

注意:维基百科算法的唯一区别是 M 列表中 1 的偏移量,而 X 在这里称为 seq.我还使用 Eric Gustavson 答案 中显示的单元测试版本进行了稍微改进的测试,并通过了所有测试.

Note: The only differences with the wikipedia algorithm are the offset of 1 in the M list, and that X is here called seq. I also test it with a slightly improved unit test version of the one showed in Eric Gustavson answer and it passed all tests.

示例:

seq = [30, 10, 20, 50, 40, 80, 60]

       0    1   2   3   4   5   6   <-- indexes

最后我们将有:

M = [1, 2, 4, 6, None, None, None]
P = [None, None, 1, 2, 2, 4, 4]
result = [10, 20, 40, 60]

正如您将看到的,P 非常简单.我们必须从最后看,所以它告诉在60之前有40,80之前有40,在40之前有20,在50之前有20,在20之前有10,停止.

As you'll see P is pretty straightforward. We have to look at it from the end, so it tells that before 60 there's 40,before 80 there's 40, before 40 there's 20, before 50 there's 20 and before 20 there's 10, stop.

复杂的部分在M.开头的 M[0, None, None, ...] 自长度为 1 的子序列的最后一个元素(因此 M<中的位置 0/code>) 位于索引 0:30.

The complicated part is on M. At the beginning M was [0, None, None, ...] since the last element of the subsequence of length 1 (hence position 0 in M) was at the index 0: 30.

此时我们将开始循环 seq 并查看 10,因为 10<> 比 30M 会更新:

At this point we'll start looping on seq and look at 10, since 10 is < than 30, M will be updated:

if j == L or seq[i] < seq[M[j]]:
    M[j] = i

所以现在 M 看起来像:[1, None, None, ...].这是一件好事,因为 10 有更多的机会来创建更长的递增子序列.(新的1是10的索引)

So now M looks like: [1, None, None, ...]. This is a good thing, because 10 have more chanches to create a longer increasing subsequence. (The new 1 is the index of 10)

现在轮到20.使用 1020 我们有长度为 2 的子序列(M 中的索引 1),所以 M 将是:[1, 2, None, ...].(新的2是20的索引)

Now it's the turn of 20. With 10 and 20 we have subsequence of length 2 (index 1 in M), so M will be: [1, 2, None, ...]. (The new 2 is the index of 20)

现在轮到50.50 不会成为任何子序列的一部分,因此没有任何变化.

Now it's the turn of 50. 50 will not be part of any subsequence so nothing changes.

现在轮到40.对于 102040 我们有一个长度为 3 的子(M 中的索引 2,所以 M 将是: [1, 2, 4, None, ...] . (新的 4 是 40 的索引)

Now it's the turn of 40. With 10, 20 and 40 we have a sub of length 3 (index 2 in M, so M will be: [1, 2, 4, None, ...] . (The new 4 is the index of 40)

等等……

要完整浏览代码,您可以复制并粘贴此处:)

For a complete walk through the code you can copy and paste it here :)

这篇关于最长递增子序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆