最长递增子序列 [英] Longest increasing subsequence
问题描述
给定一个输入序列,找到最长(不一定是连续的)递增子序列的最佳方法是什么
Given an input sequence, what is the best way to find the longest (not necessarily continuous) increasing subsequence
[0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15] # input
[1, 9, 13, 15] # an example of an increasing subsequence (not the longest)
[0, 2, 6, 9, 13, 15] # longest increasing subsequence (not a unique answer)
[0, 2, 6, 9, 11, 15] # another possible solution
我正在寻找最好的算法.如果有代码,Python 就好了,但什么都好.
I'm looking for the best algorithm. If there is code, Python would be nice, but anything is alright.
推荐答案
我偶然发现了这个问题,并想出了这个 Python 3 实现:
I just stumbled in this problem, and came up with this Python 3 implementation:
def subsequence(seq):
if not seq:
return seq
M = [None] * len(seq) # offset by 1 (j -> j-1)
P = [None] * len(seq)
# Since we have at least one element in our list, we can start by
# knowing that the there's at least an increasing subsequence of length one:
# the first element.
L = 1
M[0] = 0
# Looping over the sequence starting from the second element
for i in range(1, len(seq)):
# Binary search: we want the largest j <= L
# such that seq[M[j]] < seq[i] (default j = 0),
# hence we want the lower bound at the end of the search process.
lower = 0
upper = L
# Since the binary search will not look at the upper bound value,
# we'll have to check that manually
if seq[M[upper-1]] < seq[i]:
j = upper
else:
# actual binary search loop
while upper - lower > 1:
mid = (upper + lower) // 2
if seq[M[mid-1]] < seq[i]:
lower = mid
else:
upper = mid
j = lower # this will also set the default value to 0
P[i] = M[j-1]
if j == L or seq[i] < seq[M[j]]:
M[j] = i
L = max(L, j+1)
# Building the result: [seq[M[L-1]], seq[P[M[L-1]]], seq[P[P[M[L-1]]]], ...]
result = []
pos = M[L-1]
for _ in range(L):
result.append(seq[pos])
pos = P[pos]
return result[::-1] # reversing
因为我花了一些时间来理解算法是如何工作的,所以我的评论有点冗长,我也会添加一个简短的解释:
Since it took me some time to understand how the algorithm works I was a little verbose with comments, and I'll also add a quick explanation:
seq
是输入序列.L
是一个数字:它在遍历序列时得到更新,并标记到该时刻找到的最长递增子序列的长度.M
是一个列表.M[j-1]
将指向seq
的索引,该索引包含可用于(在最后)构建长度为j
.P
是一个列表.P[i]
将指向M[j]
,其中i
是seq
的索引.简而言之,它告诉哪个是子序列的前一个元素.P
用于构建最后的结果.
seq
is the input sequence.L
is a number: it gets updated while looping over the sequence and it marks the length of longest incresing subsequence found up to that moment.M
is a list.M[j-1]
will point to an index ofseq
that holds the smallest value that could be used (at the end) to build an increasing subsequence of lengthj
.P
is a list.P[i]
will point toM[j]
, wherei
is the index ofseq
. In a few words, it tells which is the previous element of the subsequence.P
is used to build the result at the end.
算法的工作原理:
- 处理空序列的特殊情况.
- 从 1 个元素的子序列开始.
- 使用索引
i
循环输入序列. - 通过二分查找找到
j
使得seq[M[j]
成为<
而不是seq[i]
. - 更新
P
、M
和L
. - 追溯结果并反向返回.
- Handle the special case of an empty sequence.
- Start with a subsequence of 1 element.
- Loop over the input sequence with index
i
. - With a binary search find the
j
that letseq[M[j]
be<
thanseq[i]
. - Update
P
,M
andL
. - Traceback the result and return it reversed.
注意:与维基百科算法的唯一区别是 M
列表中 1 的偏移量,而 X
在这里称为 seq
.我还使用 Eric Gustavson 答案 中显示的单元测试版本进行了稍微改进的测试,并通过了所有测试.
Note: The only differences with the wikipedia algorithm are the offset of 1 in the M
list, and that X
is here called seq
. I also test it with a slightly improved unit test version of the one showed in Eric Gustavson answer and it passed all tests.
示例:
seq = [30, 10, 20, 50, 40, 80, 60]
0 1 2 3 4 5 6 <-- indexes
最后我们将有:
M = [1, 2, 4, 6, None, None, None]
P = [None, None, 1, 2, 2, 4, 4]
result = [10, 20, 40, 60]
正如您将看到的,P
非常简单.我们必须从最后看,所以它告诉在60
之前有40,
在80
之前有40
,在40
之前有20
,在50
之前有20
,在20
之前有10
,停止.
As you'll see P
is pretty straightforward. We have to look at it from the end, so it tells that before 60
there's 40,
before 80
there's 40
, before 40
there's 20
, before 50
there's 20
and before 20
there's 10
, stop.
复杂的部分在M
.开头的 M
是 [0, None, None, ...]
自长度为 1 的子序列的最后一个元素(因此 M<中的位置 0/code>) 位于索引 0:
30
.
The complicated part is on M
. At the beginning M
was [0, None, None, ...]
since the last element of the subsequence of length 1 (hence position 0 in M
) was at the index 0: 30
.
此时我们将开始循环 seq
并查看 10
,因为 10
是 <
> 比 30
,M
会更新:
At this point we'll start looping on seq
and look at 10
, since 10
is <
than 30
, M
will be updated:
if j == L or seq[i] < seq[M[j]]:
M[j] = i
所以现在 M
看起来像:[1, None, None, ...]
.这是一件好事,因为 10
有更多的机会来创建更长的递增子序列.(新的1是10的索引)
So now M
looks like: [1, None, None, ...]
. This is a good thing, because 10
have more chanches to create a longer increasing subsequence. (The new 1 is the index of 10)
现在轮到20
.使用 10
和 20
我们有长度为 2 的子序列(M
中的索引 1),所以 M
将是:[1, 2, None, ...]
.(新的2是20的索引)
Now it's the turn of 20
. With 10
and 20
we have subsequence of length 2 (index 1 in M
), so M
will be: [1, 2, None, ...]
. (The new 2 is the index of 20)
现在轮到50
.50
不会成为任何子序列的一部分,因此没有任何变化.
Now it's the turn of 50
. 50
will not be part of any subsequence so nothing changes.
现在轮到40
.对于 10
、20
和 40
我们有一个长度为 3 的子(M
中的索引 2,所以 M
将是: [1, 2, 4, None, ...]
. (新的 4 是 40 的索引)
Now it's the turn of 40
. With 10
, 20
and 40
we have a sub of length 3 (index 2 in M
, so M
will be: [1, 2, 4, None, ...]
. (The new 4 is the index of 40)
等等……
要完整浏览代码,您可以复制并粘贴此处:)
For a complete walk through the code you can copy and paste it here :)
这篇关于最长递增子序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!