Longest increasing unique subsequence

Problem description

I have a list/array that looks something like this:

[ 0  1  2  3  4  5  6  7  3  9 10 11 13 13 14 15 16 17 18 19  4 16 22  5  3   
  2 10 17 34  5 11 18 27 14 11 15 29  2 11 10 19 32  8 27  1 32  6  2  0]

This list is supposed to be monotonic (strictly increasing). It is not, but you can see that it is mostly increasing. The values that do not fit this pattern can be considered noise, and I want them removed. So I want to extract the largest possible subset of this list that is a strictly increasing sequence of numbers. There are many possible monotonic sequences here, but the point is to find the largest possible one.

It is important that I get the indices of the values to be removed, as I need to know the exact position of the remaining numbers (so instead of removing numbers we can replace them with f.ex. None, nan, or -1).

I cannot change the order of any number, just remove the ones that do not fit in.

The remaining list has to be strictly increasing, so if we have f.ex. [11 13 13 14], both of the 13s have to be removed.

If there are several possible solutions that are equally large, we cannot use any of them and must choose a solution with 1 number less. F.ex. in [27 29 30 34 32] we have to throw away both 34 and 32, because we cannot choose one over the other. If we have [27 29 34 15 32] there is no possible solution, because we cannot choose between [27 29], [27 34], [29 34], or [15 32].

The best possible solution to the list presented above would be this:

[ 0  1  2  3  4  5  6  7 -1  9 10 11 -1 -1 14 15 16 17 18 19 -1 -1 22 -1 -1   
 -1 -1 -1 -1 -1 -1 -1 27 -1 -1 -1 29 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]

Can anyone think of an algorithm that would do this specific job? If you can get me part of the way there, that would also be appreciated.

My only idea so far is a loop for n in range(N, 0, -1): where N is the size of the list. The loop would first try to find solutions of size n=N, and then for n=N-1, n=N-2, etc. When it finds exactly 1 solution for a specific n it stops and returns that solution. I'm not sure what should be inside the loop yet.

UPDATE:

Another SO question provides a Python algorithm for finding the longest subsequence of a list. This is almost what I want to do, but not quite.

I have copied that function (see below) and added a little extra code at the end which changes the output if fullsize=True. Then the original sequence with its original shape is rebuilt, but the numbers which are not part of the increasing sequence are replaced by nans. And then I check if any number occurs more than once, and if so, replace all occurrences of that number with nans.

The original algorithm must still be changed since it does not provide unique solutions.

For example:

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 32, 
    18, 19, 20, 16, 35, 35, 33, 32, 1, 35, 13, 5, 32, 8, 35, 29, 19, 
    35, 19, 28, 32, 18, 31, 13, 3, 32, 33, 35, 31, 0, 21]
print(subsequence(a))

gives

[  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.  14.
  15.  16.  32.  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan]

Instead of ending with .. 16 32 nan .. it should have ended with ... 16 nan ... nan 31 nan nan 32 33 35 nan nan nan], as far as I can see.

Simpler example:

a = [0,1,2,3,4,1,2,3,4,5]
print(subsequence(a))

gives

[  0.   1.   2.   3.  nan  nan  nan  nan  nan   5.]

but it should only have given [0 nan ... nan 5] because 1 2 3 4 appears two times and is not unique.
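One way to detect the ambiguity described here is to count how many longest increasing subsequences exist. The following is a minimal sketch, not part of the original post: the function name count_longest_increasing is hypothetical, and it assumes that "unique" means only one set of indices reaches the maximum length.

```python
def count_longest_increasing(seq):
    """Return (longest, count): the length of the longest strictly
    increasing subsequence and how many index sets achieve it.
    Hypothetical helper; O(N^2) time."""
    n = len(seq)
    if n == 0:
        return 0, 0
    length = [1] * n  # length[i]: longest increasing run ending at index i
    count = [1] * n   # count[i]: number of such runs ending at index i
    for i in range(n):
        for j in range(i):
            if seq[j] < seq[i]:
                if length[j] + 1 > length[i]:
                    # Found a strictly longer run ending at i.
                    length[i] = length[j] + 1
                    count[i] = count[j]
                elif length[j] + 1 == length[i]:
                    # Found more runs of the same best length.
                    count[i] += count[j]
    longest = max(length)
    total = sum(c for l, c in zip(length, count) if l == longest)
    return longest, total


print(count_longest_increasing([0, 1, 2, 3, 4, 1, 2, 3, 4, 5]))  # (6, 5)
print(count_longest_increasing([27, 29, 30, 34, 32]))            # (4, 2)
```

A count greater than 1 means the longest subsequence is ambiguous in the sense above. This only detects the ambiguity; it does not decide which values to drop.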

Here comes the current semi-working version of the code (which was used for my example runs):

import numpy as np

def subsequence(seq, fullsize=True):
    """
    Credit:
    http://stackoverflow.com/questions/3992697/longest-increasing-subsequence
    """

    M = [None] * len(seq)    # offset by 1 (j -> j-1)
    P = [None] * len(seq)

    # Since we have at least one element in our list, we can start by
    # knowing that there's at least one increasing subsequence of length one:
    # the first element.
    L = 1
    M[0] = 0

    # Looping over the sequence starting from the second element
    for i in range(1, len(seq)):
        # Binary search: we want the largest j <= L
        #  such that seq[M[j]] < seq[i] (default j = 0),
        #  hence we want the lower bound at the end of the search process.
        lower = 0
        upper = L

        # Since the binary search will not look at the upper bound value,
        # we'll have to check that manually
        if seq[M[upper-1]] < seq[i]:
            j = upper

        else:
            # actual binary search loop
            while upper - lower > 1:
                mid = (upper + lower) // 2
                if seq[M[mid-1]] < seq[i]:
                    lower = mid
                else:
                    upper = mid

            j = lower    # this will also set the default value to 0

        P[i] = M[j-1]

        if j == L or seq[i] < seq[M[j]]:
            M[j] = i
            L = max(L, j+1)

    # Building the result: [seq[M[L-1]], seq[P[M[L-1]]], seq[P[P[M[L-1]]]], ...]
    result = []
    pos = M[L-1]
    for _ in range(L):
        result.append(seq[pos])
        pos = P[pos]

    result = np.array(result[::-1])    # reversing

    if not fullsize:
        return result  # Original return from other SO question.

    # This was written by me, PaulMag:
    # Rebuild original sequence
    subseq = np.zeros(len(seq)) * np.nan
    for a in result:
        for i, b in enumerate(seq):
            if a == b:
                subseq[i] = a
            elif b > a:
                break
        if np.sum(subseq[np.where(subseq == a)].size) > 1:  # Remove duplicates.
            subseq[np.where(subseq == a)] = np.nan

    return subseq  # Alternative return made by me, PaulMag.

Solution

It's a classical dynamic programming problem.

You store for every element the length of the longest sequence that ends at that element. For the first element the value is 1 (just take that element). For the rest you take max(1, 1 + the largest value assigned to any previous element that is smaller than your current element). The comparison is strict (<) because the result must be strictly increasing; it would be <= only if ties were allowed.

You can implement this with two nested loops (O(N^2)). There are probably some optimizations you can do if your data is really large. Or, knowing that your sequence is mostly good, you can check only the previous X elements.
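The two-loop table described above could be sketched as follows. This is an illustration under assumptions, not the answerer's own code: the name lis_lengths and the optional window parameter (standing in for "the previous X elements") are hypothetical, and the comparison is strict because the question asks for a strictly increasing result.

```python
def lis_lengths(seq, window=None):
    """Return best, where best[i] is the length of the longest strictly
    increasing subsequence of seq that ends at index i.

    With window=None every earlier element is checked (O(N^2)).
    With window=X only the previous X elements are checked, which is
    faster but only a heuristic for mostly-sorted data.
    """
    best = [1] * len(seq)  # every element alone is a run of length 1
    for i in range(len(seq)):
        start = 0 if window is None else max(0, i - window)
        for j in range(start, i):
            if seq[j] < seq[i] and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
    return best


print(lis_lengths([0, 1, 2, 3, 4, 1, 2, 3, 4, 5]))
# [1, 2, 3, 4, 5, 2, 3, 4, 5, 6]
```

The largest entry of the returned list is the length of the longest strictly increasing subsequence. With a window the result can underestimate the true lengths, so that variant trades correctness for speed.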

To fix your data, start at an element holding the maximum assigned value (that value is the length of the longest monotonic sequence) and replace everything after it with -1. Then go backward through the list looking for the previous element of the sequence: its value should be smaller than the current one, and its assigned length should be exactly one less than the length assigned to the current element. Every element you pass without a match does not belong to the sequence, so it gets -1 as well. When you find a match, take it as the new current element and continue backward until you reach an element you assigned 1 to (that is the first element of the sequence).
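The backward pass could look roughly like the sketch below. The function name keep_longest_increasing and the fill parameter are hypothetical, and the code assumes a strict comparison. Note that it returns one longest increasing subsequence and does not implement the extra uniqueness rule from the question.

```python
def keep_longest_increasing(seq, fill=-1):
    """Return a copy of seq in which every element outside one longest
    strictly increasing subsequence is replaced by fill.
    Positions are preserved. Hypothetical helper; O(N^2) time."""
    n = len(seq)
    if n == 0:
        return []
    # best[i]: length of the longest increasing run ending at index i.
    best = [1] * n
    for i in range(n):
        for j in range(i):
            if seq[j] < seq[i] and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
    # Start at the first index that holds the maximum length.
    cur = max(range(n), key=lambda i: best[i])
    out = [fill] * n
    out[cur] = seq[cur]
    # Walk backward; a match is smaller and has a length one less.
    for j in range(cur - 1, -1, -1):
        if seq[j] < seq[cur] and best[j] == best[cur] - 1:
            out[j] = seq[j]
            cur = j
    return out


print(keep_longest_increasing([0, 1, 2, 3, 4, 1, 2, 3, 4, 5]))
# [0, -1, -1, -1, -1, 1, 2, 3, 4, 5]
```

When several longest subsequences exist, this walk keeps the one whose elements lie furthest to the right, which is an arbitrary tie-break rather than a resolution of the ambiguity.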
