最终重叠指标的迭代解决方案 [英] Iterative Solution to End-Overlapping Indices

查看:116
本文介绍了最终重叠指标的迭代解决方案的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个列表,其中包含表示数字范围的元组.我的目标是返回该集合中所有可能的子集(请参阅下面的注释;实际上是寻找最长的子集),这些子集仅与每个元组中的第二个值重叠或完全不重叠.我一直在使用的函数是对此问题的递归解决方案.

I have a list that holds tuples that represent ranges of numbers. My goal is to return all (see the note below; really looking for the longest) possible subsets of this collection that overlap only by the second value in each tuple or not at all. The function I have been using is a recursive solution to this problem.

def get_all_end_overlapping_indices(lst, i, out):
    all_possibilities = []

    def _get_all_end_overlapping_indices_helper(list_in, i, out):
        r = -1
        if i == len(list_in):
            if out:
                if len(all_possibilities) == 0:
                    all_possibilities.append(out)
                else:                       
                    all_possibilities.append(out)

            return 

        n = i + 1

        while n < len(list_in) and r > list_in[n][0]:
            n += 1
        _get_all_end_overlapping_indices_helper(list_in, n, out)
        r = list_in[i][1]

        n = i + 1
        while n < len(list_in) and r > list_in[n][0]:
            n += 1
        _get_all_end_overlapping_indices_helper(list_in, n, out + [list_in[i]])

    _get_all_end_overlapping_indices_helper.count = 0
    lst.sort()
    _get_all_end_overlapping_indices_helper(list_in = lst, i = 0, out = [])
    
    return all_possibilities

我们用lst = [(0.0, 2.0), (0.0, 4.0), (2.5, 4.5), (2.0, 5.75), (2.0, 4.0), (6.0, 7.25), (4.0, 5.5)]

[(6.0, 7.25)]
[(4.0, 5.5)]
[(4.0, 5.5), (6.0, 7.25)]
[(2.5, 4.5)]
[(2.5, 4.5), (6.0, 7.25)]
[(2.0, 5.75)]
[(2.0, 5.75), (6.0, 7.25)]
[(2.0, 4.0)]
[(2.0, 4.0), (6.0, 7.25)]
[(2.0, 4.0), (4.0, 5.5)]
[(2.0, 4.0), (4.0, 5.5), (6.0, 7.25)]
[(0.0, 4.0)]
[(0.0, 4.0), (6.0, 7.25)]
[(0.0, 4.0), (4.0, 5.5)]
[(0.0, 4.0), (4.0, 5.5), (6.0, 7.25)]
[(0.0, 2.0)]
[(0.0, 2.0), (6.0, 7.25)]
[(0.0, 2.0), (4.0, 5.5)]
[(0.0, 2.0), (4.0, 5.5), (6.0, 7.25)]
[(0.0, 2.0), (2.5, 4.5)]
[(0.0, 2.0), (2.5, 4.5), (6.0, 7.25)]
[(0.0, 2.0), (2.0, 5.75)]
[(0.0, 2.0), (2.0, 5.75), (6.0, 7.25)]
[(0.0, 2.0), (2.0, 4.0)]
[(0.0, 2.0), (2.0, 4.0), (6.0, 7.25)]
[(0.0, 2.0), (2.0, 4.0), (4.0, 5.5)]
[(0.0, 2.0), (2.0, 4.0), (4.0, 5.5), (6.0, 7.25)]

由于我最终将要处理较大的元组集合(并且运行速度很慢),因此我想实现一个迭代解决方案.不幸的是,我很困惑.该摘录最初来自:查找所有可能重叠的组合结束并开始.尽管它有效,但我发现破译如何工作还是很棘手的.谁能为您提供一些技巧,说明如何构建此问题的迭代解决方案?

As I will eventually be dealing with larger collections of tuples (and this runs quite slowly), I'd like to implement an iterative solution; unfortunately, I'm stumped. This snippet originally came from: Find all possible combinations that overlap by end and start. Although it works, I find it tricky to decipher how it's working. Could anyone provide some tips on how you might construct an iterative solution to this problem?

注意:我实际上是在寻找最长的输出(请参见下文).我以后总是可以过滤掉较短的那些(即位于最长的那些里面的),但是如果它变得更容易,我很乐意取消它们.

Note: I'm actually looking to get only the longest outputs (see below). I can always filter out the shorter ones (i.e. the ones that sit inside the longest ones) later, but if it makes it easier, I can gladly do away with them.

[(0.0, 2.0), (4.0, 5.5), (6.0, 7.25)]
[(0.0, 2.0), (2.5, 4.5), (6.0, 7.25)]
[(0.0, 2.0), (2.0, 5.75), (6.0, 7.25)]
[(0.0, 2.0), (2.0, 4.0), (4.0, 5.5), (6.0, 7.25)]
[(0.0, 4.0), (4.0, 5.5), (6.0, 7.25)]

推荐答案

我们可以通过将其减少到DAG(有向无环图)中最长路径的问题来在多项式时间内解决此问题.

We can solve this problem in polynomial time by reducing it to the problem of the longest path in a DAG (directed acyclic graph).

首先,我们需要将问题建模为DAG.每个元组都代表一个顶点,并且当且仅当b <= c时,我们才构建从(a,b)(c, d)的边.

First, we need to model the problem as a DAG. Each tuple represents a vertex, and we build an edge from (a,b) to (c, d) if and only if b <= c.

然后我们可以看到,(1)通过构造获得的图是非循环的,并且(2)此图中从顶点到另一个顶点的最长路径将表示重叠元组的最长序列.

What we can then see is that (1) the graph obtained is acyclic, by construction and (2) the longest path from a vertex to another in this graph will represent the longest sequence of overlapping tuples.

幸运的是,最长路径问题在DAG中并不困难,通常在一般情况下是NP困难的.在本文档(第4页).
然后,查找元组的最长重叠序列的总体复杂度应为:O(n²)用于构建图,O(n²)用于对顶点进行排序,以及O(n²)用于查找最长的路径,因此O(n²)在最坏的情况下.这比您要使用的递归方法要快得多,因为我们不想枚举所有组合,但是我们只想要最长的组合.

Luckily, the longest path problem, which is NP-hard in the general case, is not hard in a DAG. The problem is described in length in this document (page 4).
The overall complexity to find the longest overlapping sequence of tuples should then be: O(n²) to build the graph, O(n²) to sort vertices, and O(n²) to find the longest path, so O(n²) in the worst case. This is way faster than the recursive approach you were going for, since we don't want to enumerate all combinations, but we want only the longest.

下面是一个python 3代码,它将计算最长的元组序列.在我误解了元组的重叠"关系的情况下,可以很容易地在overlap_condition函数中对其进行修改.

Below is a python 3 code that will compute the longest sequence of tuples. In the case I misunderstood the 'overlap' relation on tuples, it is easily modifiable in the overlap_condition function.

def overlap_condition(tup1, tup2):
    if tup1 == tup2:
        return False
    a, b = tup1
    c, d = tup2
    return b <= c


def adj_mat_from_tup_list(tup_lst):
    return [
        [
            1 if overlap_condition(tup_lst[i], tup_lst[j]) else 0
            for j in range(len(tup_lst))
        ] for i in range(len(tup_lst))

    ]


def topological_sort(adj_mat):
    sorted_v = []
    sinks = {
        i for i in range(len(adj_mat))
        if not any(adj_mat[j][i] == 1 for j in range(len(adj_mat)))
    }

    while sinks:
        v = sinks.pop()
        sorted_v += [v]
        for j in range(len(adj_mat)):
            if adj_mat[v][j] == 1:
                adj_mat[v][j] = 0
                if not any(adj_mat[w][j] for w in range(len(adj_mat))):
                    sinks.add(j)
    return sorted_v


def get_longest_path(adj_mat, sorted_v):
    dists = {v: 0 for v in range(len(adj_mat))}
    preds = {v: None for v in range(len(adj_mat))}
    for v in sorted_v:
        for u in range(len(adj_mat)):
            if adj_mat[u][v]:
                dists[v] = max(dists[v], dists[u] + 1)
                preds[v] = u

    current_v = {
        v for v in range(len(adj_mat))
        if dists[v] == max(dists.values())
    }.pop()
    result = [current_v]
    while preds[current_v] is not None:
        current_v = preds[current_v]
        result += [current_v]
    return result[::-1]


def get_all_end_overlap_tups(tup_lst):
    sorted_v = topological_sort(adj_mat_from_tup_list(tup_lst))
    adj_mat = adj_mat_from_tup_list(tup_lst)
    return [tup_lst[i] for i in get_longest_path(adj_mat, sorted_v)]


lst = [
    (0.0, 2.0), (0.0, 4.0), (2.5, 4.5), (2.0, 5.75),
    (2.0, 4.0), (6.0, 7.25), (4.0, 5.5)
]

print(get_all_end_overlap_tups(lst))

这篇关于最终重叠指标的迭代解决方案的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆