找到范围的最大相交子集 [英] Find the maximally intersecting subset of ranges

查看:29
本文介绍了找到范围的最大相交子集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果你有一组范围,比如下面这个简单的例子……

If you have a set of ranges, such as the following simple example...

[
    [12, 25], #1
    [14, 27], #2
    [15, 22], #3
    [17, 21], #4
    [20, 65], #5
    [62, 70], #6
    [64, 80]  #7
]

...您如何计算 最大相交子集(不太清楚如何表达它,但我的意思是相交并具有最高基数的范围的子集")并确定交叉度(该子集中范围的基数)?

... how do you compute the maximally intersecting subset (not sure quite how to phrase it, but I mean "the subset of ranges which intersects and has the highest cardinality") and determine the degree of intersection (cardinality of ranges in that subset)?

从逻辑上讲,我可以解决它,并且可以将其转换为一个简单的算法.从列表往下看,我们看到 1-5 相交和 5-7 相交,并且 #5 与这两个集合相交.

Logically I can work it out, and might be able to translate that to a naive algorithm. Going down the list, we see that 1-5 intersect, and 5-7 intersect, and that #5 intersects both sets.

我想要的结果只是子集,因为这给了我有关基数的信息,只要它们都相交,我就可以轻松计算集合的交集.在上面的例子中,它将是 [[14, 27],[15, 22],[12, 25],[17, 21],[20, 65]].

The result I want is simply the subset, since that gives me the information about the cardinality, and I can easily compute the intersection of the set as long as they all intersect. In the above example, it would be [[14, 27],[15, 22],[12, 25],[17, 21],[20, 65]].

在我的脑海中,我可能会尝试将每个范围转换为一个图节点,连接相交的节点,并找到最大的全连接图.

Off the top of my head, I might try converting each range to a graph node, connecting the ones which are intersecting, and finding the largest fully-connected graph.

我也在反复考虑从一开始就开始,继续建立一个相交范围的列表,每个范围都有一个运行的交点来检查——直到你找到一个不相交的元素,然后开始一个新的列表.继续根据现有交叉点检查每个项目.但是我不确定这是否完整.

I was also thinking iteratively to start at the beginning, continue building up a list of intersecting ranges with a running intersection on each to check against—until you hit an element which doesn't intersect, then start a new list. Continue checking each item against the existing intersections. However I'm not sure this is complete.

我可以尝试实现一些东西(lang 是 ruby​​ FWIW),但我很想听听其他人如何解决这个问题,以及最有效和最优雅的方法可能是什么.

I could take a stab at implementing something (lang is ruby FWIW), but I would love to hear how others might solve this problem, and what the most efficient and elegant way might be.

更新:

我相信这是最大集团问题的一个特殊情况,它是 NP 难的,因此实际上很困难.对于近似值/实际使用的建议将不胜感激!

I believe this is a specific case of the Maximum clique problem, which is NP-hard and thus actually difficult. Suggestions for approximations/real-world use would be most appreciated!

参见:http://en.wikipedia.org/wiki/Maximum_clique/查找图中的所有完整子图

更新 2

在这里找到了这个问题的 NP-hardness 和 NP-completeness 的一个很好的证明:http://www.cs.bris.ac.uk/~popa/ipl.pdf

看起来这是该行的结尾.对不起各位!我将使用足够好的贪婪近似值.谢谢.

Looks like this is the end of the line then. Sorry folks! I'll work with a good-enough greedy approximation. Thanks.

正如答案中所说,我认为那篇论文没有描述这个问题......我们可能会根据范围在此处获得更多信息.

As said in the answers I don't think that paper describes this problem... we probably have more information here based on the ranges.

推荐答案

如果我正确理解了这个问题,它不是您链接到的论文中描述的 NP 问题的一个实例.这是我对问题的理解,以及多项式时间的解决方案.

If I understand the problem correctly, it is not an instance of the NP problem described in the paper you linked to. Here is my understanding of the problem, and a polynomial-time solution.

  1. 给定一组有限的实数范围,例如 n:[A1, B1],[A2, B2], ..., [An, Bn],其中 Ai ≤ Bi.

  1. We are given a finite set of ranges of real numbers, say n: [A1, B1], [A2, B2], ..., [An, Bn], where Ai ≤ Bi.

创建起点和终点的排序列表,按数字排序,指示该点是起点还是终点.

Create a sorted list of the starting and ending points, ordered numerically, indicating whether the point is a starting or ending point.

在您的示例中,这将是:12+、14+、15+、17+、20+、21-、22-、25-、27-、62+、64+、65-、70-、80-

In your example, this would be: 12+, 14+, 15+, 17+, 20+, 21-, 22-, 25-, 27-, 62+, 64+, 65-, 70-, 80-

  1. curOverlapmaxOverlap 初始化为零.

遍历列表,为每个 + 递增 curOverlap 并为每个 - 递减它.在每个增量上设置 maxOverlap = max(curOverlap, maxOverlap).

Iterate through the list, incrementing curOverlap for each + and decrementing it for each -. Set maxOverlap = max(curOverlap, maxOverlap) on each increment.

继续你的例子:
val, cur, 最大值
12、1、1
14、2、2
15、3、3
17、4、4
20、5、5
21、4、5
22、3、5
25、2、5
27、1、5
62、2、5
64、3、5
65、2、5
70, 1, 5
80, 0, 5

To continue your example:
val, cur, max
12, 1, 1
14, 2, 2
15, 3, 3
17, 4, 4
20, 5, 5
21, 4, 5
22, 3, 5
25, 2, 5
27, 1, 5
62, 2, 5
64, 3, 5
65, 2, 5
70, 1, 5
80, 0, 5

最大重叠为 5.如果您想知道最大重叠发生的位置,也可以存储与最大值关联的 val.在这个例子中,这会给你 20.然后通过初始范围集并找到包括 20 的 5 个范围是微不足道的.

The max overlap is 5. You could also store the val associated with the max if you wanted to know where the max overlap occurred. That would give you 20. in this example. It's then trivial to go through the initial set of ranges and find the 5 which include 20.

-edit- 如果您有重复的值,请在每个值的减号之前计算加号,以便包含在单个点重叠的范围.

-edit- If you have repeated values, count the plusses before the minuses for each value so that you include ranges that overlap at a single point.

这篇关于找到范围的最大相交子集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆