查找范围的最大相交子集 [英] Find the maximally intersecting subset of ranges

查看:302
本文介绍了查找范围的最大相交子集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果您有一组范围,例如下面的简单示例...

[
    [12, 25], #1
    [14, 27], #2
    [15, 22], #3
    [17, 21], #4
    [20, 65], #5
    [62, 70], #6
    [64, 80]  #7
]

...您如何计算最大相交子集(不确定如何表达它,但我的意思是相交并具有最高基数的范围子集")并确定相交度(该子集中范围的基数)?

从逻辑上讲,我可以解决这个问题,也许可以将其转换为幼稚的算法.在列表的下方,我们看到1-5相交,而5-7相交,#5相交.

我想要的结果只是子集,因为这给了我关于基数的信息,而且只要它们都相交,我就可以轻松地计算出集合的交集.在上面的示例中,它是[[14, 27],[15, 22],[12, 25],[17, 21],[20, 65]].

我可能会尝试将每个范围转换为一个图节点,连接相交的图节点,并找到最大的全连接图.

我还反复考虑要从头开始,继续建立相交范围的列表,并在每个相交范围上运行一个相交的区域以进​​行检查-直到您找到一个不相交的元素,然后开始一个新列表.继续对照现有的交叉路口检查每个项目.但是我不确定这是否完成.

我可以尝试实施某些东西(语言是ruby FWIW),但是我很想听听其他人如何解决此问题,以及最有效,最优雅的方法是什么.

更新:

我认为这是最大派系问题的一个具体案例,这是NP难题,因此实际上很困难.对于近似值/实际使用的建议将不胜感激!

另请参见: http://en.wikipedia.org/wiki/Maximum_clique /查找图中的所有完整子图

更新2

在此处找到了此问题的NP硬度和NP完整性的很好证明: http://www.cs.bris.ac.uk/~popa/ipl.pdf

然后看起来像是这行的结尾.抱歉!我将使用足够好的贪婪近似进行工作.谢谢.

正如答案中所说,我认为这篇论文并没有描述这个问题……我们可能会根据范围在此处提供更多信息.

解决方案

如果我正确理解了该问题,那么它并不是您所链接的论文中描述的NP问题的一个实例.这是我对问题的理解以及多项式时间解.

  1. 我们得到了一组有限的实数范围,例如n:[A1,B1],[A2,B2],...,[An,Bn],其中Ai <= Bi.

  2. 创建起点和终点的排序列表,并按数字排序,指示该点是起点还是终点.

在您的示例中,这将是:12 +,14 +,15 +,17 +,20 +,21-,22-,25-,27-,62 +,64 +,65-,70-, 80-

  1. 将curOverlap和maxOverlap初始化为零.

  2. 遍历列表,为每个+递增curOverlap,为每个-递减.在每个增量上设置maxOverlap = max(curOverlap,maxOverlap).

继续您的示例:
val,cur,max
12,1,1
14,2,2
15,3,3
17,4,4
20、5、5
21、4、5
22、3、5
25、2、5
27,1,5
62、2、5
64、3、5
65,2,5
70,1,5
80,0,5

最大重叠值为5.如果您想知道最大重叠发生在何处,也可以存储与最大关联的val.在此示例中,这将为您提供20.然后遍历初始范围并找到包括20的5个范围是很简单的.

-edit-如果您有重复的值,请在每个值的负号之前计算其正负号,以便包括在单个点处重叠的范围.

If you have a set of ranges, such as the following simple example...

[
    [12, 25], #1
    [14, 27], #2
    [15, 22], #3
    [17, 21], #4
    [20, 65], #5
    [62, 70], #6
    [64, 80]  #7
]

... how do you compute the maximally intersecting subset (not sure quite how to phrase it, but I mean "the subset of ranges which intersects and has the highest cardinality") and determine the degree of intersection (cardinality of ranges in that subset)?

Logically I can work it out, and might be able to translate that to a naive algorithm. Going down the list, we see that 1-5 intersect, and 5-7 intersect, and that #5 intersects both sets.

The result I want is simply the subset, since that gives me the information about the cardinality, and I can easily compute the intersection of the set as long as they all intersect. In the above example, it would be [[14, 27],[15, 22],[12, 25],[17, 21],[20, 65]].

Off the top of my head, I might try converting each range to a graph node, connecting the ones which are intersecting, and finding the largest fully-connected graph.

I was also thinking iteratively to start at the beginning, continue building up a list of intersecting ranges with a running intersection on each to check against—until you hit an element which doesn't intersect, then start a new list. Continue checking each item against the existing intersections. However I'm not sure this is complete.

I could take a stab at implementing something (lang is ruby FWIW), but I would love to hear how others might solve this problem, and what the most efficient and elegant way might be.

Update:

I believe this is a specific case of the Maximum clique problem, which is NP-hard and thus actually difficult. Suggestions for approximations/real-world use would be most appreciated!

See also: http://en.wikipedia.org/wiki/Maximum_clique / Find all complete sub-graphs within a graph

Update 2

Found a nice proof of this problem's NP-hardness and NP-completeness here: http://www.cs.bris.ac.uk/~popa/ipl.pdf

Looks like this is the end of the line then. Sorry folks! I'll work with a good-enough greedy approximation. Thanks.

As said in the answers I don't think that paper describes this problem... we probably have more information here based on the ranges.

解决方案

If I understand the problem correctly, it is not an instance of the NP problem described in the paper you linked to. Here is my understanding of the problem, and a polynomial-time solution.

  1. We are given a finite set of ranges of real numbers, say n: [A1, B1], [A2, B2], ..., [An, Bn], where Ai <= Bi.

  2. Create a sorted list of the starting and ending points, ordered numerically, indicating whether the point is a starting or ending point.

In your example, this would be: 12+, 14+, 15+, 17+, 20+, 21-, 22-, 25-, 27-, 62+, 64+, 65-, 70-, 80-

  1. Initialize curOverlap and maxOverlap to zero.

  2. Iterate through the list, incrementing curOverlap for each + and decrementing it for each -. Set maxOverlap = max(curOverlap, maxOverlap) on each increment.

To continue your example:
val, cur, max
12, 1, 1
14, 2, 2
15, 3, 3
17, 4, 4
20, 5, 5
21, 4, 5
22, 3, 5
25, 2, 5
27, 1, 5
62, 2, 5
64, 3, 5
65, 2, 5
70, 1, 5
80, 0, 5

The max overlap is 5. You could also store the val associated with the max if you wanted to know where the max overlap occurred. That would give you 20. in this example. It's then trivial to go through the initial set of ranges and find the 5 which include 20.

-edit- If you have repeated values, count the plusses before the minuses for each value so that you include ranges that overlap at a single point.

这篇关于查找范围的最大相交子集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆