Python的 - 删除重叠名单 [英] Python - Removing overlapping lists
问题描述
说我有名单,有索引列表 [开始,结束],[启动1,END1],[START2,END2]]
。
Say I have a list of lists that has indexes [[start, end], [start1, end1], [start2, end2]]
.
例如像:
[0,133],[78,100],[25,30]
。
我该如何获得检查重叠的名单中,并删除列表的长度较长的每一次? 所以:
How would I get check for overlap between among the lists and remove the list with the longer length each time? So:
>>> list = [[0, 133], [78, 100], [25, 30]]
>>> foo(list)
[[78, 100], [25, 30]]
这就是我想这样做的远:
This is what I tried to do so far:
def cleanup_list(list):
i = 0
c = 0
x = list[:]
end = len(x)
while i < end-1:
for n in range(x[i][0], x[i][1]):
if n in range(x[i+1][0], x[i+1][1]):
list.remove(max(x[i], x[i+1]))
i +=1
return list
不过,除了是一种令人费解它不能正常工作的:
But in addition to being kind of convoluted it's not working properly:
>>>cleanup_list([[0,100],[9,10],[12,90]])
[[0, 100], [12, 90]]
任何帮助将是AP preciated!
Any help would be appreciated!
编辑:
其它例子将是:
>>>a = [[0, 100], [4, 20], [30, 35], [30, 78]]
>>>foo(a)
[[4, 20], [30, 35]]
>>>b = [[30, 70], [25, 40]]
>>>foo(b)
[[25, 40]]
基本上,我想删除所有与另一个表重叠最长的名单。在这种情况下,我不必担心名单是等长的。
I'm basically trying to remove all of the longest lists that overlap with another list. In this case I don't have to worry about the lists being of equal length.
谢谢!
推荐答案
从列表中删除间隔最小数量,使得那剩下的时间间隔不重叠, O(N *日志N )
算法存在:
To remove a minimal number of intervals from the list such that the intervals that are left do not overlap, O(n*log n)
algorithm exists:
def maximize_nonoverlapping_count(intervals):
# sort by the end-point
L = sorted(intervals, key=lambda (start, end): (end, (end - start)),
reverse=True) # O(n*logn)
iv = build_interval_tree(intervals) # O(n*log n)
result = []
while L: # until there are intervals left to consider
# pop the interval with the smallest end-point, keep it in the result
result.append(L.pop()) # O(1)
# remove intervals that overlap with the popped interval
overlapping_intervals = iv.pop(result[-1]) # O(log n + m)
remove(overlapping_intervals, from_=L)
return result
它应该产生以下结果:
It should produce the following results:
f = maximize_nonoverlapping_count
assert f([[0, 133], [78, 100], [25, 30]]) == [[25, 30], [78, 100]]
assert f([[0,100],[9,10],[12,90]]) == [[9,10], [12, 90]]
assert f([[0, 100], [4, 20], [30, 35], [30, 78]]) == [[4, 20], [30, 35]]
assert f([[30, 70], [25, 40]]) == [[25, 40]]
它需要的数据结构,可以在找到为O(log N + M)
时间与给定的时间间隔,例如, IntervalTree
。有可以在Python如<一个用于实现href="http://bitbucket.org/james_taylor/bx-python/raw/ebf9a4b352d3/lib/bx/intervals/operations/quicksect.py"><$c$c>quicksect.py$c$c>,看到快速区间交集方法中的用法示例。
It requires the data structure that can find in O(log n + m)
time all intervals that overlap with the given interval e.g., IntervalTree
. There are implementations that can be used from Python e.g., quicksect.py
, see Fast interval intersection methodologies for the example usage.
下面是一个 quicksect
基于为O(n ** 2)
执行上面的算法:
Here's a quicksect
-based O(n**2)
implementation of the above algorithm:
from quicksect import IntervalNode
class Interval(object):
def __init__(self, start, end):
self.start = start
self.end = end
self.removed = False
def maximize_nonoverlapping_count(intervals):
intervals = [Interval(start, end) for start, end in intervals]
# sort by the end-point
intervals.sort(key=lambda x: (x.end, (x.end - x.start))) # O(n*log n)
tree = build_interval_tree(intervals) # O(n*log n)
result = []
for smallest in intervals: # O(n) (without the loop body)
# pop the interval with the smallest end-point, keep it in the result
if smallest.removed:
continue # skip removed nodes
smallest.removed = True
result.append([smallest.start, smallest.end]) # O(1)
# remove (mark) intervals that overlap with the popped interval
tree.intersect(smallest.start, smallest.end, # O(log n + m)
lambda x: setattr(x.other, 'removed', True))
return result
def build_interval_tree(intervals):
root = IntervalNode(intervals[0].start, intervals[0].end,
other=intervals[0])
return reduce(lambda tree, x: tree.insert(x.start, x.end, other=x),
intervals[1:], root)
请注意:在最坏情况下的时间复杂度为为O(n ** 2)
此实现,因为时间间隔仅标记为删除例如,想象一下这样的输入的间隔
的 len个(结果)== LEN(间隔)/ 3
并有 LEN(间隔)/ 2
的间隔,涵盖了整个范围,则 tree.intersect()
将被称为 N / 3
次,每次通话将执行 x.other.removed = TRUE
至少 N / 2
倍例如, N * N / 6
总操作:
Note: the time complexity in the worst case is O(n**2)
for this implementation because the intervals are only marked as removed e.g., imagine such input intervals
that len(result) == len(intervals) / 3
and there were len(intervals) / 2
intervals that span the whole range then tree.intersect()
would be called n/3
times and each call would execute x.other.removed = True
at least n/2
times i.e., n*n/6
operations in total:
n = 6
intervals = [[0, 100], [0, 100], [0, 100], [0, 10], [10, 20], [15, 40]])
result = [[0, 10], [10, 20]]
下面是一个 榕树
基于为O(n log n)的
实施
Here's a banyan
-based O(n log n)
implementation:
from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan
def maximize_nonoverlapping_count(intervals):
# sort by the end-point O(n log n)
sorted_intervals = SortedSet(intervals,
key=lambda (start, end): (end, (end - start)))
# build "interval" tree O(n log n)
tree = SortedSet(intervals, updator=OverlappingIntervalsUpdator)
result = []
while sorted_intervals: # until there are intervals left to consider
# pop the interval with the smallest end-point, keep it in the result
result.append(sorted_intervals.pop()) # O(log n)
# remove intervals that overlap with the popped interval
overlapping_intervals = tree.overlap(result[-1]) # O(m log n)
tree -= overlapping_intervals # O(m log n)
sorted_intervals -= overlapping_intervals # O(m log n)
return result
注:此实现认为 [0,10]
和 [10,20]
的间隔是重叠的:
Note: this implementation considers [0, 10]
and [10, 20]
intervals to be overlapping:
f = maximize_nonoverlapping_count
assert f([[0, 100], [0, 10], [11, 20], [15, 40]]) == [[0, 10] ,[11, 20]]
assert f([[0, 100], [0, 10], [10, 20], [15, 40]]) == [[0, 10] ,[15, 40]]
sorted_intervals
和树
可以合并:
from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan
def maximize_nonoverlapping_count(intervals):
# build "interval" tree sorted by the end-point O(n log n)
tree = SortedSet(intervals, key=lambda (start, end): (end, (end - start)),
updator=OverlappingIntervalsUpdator)
result = []
while tree: # until there are intervals left to consider
# pop the interval with the smallest end-point, keep it in the result
result.append(tree.pop()) # O(log n)
# remove intervals that overlap with the popped interval
overlapping_intervals = tree.overlap(result[-1]) # O(m log n)
tree -= overlapping_intervals # O(m log n)
return result
这篇关于Python的 - 删除重叠名单的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!