给定一组范围S和一个重叠范围R,找到S中包含R的最小子集 [英] Given a set of ranges S, and an overlapping range R, find the smallest subset in S that encompases R

查看:98
本文介绍了给定一组范围S和一个重叠范围R,找到S中包含R的最小子集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

以下是某人向我提出的面试问题,我不确定什么是最好的解决方案:

The following is a practice interview question that was given to me by someone, and I'm not sure what the best solution to this is:

给出一组范围:
(例如 S = {(1, 4), (30, 40), (20, 91) ,(8, 10), (6, 7), (3, 9), (9, 12), (11, 14)} .并给出目标范围R(例如 R = (3, 13) -表示范围从3到13).编写算法以找到最小的集合覆盖目标范围的范围.集合中的所有范围必须重叠才能被视为跨越整个目标范围.(在此示例中,答案为 {(3, 9), (9, 12), (11, 14)} .

Given a set of ranges:
(e.g. S = {(1, 4), (30, 40), (20, 91) ,(8, 10), (6, 7), (3, 9), (9, 12), (11, 14)}. And given a target range R (e.g. R = (3, 13) - meaning the range going from 3 to 13). Write an algorithm to find the smallest set of ranges that covers your target range. All of the ranges in the set must overlap in order to be considered as spanning the entire target range. (In this example, the answer would be {(3, 9), (9, 12), (11, 14)}.

解决此问题的最佳方法是什么?我以为这可以使用贪婪算法来完成.在上面的示例中,我们将查找与3相交的所有数字,然后从具有最大max的那些中选择.然后,我们将对刚刚选择的对象执行相同的操作.因此,由于我们选择了(3,9),所以我们现在想找到与9相交的所有范围,在这些范围中,我们选择具有最大max的范围.在该迭代中,我们选择了(9,12).我们对那个对象执行相同的操作,发现与12相交的下一个范围,最大范围为(11,14).

What is the best way to solve this? I was thinking this would be done using a greedy algorithm. In our example above, we would look for all of the numbers that intersect with 3, and pick from those the one with the highest max. Then we would do the same thing with the one we just picked. So, since we picked (3, 9) we now want to find all of the ranges that intersect 9, and among those, we pick the one with the highest max. In that iteration, we picked (9, 12). We do the same thing to that one, and we find that the next range that intersects 12, with the highest max is (11, 14).

在迭代之后,我们看到14大于13(我们的范围的最大值),因此我们可以停止.

After that iteration, we see that 14 is greater than 13 (the max of our range), so we can stop.

该算法存在的问题是,如何有效地查询相交范围?如果尝试线性搜索,最终将得到 O(n^2) 的算法.我的下一个想法是,每次循环运行时,都要从列表中划掉任何相交的范围.因此,在第一次迭代中,我们将(1,4)(3,9)进行交叉.在我们的下一个迭代中,我们交叉选择(9,12)(3,9)(8,10).因此,在最后一次迭代中,我们要做的只是查看 {(30,40),(20,91),(6,7)} .我们也可以通过消除最小值> 13和最大值<<的所有内容来提高效率. 3.问题是,这可能还不够.在我们的范围内,仍然存在许多重复序列的潜在问题.如果我们的范围列表包含 {(6,7),(6,7),(6、7),(6、7),(6、7)} 每次浏览这些内容,即使它们对我们没有用.即使我们只存储唯一值(通过将它们全部放在一个集合中),我们也可能有很大的范围,一堆范围都在目标范围之内,但是我们也有一个范围几乎跨度的范围整个目标范围.

The problem I'm having with this algorithm is, how do efficiently query the intersecting ranges? If we try a linear search, we end up with an algorithm that is O(n^2). My next thought was to cross off any of our intersecting ranges from our list each time we run through the loop. So in the first iteration, we cross of (1, 4) and (3, 9). In our next iteration we cross of (9, 12), (3, 9), and (8, 10). So by the last iteration, all we have to look through is {(30, 40), (20, 91), (6, 7)}. We could make this even more efficient by also crossing out everything that has a min > 13, and a max < 3. The problem is this still might not be enough. There is still the potential problem of having lots of duplicate sequences within the bounds of our range. If our list of ranges contained something like {(6, 7), (6, 7), (6, 7), (6, 7), (6, 7)} we would have to look through those each time, even though they aren't useful to us. Even if we were only to store unique values (by putting them all in a set), we might have a really big range, with a bunch of ranges that are inside of our target range, but we also have one range inside that spans almost the entire target range.

什么是查询我们范围的有效方法?或者可能是解决该问题的更有效的算法是什么?

推荐答案

如何使用间隔树进行查询? ( https://en.m.wikipedia.org/wiki/Interval_tree )我是我不确定贪婪是否可以在这里工作.如果我们看最后一组选择,与R中的最高点重叠,则每个选择的较早选择之间可能存在重叠,例如:

How about using an interval tree for queries? (https://en.m.wikipedia.org/wiki/Interval_tree) I'm not sure if greedy could work here or not. If we look at the last set of choices, overlapping with the high point in R, there's a possibility of overlap between the earlier choices for each one of those, for example:

R = (2,10) and we have (8,10) and (7,10) both overlapping with (6,8)

在这种情况下,我们只需要为(6,8)存储一个值作为路径的第二行;因为我们已经知道(6,8)的访问次数较少,所以当我们沿R的低点走更长的路径时再次访问(6,8)是多余的.因此,您的想法就是消除间隔.这样的事情行得通吗?

In that case, we only need to store one value for (6,8) as a second leg of the path; and visiting (6,8) again as we make longer paths towards the low point in R would be superfluous since we already know (6,8) was visited with a lower leg count. So your idea of eliminating intervals as we go makes sense. Could something like this work?

leg = 1
start with the possible end (or beginning) intervals
label these intervals with leg
until end of path is reached:
  remove the intervals labeled leg from the tree
  for each of those intervals labeled leg:
    list overlapping intervals in the chosen direction
  leg = leg + 1
  label the listed overlapping intervals with leg

这篇关于给定一组范围S和一个重叠范围R,找到S中包含R的最小子集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆