给定一组范围 S 和一个重叠范围 R,找到 S 中包含 R 的最小子集 [英] Given a set of ranges S, and an overlapping range R, find the smallest subset in S that encompases R

查看:16
本文介绍了给定一组范围 S 和一个重叠范围 R,找到 S 中包含 R 的最小子集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

以下是某人给我的练习面试问题,我不确定最好的解决方案是什么:

The following is a practice interview question that was given to me by someone, and I'm not sure what the best solution to this is:

给定一组范围:
(例如 S = {(1, 4), (30, 40), (20, 91) ,(8, 10), (6, 7), (3, 9), (9,12), (11, 14)}. 并给定一个目标范围 R(例如 R = (3, 13) - 表示范围从 3 到 13).编写一个算法来找到覆盖目标范围的最小范围集.集合中的所有范围必须重叠才能被视为跨越整个目标范围.(在此示例中,答案是 {(3, 9), (9, 12), (11, 14)}.

Given a set of ranges:
(e.g. S = {(1, 4), (30, 40), (20, 91) ,(8, 10), (6, 7), (3, 9), (9, 12), (11, 14)}. And given a target range R (e.g. R = (3, 13) - meaning the range going from 3 to 13). Write an algorithm to find the smallest set of ranges that covers your target range. All of the ranges in the set must overlap in order to be considered as spanning the entire target range. (In this example, the answer would be {(3, 9), (9, 12), (11, 14)}.

解决此问题的最佳方法是什么?我在想这将使用贪心算法来完成.在上面的示例中,我们将查找与 3 相交的所有数字,并从具有最高最大值的数字中选择.然后我们会对我们刚刚选择的那个做同样的事情.因此,既然我们选择了 (3, 9),我们现在想要找到与 9 相交的所有范围,并且在这些范围中,我们选择具有最高最大值的范围.在该迭代中,我们选择了 (9, 12).我们对那个做同样的事情,我们发现下一个与 12 相交的范围是 (11, 14).

What is the best way to solve this? I was thinking this would be done using a greedy algorithm. In our example above, we would look for all of the numbers that intersect with 3, and pick from those the one with the highest max. Then we would do the same thing with the one we just picked. So, since we picked (3, 9) we now want to find all of the ranges that intersect 9, and among those, we pick the one with the highest max. In that iteration, we picked (9, 12). We do the same thing to that one, and we find that the next range that intersects 12, with the highest max is (11, 14).

在那次迭代之后,我们看到 14 大于 13(我们范围的最大值),所以我们可以停止.

After that iteration, we see that 14 is greater than 13 (the max of our range), so we can stop.

我在使用这个算法时遇到的问题是,如何有效地查询相交范围?如果我们尝试线性搜索,我们最终会得到 O(n^2) 的算法.我的下一个想法是每次运行循环时从列表中划掉任何相交范围.所以在第一次迭代中,我们交叉了 (1, 4)(3, 9).在我们的下一次迭代中,我们将跨越 (9, 12)(3, 9)(8, 10).所以到最后一次迭代,我们只需要查看 {(30, 40), (20, 91), (6, 7)}.我们可以通过删除所有 min > 13 和 max <的所有内容来提高效率.3. 问题是这可能还不够.在我们的范围内仍然存在大量重复序列的潜在问题.如果我们的范围列表包含类似 {(6, 7), (6, 7), (6, 7), (6, 7), (6, 7)} 的内容,我们将不得不每次都看一遍,即使它们对我们没有用.即使我们只存储唯一值(通过将它们全部放在一个集合中),我们也可能有一个非常大的范围,有一堆范围在我们的目标范围内,但我们也有一个范围几乎跨越整个目标范围.

The problem I'm having with this algorithm is, how do efficiently query the intersecting ranges? If we try a linear search, we end up with an algorithm that is O(n^2). My next thought was to cross off any of our intersecting ranges from our list each time we run through the loop. So in the first iteration, we cross of (1, 4) and (3, 9). In our next iteration we cross of (9, 12), (3, 9), and (8, 10). So by the last iteration, all we have to look through is {(30, 40), (20, 91), (6, 7)}. We could make this even more efficient by also crossing out everything that has a min > 13, and a max < 3. The problem is this still might not be enough. There is still the potential problem of having lots of duplicate sequences within the bounds of our range. If our list of ranges contained something like {(6, 7), (6, 7), (6, 7), (6, 7), (6, 7)} we would have to look through those each time, even though they aren't useful to us. Even if we were only to store unique values (by putting them all in a set), we might have a really big range, with a bunch of ranges that are inside of our target range, but we also have one range inside that spans almost the entire target range.

查询我们的范围的有效方法是什么?或者可能,解决这个问题的更有效算法是什么?

推荐答案

使用区间树进行查询怎么样?(https://en.m.wikipedia.org/wiki/Interval_tree)我'我不确定贪婪是否可以在这里工作.如果我们查看最后一组选项,与 R 中的高点重叠,则每个选项的早期选项之间都有可能重叠,例如:

How about using an interval tree for queries? (https://en.m.wikipedia.org/wiki/Interval_tree) I'm not sure if greedy could work here or not. If we look at the last set of choices, overlapping with the high point in R, there's a possibility of overlap between the earlier choices for each one of those, for example:

R = (2,10) and we have (8,10) and (7,10) both overlapping with (6,8)

在这种情况下,我们只需要为 (6,8) 存储一个值作为路径的第二段;再次访问 (6,8) 因为我们已经知道 (6,8) 被访问了一个小腿计数.因此,您在进行过程中消除间隔的想法是有道理的.像这样的东西可以吗?

In that case, we only need to store one value for (6,8) as a second leg of the path; and visiting (6,8) again as we make longer paths towards the low point in R would be superfluous since we already know (6,8) was visited with a lower leg count. So your idea of eliminating intervals as we go makes sense. Could something like this work?

leg = 1
start with the possible end (or beginning) intervals
label these intervals with leg
until end of path is reached:
  remove the intervals labeled leg from the tree
  for each of those intervals labeled leg:
    list overlapping intervals in the chosen direction
  leg = leg + 1
  label the listed overlapping intervals with leg

这篇关于给定一组范围 S 和一个重叠范围 R,找到 S 中包含 R 的最小子集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆