几个区间范围的快速交集? [英] Fast intersection of several interval ranges?

查看:685
本文介绍了几个区间范围的快速交集?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有几个变量,所有变量都是数字范围:(行的间隔)

I have several variables, all of which are numeric ranges: (intervals in rows)

a = [ 1 4; 5 9; 11 15; 20 30];
b = [ 2 6; 12 14; 19 22];
c = [ 15 22; 24 29; 33 35];
d = [ 0 3; 15 17; 23 26];

(我的真实数据集中的值不是整数,但为清楚起见在此表示).

(The values in my real dataset are not integers, but are represented here as such for clarity).

我想找到至少三个变量相交的区间.在上面的示例中,[20 22]和[24 26]将是两个这样的情况.

I would like to find intervals in which at least 3 of the variables intersect. In the above example [20 22] and [24 26] would be two such cases.

一种实现此目的的方法是将我的值进行装箱并将这些箱加在一起,但是由于我的值是连续的,因此会产生边缘效应",而我一开始就浪费将这些值装箱的时间. (将数据集以所需的分辨率进行绑定会创建数百GB的数据).

One way to approach this is to bin my values and add the bins together, but as my values are continuous, that'd create an 'edge effect', and I'd lose time binning the values in the first place. (binning my dataset at my desired resolution would create hundreds of GB of data).

另一种不涉及分箱的方法将在所有可能的变量组合之间使用成对相交(我们称其为X),然后在X与所有其他变量O(n ^ 3)的交集上使用.

Another approach, which doesn't involve binning, would use pairwise intersections (let's call it X) between all possible combinations of variables, and then the intersection of X with all other variables O(n^3).

您对此有何看法?有没有可以解决此问题的工具的算法/库?

What are your thoughts on this? Are there algorithms/libraries which have tools to solve this?

我当时正在考虑使用某种几何方法来解决此问题:基本上,如果我认为我的区间是一维空间中的线段,那么我想要的输出将是三个线段(来自三个变量)相交的点.我不确定这在算法上是否有效.咨询吗?

I was thinking of using sort of a geometric approach to solve this: Basically, if I considered that my intervals were segments in 1D space, then my desired output would be points where three segments (from three variables) intersect. I'm not sure if this is efficient algorithmically though. Advice?

推荐答案

O(N lg N)方法:

O(N lg N) method:

  1. 将每个时间间隔(t_A,t_B)转换为一对标记的端点('begin',t_A),('end',t_B)

  1. Convert each interval (t_A, t_B) to a pair of tagged endpoints ('begin', t_A), ('end', t_B)

按时间排序所有端点,这是最昂贵的步骤

Sort all the endpoints by time, this is the most expensive step

进行一次遍历,跟踪嵌套深度(如果标签为开始",则递增;如果标签为结束",则递减).这需要线性时间.

Do one pass through, tracking nesting depth (increment if tag is 'start', decrement if tag is 'end'). This takes linear time.

  • 当深度从2更改为3时,这是输出间隔的开始.
  • 当它从3变为2时,它是一个间隔的结束.

这篇关于几个区间范围的快速交集?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆