Determine if one range is within another


Question

Suppose one file contains ranges sorted by the first column, with no overlaps between ranges:

1 10
12 15
18 19

And a second file, also sorted by the first column, whose ranges may overlap:

1 5
2 10
12 13
13 20

For each line (range) in the second file, I would like to determine whether it intersects any of the ranges in the first file. This is what I have so far:

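import pandas as pd

# The files have no header row, so keep the first line as data
df_1 = pd.read_csv('range1.txt', sep=' ', header=None)
df_2 = pd.read_csv('range2.txt', sep=' ', header=None)

overlapping = []  # ranges from the second file that hit a range in the first
for i in range(len(df_1)):
    start_1 = df_1.iloc[i, 0]
    stop_1 = df_1.iloc[i, 1]
    for j in range(len(df_2)):
        start_2 = df_2.iloc[j, 0]
        stop_2 = df_2.iloc[j, 1]
        if start_2 > stop_1:
            # df_2 is sorted by start, so no later range can overlap
            break
        elif stop_2 < start_1:
            continue
        else:
            # add ranges from second file to list
            overlapping.append((start_2, stop_2))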

I know this can be terribly inefficient, so I was wondering whether there is a more computationally efficient (faster) way to solve this.

Recommended answer

@Olivier Pellier-Cuit has provided a link to a fast overlap test. If you need a membership check instead of an overlap test, use this algorithm.

So using this algorithm we can do the following:

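import numpy as np

# m is twice the midpoint and d is twice the half-width of each range
df1['m'] = df1.a + df1.b
df1['d'] = df1.b - df1.a

df2['m'] = df2.a + df2.b
df2['d'] = df2.b - df2.a

# True where a df2 range overlaps at least one df1 range
df2[['m','d']].apply(lambda x: (np.abs(df1.m - x.m) < df1.d + x.d).any(), axis=1)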

PS: I've slightly simplified the calculation of m and d by dropping the division by 2. The factor appears on both sides of the inequality, so it cancels out.
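To see why the scaled midpoint/width test works, it may help to compare it with a direct endpoint comparison. The sketch below is not part of the original answer; it assumes the sample ranges from the question plus the extra (50, 60) pair, and it uses NumPy broadcasting instead of `apply`. The variable names (`md_test`, `endpoint_test`, `hits`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data: the ranges from the question, plus one non-overlapping pair.
df1 = pd.DataFrame({"a": [1, 12, 18], "b": [10, 15, 19]})
df2 = pd.DataFrame({"a": [1, 2, 12, 13, 50], "b": [5, 10, 13, 20, 60]})

# df1 endpoints stay as rows; df2 endpoints become columns, so every
# comparison broadcasts to a (len(df2), len(df1)) matrix of pairwise results.
a1, b1 = df1["a"].to_numpy(), df1["b"].to_numpy()
a2, b2 = df2["a"].to_numpy()[:, None], df2["b"].to_numpy()[:, None]

# Midpoint/width form (both sides scaled by 2, as in the answer).
md_test = np.abs((a1 + b1) - (a2 + b2)) < (b1 - a1) + (b2 - a2)

# Endpoint form: each range must start before the other one ends.
endpoint_test = (a1 < b2) & (a2 < b1)

# The two forms agree for every pair of ranges.
assert (md_test == endpoint_test).all()

# A df2 range is a hit if it overlaps at least one df1 range.
hits = endpoint_test.any(axis=1)
print(hits.tolist())  # [True, True, True, True, False]
```

Because the comparison is strict, two ranges that only touch at a single endpoint are not counted as overlapping. Both this sketch and the `apply` version compare every pair of ranges, so the work grows with len(df1) * len(df2).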

Output:

In [105]: df2[['m','d']].apply(lambda x: (np.abs(df1.m - x.m) < df1.d +x.d).any(), axis=1)
Out[105]:
0     True
1     True
2     True
3     True
4    False
dtype: bool

Setup:

import io
import pandas as pd

# sep=r"\s+" splits on any whitespace (delim_whitespace is deprecated in recent pandas)
df1 = pd.read_csv(io.StringIO("""
a b
1 10
12 15
18 19
"""), sep=r"\s+")

df2 = pd.read_csv(io.StringIO("""
a b
1 5
2 10
12 13
13 20
50 60
"""), sep=r"\s+")

NOTE: I've intentionally added a pair (50, 60) to df2 that doesn't overlap any interval in df1.

Data frames with the calculated m and d columns:

In [106]: df1
Out[106]:
    a   b   m  d
0   1  10  11  9
1  12  15  27  3
2  18  19  37  1

In [107]: df2
Out[107]:
    a   b    m   d
0   1   5    6   4
1   2  10   12   8
2  12  13   25   1
3  13  20   33   7
4  50  60  110  10
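The question states that the first file is sorted and has no overlapping ranges, which means its end points are sorted as well. An alternative to the pairwise test is therefore a binary search per query range. This is a sketch under that assumption, not the method from the answer above, and the function name `overlaps_any_sorted` is hypothetical.

```python
import numpy as np

def overlaps_any_sorted(starts1, ends1, starts2, ends2):
    """For each (starts2[k], ends2[k]), report whether it overlaps any
    (starts1[i], ends1[i]).

    Assumes the first set of ranges is sorted by start and non-overlapping,
    so ends1 is sorted too. Uses strict comparisons, so ranges that only
    touch at an endpoint are not counted as overlapping.
    """
    starts1 = np.asarray(starts1)
    ends1 = np.asarray(ends1)
    starts2 = np.asarray(starts2)
    ends2 = np.asarray(ends2)

    # Index of the first range in set 1 whose end lies strictly after
    # the query start; every earlier range ends too soon to overlap.
    idx = np.searchsorted(ends1, starts2, side="right")

    # Queries that start after every range in set 1 has ended cannot overlap.
    found = idx < len(ends1)

    # The candidate overlaps only if it also starts before the query ends.
    result = np.zeros(len(starts2), dtype=bool)
    result[found] = starts1[idx[found]] < ends2[found]
    return result

print(overlaps_any_sorted([1, 12, 18], [10, 15, 19],
                          [1, 2, 12, 13, 50], [5, 10, 13, 20, 60]).tolist())
# [True, True, True, True, False]
```

Each query costs one binary search, so the total work is roughly len(df2) * log(len(df1)) rather than len(df1) * len(df2). If touching endpoints should count as an overlap, using `side="left"` together with `<=` in the final comparison would give closed-interval semantics.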
