Speeding up sequential checking of whether a point is in a shape in Python


Problem description


I have code that checks, one row at a time, whether each pair of Cartesian coordinates in my DataFrame falls into certain enclosed geometric areas. It is rather slow, and I suspect that is because it is not vectorized. Here is an example:

import pandas as pd
from matplotlib.patches import Rectangle

r1 = Rectangle((0, 0), 10, 10)
r2 = Rectangle((50, 50), 10, 10)

df = pd.DataFrame([[1, 2], [-1, 5], [51, 52]], columns=['x', 'y'])

# check each row one at a time (slow for large frames)
for j in range(df.shape[0]):
    coordinates = df.x.iloc[j], df.y.iloc[j]
    if r1.contains_point(coordinates):
        df.loc[j, 'location'] = 0
    elif r2.contains_point(coordinates):
        df.loc[j, 'location'] = 1

Can someone suggest a way to speed this up?

Solution

It's better to turn each rectangle patch's extents (its bounding box) into a NumPy array and test all points at once with vectorized comparisons, instead of calling contains_point on one row at a time.

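import numpy as np
import pandas as pd
from matplotlib.patches import Rectangle

rec1 = Rectangle((0, 0), 10, 10)   # same rectangles as r1, r2 in the question
rec2 = Rectangle((50, 50), 10, 10)

def seqcheck_vect(df):
    xy = df[["x", "y"]].to_numpy()
    # get_extents() gives each rectangle's bounding box as [[xmin, ymin], [xmax, ymax]]
    lo1, hi1 = np.asarray(rec1.get_extents())
    lo2, hi2 = np.asarray(rec2.get_extents())
    # compare x with the x-bounds and y with the y-bounds for all rows at once
    in1 = ((xy >= lo1) & (xy <= hi1)).all(axis=1)
    in2 = ((xy >= lo2) & (xy <= hi2)).all(axis=1)
    out = np.where(in1, 0, np.where(in2, 1, np.nan))
    return df.assign(location=out)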

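For the given sample, the function sets location to 0 for (1, 2) and to 1 for (51, 52). It sets NaN for (-1, 5), which lies in neither rectangle.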
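The nested np.where above handles exactly two rectangles. With many areas, the same comparison can be broadcast over all boxes at once. The sketch below is a minimal version of that idea. It assumes the areas are axis-aligned rectangles, each given as [[xmin, ymin], [xmax, ymax]], which is the layout np.asarray(patch.get_extents()) produces. locate_in_boxes is a hypothetical helper name. As with the if/elif chain, earlier boxes take precedence when boxes overlap.

```python
import numpy as np

def locate_in_boxes(xy, boxes):
    """Index of the first box containing each point, or NaN if none (hypothetical helper).

    xy    : (n, 2) array of points
    boxes : (k, 2, 2) array, each box as [[xmin, ymin], [xmax, ymax]]
    """
    xy = np.asarray(xy, dtype=float)
    boxes = np.asarray(boxes, dtype=float)
    lo = boxes[:, 0, :]   # (k, 2) lower-left corners
    hi = boxes[:, 1, :]   # (k, 2) upper-right corners
    # (n, 1, 2) broadcast against (k, 2) -> (n, k, 2); a point is inside if both axes are
    inside = ((xy[:, None, :] >= lo) & (xy[:, None, :] <= hi)).all(axis=2)  # (n, k)
    first = inside.argmax(axis=1)   # first True per row (0 when the row has no True)
    return np.where(inside.any(axis=1), first, np.nan)
```

For the question's data, a call like df.assign(location=locate_in_boxes(df[["x", "y"]].to_numpy(), [np.asarray(r.get_extents()) for r in (rec1, rec2)])) should give the same result as seqcheck_vect. The intermediate boolean array has n × k × 2 entries. With very many points and boxes, it may therefore be worth processing the points in chunks.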


Benchmarks, comparing against a loop-based version:

def loopy_version(df):
    for j in range(df.shape[0]):
        coordinates = df.x.iloc[j], df.y.iloc[j]
        if rec1.contains_point(coordinates):
            df.loc[j, "location"] = 0
        elif rec2.contains_point(coordinates):
            df.loc[j, "location"] = 1
        else:
            pass
    return df

Testing on a DataFrame of 10K rows:

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 100, (10000, 2)), columns=list("xy"))

# check that both versions give the same result
loopy_version(df).equals(seqcheck_vect(df))
True

%timeit loopy_version(df)
1 loop, best of 3: 3.8 s per loop

%timeit seqcheck_vect(df)
1000 loops, best of 3: 1.73 ms per loop

So the vectorized approach is roughly 2200 times faster than the loop (3.8 s vs. 1.73 ms per call).
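The bounding-box comparison in seqcheck_vect is exact only for unrotated rectangles. If some of the enclosed areas are general polygons, the even-odd (ray casting) rule can be vectorized the same way. The sketch below is a minimal NumPy version. It assumes each area is given as an (m, 2) array of vertices in order, with the closing edge implied. points_in_polygon is a hypothetical helper name, and points lying exactly on an edge may be classified either way. If the shapes are already matplotlib patches or paths, Path.contains_points is an existing vectorized alternative.

```python
import numpy as np

def points_in_polygon(points, vertices):
    """Even-odd (ray casting) test of many points against one polygon (hypothetical helper).

    points   : (n, 2) array of query points
    vertices : (m, 2) array of polygon vertices, in order (closing edge implied)
    Returns a boolean array of length n.
    """
    points = np.asarray(points, dtype=float)
    vertices = np.asarray(vertices, dtype=float)
    x = points[:, 0][:, None]                  # (n, 1) so it broadcasts against edges
    y = points[:, 1][:, None]
    xi, yi = vertices[:, 0], vertices[:, 1]    # one end of each edge, (m,)
    xj, yj = np.roll(xi, 1), np.roll(yi, 1)    # other end (previous vertex), (m,)
    with np.errstate(divide="ignore", invalid="ignore"):
        # the edge crosses the horizontal line through the point
        straddles = (yi > y) != (yj > y)       # (n, m)
        # x where the edge meets that line; horizontal edges give inf/nan but never straddle
        x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
        crossings = straddles & (x < x_cross)  # crossing lies to the right of the point
    # an odd number of crossings means the point is inside
    return crossings.sum(axis=1) % 2 == 1
```

With one vertex array per area, the per-area boolean results can be combined with np.where (or np.select) exactly as in seqcheck_vect.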
