Speeding up sequential checking if a point is in a shape in Python

Views: 143 | Published: 2018/4/23 17:55:41 | Tags: python pandas matplotlib geometry vectorization

Problem description

I have code that sequentially checks whether each pair of Cartesian coordinates in my DataFrame falls inside certain enclosed geometric areas. It is rather slow, and I suspect that is because it is not vectorized. Here is an example:

import pandas as pd
from matplotlib.patches import Rectangle

r1 = Rectangle((0, 0), 10, 10)
r2 = Rectangle((50, 50), 10, 10)

df = pd.DataFrame([[1, 2], [-1, 5], [51, 52]], columns=['x', 'y'])

# Check the points one at a time; rows that match neither rectangle stay NaN
for j in range(df.shape[0]):
    coordinates = df.x.iloc[j], df.y.iloc[j]
    if r1.contains_point(coordinates):
        df.loc[j, 'location'] = 0
    elif r2.contains_point(coordinates):
        df.loc[j, 'location'] = 1

Can someone suggest a way to speed this up?

Solution

It's better to read the extents of each rectangular patch into an array and then compare all points against those bounds in a single vectorized operation.

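import numpy as np
from matplotlib.patches import Rectangle

rec1 = Rectangle((0, 0), 10, 10)    # r1 in the question
rec2 = Rectangle((50, 50), 10, 10)  # r2 in the question

def seqcheck_vect(df):
    xy = df[["x", "y"]].values
    # get_extents() returns a Bbox; as an array it is [[x0, y0], [x1, y1]]
    e1 = np.asarray(rec1.get_extents())
    e2 = np.asarray(rec2.get_extents())
    # Per-axis bounds, so rectangles that are not squares on the diagonal also work
    r1m1, r1m2 = e1.min(axis=0), e1.max(axis=0)
    r2m1, r2m2 = e2.min(axis=0), e2.max(axis=0)
    out = np.where(((xy >= r1m1) & (xy <= r1m2)).all(axis=1), 0,
                   np.where(((xy >= r2m1) & (xy <= r2m2)).all(axis=1), 1, np.nan))
    return df.assign(location=out)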
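
The same idea extends to any number of axis-aligned rectangles. The following is a minimal sketch and not part of the original answer: the function name `locate_points` and the `(x0, y0, x1, y1)` bounds layout are assumptions made for illustration. It compares every point with every rectangle through NumPy broadcasting and reports the index of the first rectangle that contains the point, or NaN when none does.

```python
import numpy as np

def locate_points(xy, bounds):
    """Return, per point, the index of the first rectangle containing it, else NaN.

    xy     : (n, 2) array of x, y coordinates
    bounds : (m, 4) array of hypothetical (x0, y0, x1, y1) rows, one per rectangle
    """
    xy = np.asarray(xy, dtype=float)
    bounds = np.asarray(bounds, dtype=float)
    lo = bounds[:, :2]  # (m, 2) lower-left corners
    hi = bounds[:, 2:]  # (m, 2) upper-right corners
    # Broadcast (n, 1, 2) against (m, 2) -> (n, m, 2); both axes must be in range
    inside = ((xy[:, None, :] >= lo) & (xy[:, None, :] <= hi)).all(axis=2)
    first = inside.argmax(axis=1).astype(float)  # index of the first True per row
    first[~inside.any(axis=1)] = np.nan          # no rectangle matched
    return first

# Sample points from the question and the two 10 x 10 rectangles
points = [[1, 2], [-1, 5], [51, 52]]
rects = [[0, 0, 10, 10], [50, 50, 60, 60]]
location = locate_points(points, rects)
```

With a DataFrame, the result could be attached with `df.assign(location=locate_points(df[["x", "y"]].values, rects))`. The intermediate boolean array has n × m × 2 entries, so memory use grows with the product of the number of points and the number of rectangles.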

For the sample DataFrame in the question, the function assigns location 0 to the point (1, 2), NaN to (-1, 5), and 1 to (51, 52).

Benchmarks:

def loopy_version(df):
    for j in range(df.shape[0]):
        coordinates = df.x.iloc[j], df.y.iloc[j]
        if rec1.contains_point(coordinates):
            df.loc[j, "location"] = 0
        elif rec2.contains_point(coordinates):
            df.loc[j, "location"] = 1
        else:
            pass
    return df

Testing on a DF of 10K rows:

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 100, (10000, 2)), columns=list("xy"))

# check if both give same outcome
loopy_version(df).equals(seqcheck_vect(df))
True

%timeit loopy_version(df)
1 loop, best of 3: 3.8 s per loop

%timeit seqcheck_vect(df)
1000 loops, best of 3: 1.73 ms per loop

So the vectorized approach is roughly 2,200 times faster than the loop-based one on this test (3.8 s versus 1.73 ms).
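
For shapes that are not axis-aligned rectangles, comparing against bounds no longer applies. One option, shown here as a hedged sketch rather than as part of the original answer, is matplotlib's `Path.contains_points`, which tests an array of points against a polygon in a single call. The triangle vertices and sample points below are made up for illustration, and the example avoids points lying exactly on an edge because those may be classified either way.

```python
import numpy as np
from matplotlib.path import Path

# Hypothetical triangle, given by its corner vertices (the path is treated as closed)
triangle = Path([(0, 0), (10, 0), (0, 10)])

# Illustrative points: the first two are clearly inside, the last two clearly outside
points = np.array([[1.0, 1.0], [3.0, 4.0], [8.0, 8.0], [-1.0, 5.0]])

# One vectorized call returns a boolean mask with one entry per point
inside = triangle.contains_points(points)

# Map the mask to the 0 / NaN labelling used by seqcheck_vect
location = np.where(inside, 0, np.nan)
```

For a DataFrame, `df[["x", "y"]].values` could be passed in place of `points`.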
