Distance matrix between two point layers

Views: 116 Published: 2021/5/10 19:16:53 Tags: python pandas numpy geopandas shapely


Problem description

I have two arrays of different sizes, each containing point coordinates as shapely.geometry.Point objects.

For example:

[Point(X Y), Point(X Y)...]
[Point(X Y), Point(X Y)...]

I would like to create a "cross product" of these two arrays using a distance function. The distance function comes from shapely.geometry and is a simple geometric vector distance calculation. I am trying to create a distance matrix between M:N points.
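To make the target concrete, here is a minimal sketch of what such an M x N matrix contains. It uses plain coordinate tuples and `math.hypot` rather than shapely, and the sample coordinates are made up for illustration only.

```python
import math

# Hypothetical sample coordinates standing in for the two point arrays.
source_points = [(0.0, 0.0), (3.0, 4.0)]             # M = 2
near_points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]   # N = 3

# matrix[i][j] is the straight-line distance from
# source_points[i] to near_points[j], so the result has M rows and N columns.
matrix = [
    [math.hypot(sx - nx, sy - ny) for (nx, ny) in near_points]
    for (sx, sy) in source_points
]

for row in matrix:
    print(row)
# [0.0, 3.0, 4.0]
# [5.0, 4.0, 3.0]
```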

Right now I have this function:

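    import geopandas as gpd
    import numpy as np
    import pandas as pd

    # source and near initially hold the layer locations (assumed to be
    # file paths); they are rebound to GeoDataFrames here
    source = gpd.read_file(source)
    near = gpd.read_file(near)

    # Plain Python lists of shapely Point objects
    source_list = source.geometry.values.tolist()
    near_list = near.geometry.values.tolist()

    # One row per source point, one column per near point
    array = np.empty((len(source.ID_SOURCE), len(near.ID_NEAR)))

    for index_source, item_source in enumerate(source_list):
        for index_near, item_near in enumerate(near_list):
            array[index_source, index_near] = item_source.distance(item_near)

    df_matrix = pd.DataFrame(array, index=source.ID_SOURCE, columns=near.ID_NEAR)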

This does the job, but it is slow: 4000 x 4000 points takes around 100 seconds, and my datasets are much bigger, so speed is the main issue. I would like to avoid this double loop if possible. I also tried doing it directly on the pandas DataFrames, which is terribly slow:

    for index_source, item_source in source.iterrows():
        for index_near, item_near in near.iterrows():
            df_matrix.at[index_source, index_near] = item_source.geometry.distance(item_near.geometry)

This is a bit faster, but still 4x slower than the NumPy version:

    for index_source, item_source in enumerate(source_list):
        for index_near, item_near in enumerate(near_list):
            df_matrix.at[index_source, index_near] = item_source.distance(item_near)

Is there a faster way to do this? I guess there is, but I have no idea how to proceed. As a last resort, I could split the DataFrame into smaller chunks, send each chunk to a different core, and concatenate the results. If this could be done with NumPy alone, using only indexing tricks, I could send it to the GPU and be done in no time. The double for loop is a no-go right now. I would also prefer not to use any library other than pandas/NumPy. I could use SAGA processing and its Point Distances module (http://www.saga-gis.org/saga_tool_doc/2.2.2/shapes_points_3.html), which is pretty damn fast, but I am looking for a Python-only solution.
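One possible NumPy-only direction is broadcasting over plain coordinate arrays instead of calling `distance` on each pair of Point objects. The sketch below assumes both layers contain only planar (x, y) points in the same projected coordinate system, so that Euclidean distance matches what `Point.distance` would return. The function name `distance_matrix`, its `chunk_size` parameter, and the sample coordinates are hypothetical and chosen for illustration.

```python
import numpy as np


def distance_matrix(source_xy, near_xy, chunk_size=1024):
    """Euclidean distances between every source point and every near point.

    source_xy: array-like of shape (M, 2); near_xy: array-like of shape (N, 2).
    Returns an (M, N) float array. Source rows are processed in chunks so the
    temporary (chunk, N, 2) difference array stays reasonably small.
    """
    source_xy = np.asarray(source_xy, dtype=float)
    near_xy = np.asarray(near_xy, dtype=float)
    out = np.empty((len(source_xy), len(near_xy)))
    for start in range(0, len(source_xy), chunk_size):
        block = source_xy[start:start + chunk_size]
        # Broadcasting: (chunk, 1, 2) - (1, N, 2) -> (chunk, N, 2)
        diff = block[:, None, :] - near_xy[None, :, :]
        out[start:start + chunk_size] = np.sqrt((diff ** 2).sum(axis=-1))
    return out


# Hypothetical sample coordinates standing in for the two point layers.
source_xy = [(0.0, 0.0), (3.0, 4.0)]
near_xy = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]

result = distance_matrix(source_xy, near_xy)
print(result)
```

With the GeoDataFrames from the question, the coordinate arrays could presumably be built with something like `np.column_stack([source.geometry.x, source.geometry.y])`, and the result wrapped as `pd.DataFrame(result, index=source.ID_SOURCE, columns=near.ID_NEAR)`. The chunking is only there to bound memory use on large layers; the chunk size is a tuning knob, not a measured optimum.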
