Calculate Distance to Nearest Feature with Geopandas

Question

I'm looking to do the equivalent of ArcPy's Generate Near Table tool using Geopandas / Shapely. I'm very new to Geopandas and Shapely and have developed a method that works, but I'm wondering whether there is a more efficient way of doing it.

I have two point datasets: Census Block centroids and restaurants. For each Census Block centroid, I want to find the distance to its closest restaurant. There are no restrictions on the same restaurant being the closest one for multiple blocks.

This gets a bit more complicated because the Geopandas distance function calculates elementwise, matching rows based on index. My general approach is therefore to turn the Restaurants file into a single multipoint geometry and then set every index value of the Blocks file to the same value, so that all of the block centroids and the restaurant multipoint share one index value.

import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon, Point, MultiPoint

Now read in the Block Centroid and Restaurant shapefiles:

Blocks=gpd.read_file(BlockShp)
Restaurants=gpd.read_file(RestaurantShp)

Since the Geopandas distance function calculates distance elementwise, I convert the Restaurants GeoSeries to a one-element GeoSeries holding a MultiPoint:

RestMulti=gpd.GeoSeries(Restaurants.unary_union)
RestMulti.crs=Restaurants.crs
# reset_index returns a new object, so assign the result back
RestMulti=RestMulti.reset_index(drop=True)

Then I set the index of every row in Blocks to 0 (the same value as the Restaurants multipoint) as a workaround for the elementwise calculation.

Blocks.index=[0]*len(Blocks)

Lastly, I use the Geopandas distance function to calculate the distance to the nearest restaurant for each Block centroid.

Blocks['Distance']=Blocks.distance(RestMulti)
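As a possible simplification of the index workaround above: the Geopandas `distance` method also accepts a single Shapely geometry, which it compares against every row. The sketch below assumes small made-up layers (`blocks`, `restaurants`, and their coordinates are hypothetical stand-ins for the shapefiles in the question) and passes one MultiPoint directly, so no reindexing is needed.

```python
import geopandas as gpd
from shapely.geometry import MultiPoint, Point

# Hypothetical stand-ins for the Blocks and Restaurants layers
blocks = gpd.GeoDataFrame(
    {"block_id": ["a", "b", "c"]},
    geometry=[Point(0, 0), Point(5, 0), Point(10, 10)],
)
restaurants = gpd.GeoSeries([Point(0, 3), Point(6, 0)])

# Collapse all restaurants into one Shapely MultiPoint
rest_multi = MultiPoint(list(restaurants))

# A single geometry is compared against every row, so the index is irrelevant
blocks["Distance"] = blocks.distance(rest_multi)
print(blocks)
```

The distance to a MultiPoint is the distance to its nearest member, which is why this yields the nearest-restaurant distance for each block. Distances are in the units of the layer's coordinate system, so a projected CRS is assumed.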

Please offer any suggestions on how any aspect of this could be improved. I'm not tied to using Geopandas or Shapely, but I am looking to learn an alternative to ArcPy.

Thanks for your help!

Recommended answer

If I understand your issue correctly, Blocks and Restaurants can have very different numbers of rows. For this reason, it's probably a bad approach to try to force them into a table format by reindexing.

I would just loop over the blocks and get the minimum distance to the restaurants (just as @shongololo was suggesting).

I'm going to be slightly more general (because I already have this code written down) and compute distances from points to lines, but the same code should work from points to points or from polygons to polygons. I'll start with a GeoDataFrame for the points and create a new column that holds the minimum distance to the lines.

%matplotlib inline
import matplotlib.pyplot as plt
import shapely.geometry as geom
import numpy as np
import pandas as pd
import geopandas as gpd

lines = gpd.GeoSeries(
    [geom.LineString(((1.4, 3), (0, 0))),
        geom.LineString(((1.1, 2.), (0.1, 0.4))),
        geom.LineString(((-0.1, 3.), (1, 2.)))])

# 10 points
n  = 10
points = gpd.GeoSeries([geom.Point(x, y) for x, y in np.random.uniform(0, 3, (n, 2))])

# Put the points in a dataframe, with some other random column
# Name 'Geometry' as the active geometry column so that df_points.geometry works later
df_points = gpd.GeoDataFrame(
    {'Geometry': points, 'Property1': np.random.randn(n)}, geometry='Geometry')

points.plot()
lines.plot()

Now get the distance from the points to the lines and save only the minimum distance for each point (see below for a version with apply):

min_dist = np.empty(n)
for i, point in enumerate(points):
    min_dist[i] = np.min([point.distance(line) for line in lines])
df_points['min_dist_to_lines'] = min_dist
df_points.head(3)

This gives:

    Geometry                                       Property1    min_dist_to_lines
0   POINT (0.2479424516236574 2.944916965334865)    2.621823    0.193293
1   POINT (1.465768457667432 2.605673714922998)     0.6074484   0.226353
2   POINT (2.831645235202689 1.125073838462032)     0.657191    1.940127
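For the point-to-point case in the question, the loop above compares every block with every restaurant, which can get slow for large layers. One alternative, not part of the original answer, is a KD-tree from SciPy. The sketch below assumes the coordinates are already in a projected CRS; the arrays `block_xy` and `rest_xy` are hypothetical and could be built from a point layer with something like `np.column_stack([gdf.geometry.x, gdf.geometry.y])`.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical projected (x, y) coordinates for blocks and restaurants
block_xy = np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 10.0]])
rest_xy = np.array([[0.0, 3.0], [6.0, 0.0]])

# Build the tree once on the restaurants, then query all blocks in one call
tree = cKDTree(rest_xy)
nearest_dist, nearest_idx = tree.query(block_xy, k=1)

print(nearest_dist)  # Euclidean distance from each block to its nearest restaurant
print(nearest_idx)   # row position of that restaurant in rest_xy
```

This only handles points and plain Euclidean distance; for lines or polygons the geometry-based approach in the answer still applies.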

---- EDIT ----

(Taken from a GitHub issue.) Using apply is nicer and more consistent with how you'd do it in pandas:

def min_distance(point, lines):
    return lines.distance(point).min()

# `lines` is the GeoSeries of LineStrings defined earlier
df_points['min_dist_to_lines'] = df_points.geometry.apply(min_distance, lines)

As of at least 2019-10-04, a change in pandas seems to require a different call in the last code block: the extra argument has to be passed through the args parameter of .apply():

df_points['min_dist_to_lines'] = df_points.geometry.apply(min_distance, args=(lines,))
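Newer Geopandas releases (0.10 and later, as far as I know) also ship `sjoin_nearest`, which may cover the original question without any loop or apply. This is a minimal sketch and not part of the original answer; the `blocks` and `restaurants` frames and their column names are made up for illustration, and a projected CRS is assumed so that the distances are meaningful.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical stand-ins for the Blocks and Restaurants layers
blocks = gpd.GeoDataFrame(
    {"block_id": ["a", "b", "c"]},
    geometry=[Point(0, 0), Point(5, 0), Point(10, 10)],
)
restaurants = gpd.GeoDataFrame(
    {"name": ["r1", "r2"]},
    geometry=[Point(0, 3), Point(6, 0)],
)

# Attach the nearest restaurant and its distance to every block
nearest = gpd.sjoin_nearest(
    blocks, restaurants, how="left", distance_col="Distance"
)
print(nearest[["block_id", "name", "Distance"]])
```

Unlike the distance-only approaches, this also reports which restaurant is nearest. If two restaurants are equally close to a block, that block appears once per tied restaurant.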
