How do I select objects within a geographic region in a pandas DataFrame?


Question

I'm trying to select objects within a region from a pandas DataFrame that contains a list of item IDs and lat/lon pairs. Is there a selection method for this? I think this would be similar to this SO question, but using pandas instead of SQL:

Select geographic locations within a region

Here is my table, saved in locations.csv:

ID, LAT, LON
001,35.00,-75.00
002,35.01,-80.00 
...
999,25.76,-64.00

I can load the DataFrame and select a rectangular region:

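import pandas as pd

# skipinitialspace=True drops the blank after each comma, so the columns are 'LAT' and 'LON'
df = pd.read_csv('locations.csv', skipinitialspace=True)
lat_max = 32.323496
lat_min = 25.712767
lon_max = -72.863358
lon_min = -74.729456
# Combine the four bounds into a single boolean mask
in_box = (df['LAT'] > lat_min) & (df['LAT'] < lat_max) & (df['LON'] > lon_min) & (df['LON'] < lon_max)
small_df = df[in_box]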

How would I select objects within an irregular region?

How would I structure the DataFrame selection command?

I can build a lambda function that returns True when a LAT/LON pair lies within the region, but I'm not sure how to use it with a pandas DataFrame.

Recommended answer

The working code below selects the points within a region by first creating two GeoDataFrames. The first contains a polygon, and the second contains all the points to be spatially joined with the first. The spatial join predicate within selects the points that fall inside the polygon. The result of the operation is also a GeoDataFrame, and it contains only the points that fall within the area of the polygon.

The content of locations.csv: 6 lines, including the column headers. Note: there are no spaces in the first row.

ID,LAT,LON
1, 15.1, 10.0
2, 15.2, 15.1
3, 15.3, 20.2
4, 15.4, 25.3
5, 15.5, 30.4
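
The note about spaces in the header row matters because pandas, by default, keeps the blank that follows a comma as part of the column name. The sketch below illustrates this with an in-memory copy of a header written the way the question's file is; the `io.StringIO` wrapper and the variable names are only for illustration and are not part of the original answer.

```python
import io
import pandas as pd

# Header written with a blank after each comma, as in the question's file
raw = "ID, LAT, LON\n1, 15.1, 10.0\n2, 15.2, 15.1\n"

# Default parsing keeps the leading blank in the column names
default_cols = list(pd.read_csv(io.StringIO(raw)).columns)

# skipinitialspace=True strips the blank that follows each delimiter
stripped_cols = list(pd.read_csv(io.StringIO(raw), skipinitialspace=True).columns)

print(default_cols)   # ['ID', ' LAT', ' LON']
print(stripped_cols)  # ['ID', 'LAT', 'LON']
```

With the default parsing, `df['LAT']` would raise a `KeyError`, because the column is actually named `' LAT'`.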

Code:

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from shapely.wkt import loads

# Create a geo-dataframe `polygon_df` with a single polygon row.
# This polygon will be used to select points in a geodataframe.
d = {'poly_id': [1], 'wkt': ['POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))']}
df = pd.DataFrame(data=d)
geometry = [loads(pgon) for pgon in df.wkt]
# 'EPSG:4326' replaces the deprecated {'init': 'epsg:4326'} form
polygon_df = gpd.GeoDataFrame(df, crs='EPSG:4326', geometry=geometry)

# One can plot this polygon with the command:
# polygon_df.plot()

# Read the file with `pandas`
locs = pd.read_csv('locations.csv', sep=',')

# Make it a geo-dataframe named `geo_locs`, building one Point per row from (LON, LAT)
geo_locs = gpd.GeoDataFrame(
    locs,
    crs='EPSG:4326',
    geometry=[Point(xy) for xy in zip(locs.LON, locs.LAT)])

# Spatial join of the points `within` the polygon; the result is the GeoDataFrame `pts_in_poly`.
# `predicate=` requires geopandas 0.10 or later; older releases spelled it `op=`.
pts_in_poly = gpd.sjoin(geo_locs, polygon_df, predicate='within', how='inner')

# Print the ID of the points that fall within the polygon.
print(pts_in_poly.ID)

# The output will be:
#2    3
#3    4
#4    5
#Name: ID, dtype: int64

# Plot the polygon and all the points.
ax1 = polygon_df.plot(color='lightgray', zorder=1)
geo_locs.plot(ax=ax1, zorder=5, color="red")

Output plot:

In the plot, the points with IDs 3, 4, and 5 fall within the polygon.
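
The question also asks how to use a function that returns True for points inside the region with a plain pandas DataFrame. A minimal sketch of that route, without geopandas, is to apply the predicate row by row and use the result as a boolean mask. The helper `point_in_polygon` below is a hypothetical hand-written ray-casting test, not part of the answer above; shapely's `Polygon.contains` could play the same role. The sample data and polygon are copied from the answer.

```python
import pandas as pd

def point_in_polygon(x, y, vertices):
    """Ray-casting test: count the edges crossed by a ray going right from (x, y)."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Polygon vertices as (lon, lat) pairs, taken from the WKT string in the answer
region = [(30, 10), (40, 40), (20, 40), (10, 20)]

# Same sample points as locations.csv in the answer
locs = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'LAT': [15.1, 15.2, 15.3, 15.4, 15.5],
    'LON': [10.0, 15.1, 20.2, 25.3, 30.4],
})

# Apply the predicate to every row to get a boolean mask, then index with it
mask = locs.apply(lambda row: point_in_polygon(row.LON, row.LAT, region), axis=1)
inside = locs[mask]
print(inside.ID.tolist())  # [3, 4, 5]
```

A row-wise `apply` is simple but slow on large tables; the spatial join shown above is likely the better choice once the data grows.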
