Speed up geolocation algorithm in Python
Question
I have a set of 100k geo locations (lat/lon) and a hexagonal grid (4k polygons). My goal is to calculate the total number of points located within each polygon.
My current algorithm uses two for loops to iterate over all geo points and all polygons, which gets really slow as I increase the number of polygons. How would you speed up the algorithm? I have uploaded a minimal example which creates 100k random geo points and uses 561 cells in the grid.
I also noticed that reading the GeoJSON file (with 4k polygons) takes some time. Maybe I should export the polygons to a CSV?
hexagon_grid.geojson file: https://gist.github.com/Arnold1/9e41454e6eea910a4f6cd68ff1901db1
minimal python example: https://gist.github.com/Arnold1/ee37a2e4b2dfbfdca9bfae7c7c3a3755
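Since the answer below works only with hexagon centers, the GeoJSON polygons can be reduced to one center each. A minimal sketch using only `json` and NumPy, assuming the file is a FeatureCollection of single-ring Polygon features with 2-D `[lon, lat]` coordinates (not verified against the linked gist); the function name `hexagon_centers` is hypothetical:

```python
import json

import numpy as np


def hexagon_centers(geojson_text):
    """Return an (M, 2) array with one center per Polygon feature.

    Assumes a FeatureCollection of single-ring Polygons with 2-D [lon, lat]
    coordinates. For a regular hexagon, the mean of its distinct vertices
    is its center.
    """
    collection = json.loads(geojson_text)
    centers = []
    for feature in collection["features"]:
        # The first linear ring is the polygon's outer boundary
        ring = np.asarray(feature["geometry"]["coordinates"][0], dtype=float)
        if len(ring) > 1 and np.array_equal(ring[0], ring[-1]):
            ring = ring[:-1]  # GeoJSON rings repeat the first vertex at the end
        centers.append(ring.mean(axis=0))
    return np.vstack(centers)
```

The resulting array is small (about 4k rows), so one option is to parse the GeoJSON once and cache the centers with `np.save` / `np.load` instead of exporting every polygon vertex to CSV.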
Answer
You don't need to explicitly test each hexagon to see whether a given point is located inside it.
Let's assume, for the moment, that all of your points fall somewhere within the bounds of your hexagonal grid. Because your hexagons form a regular lattice, you only really need to know which of the hexagon centers is closest to each point.
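A minimal sketch of the nearest-center idea, assuming the points are an (N, 2) NumPy array `xy` and the hexagon centers an (M, 2) array `centroids`, both in the same planar coordinates in which the hexagons are regular (if the grid was built in a projected coordinate system, the lat/lon points would need projecting first). It uses SciPy's `cKDTree` as one possible nearest-neighbour structure and `np.bincount` to tally points per hexagon; the function name `count_points_per_hexagon` is hypothetical:

```python
import numpy as np
from scipy.spatial import cKDTree


def count_points_per_hexagon(xy, centroids):
    """Count how many points in xy fall in each hexagon.

    xy:        (N, 2) array of point coordinates
    centroids: (M, 2) array of hexagon centers in the same coordinates
    Assumes every point lies inside the grid, so the nearest center
    identifies the containing hexagon.
    """
    tree = cKDTree(centroids)   # build the spatial index once
    dist, idx = tree.query(xy)  # nearest center (and its distance) per point
    # Tally points per hexagon; minlength keeps empty hexagons as zeros
    counts = np.bincount(idx, minlength=len(centroids))
    return counts, dist, idx
```

Building the tree once and querying all points in one vectorised call replaces the N × M point-in-polygon tests of the double loop with roughly N log M work. The returned `dist` is also the natural quantity for a distance cut-off on points outside the grid.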
To plot the output:
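from matplotlib import pyplot as plt

# Assumes xy (N x 2 points), centroids (M x 2 hexagon centers) and
# counts (number of points per hexagon) have already been computed.
fig, ax = plt.subplots(1, 1, subplot_kw={'aspect': 'equal'})
# ax.hold(True) was removed in Matplotlib 3.0; overlaying plots is now the default.
ax.scatter(xy[:, 0], xy[:, 1], 10, c='b', alpha=0.25, edgecolors='none')  # raw points
ax.scatter(centroids[:, 0], centroids[:, 1], marker='h', s=(counts + 5),
           c=counts, cmap='Reds')  # hexagon centers, sized and colored by count
ax.margins(0.01)
plt.show()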
I can think of several different ways you could handle points that fall outside your grid, depending on how much accuracy you need:
- You could exclude points that fall outside the outer bounding rectangle of your hexagon vertices (i.e. x < xmin, x > xmax, etc.). However, this will fail to exclude points that fall within the 'gaps' along the edges of your grid.
- Another straightforward option would be to set a cut-off on the distance from each point to its nearest hexagon center, chosen according to the spacing of your hexagon centers. This is equivalent to using a circular approximation for your outer hexagons.
- If accuracy is crucial, you could define a matplotlib.path.Path corresponding to the outer vertices of your hexagonal grid, then use its .contains_points() method to test whether your points are contained within it. Compared to the other two methods, this would probably be slower and more fiddly to code.
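The three options for points outside the grid could be sketched roughly as follows. This is a minimal sketch, assuming points and centers are (N, 2) / (M, 2) NumPy arrays in the same planar coordinates; `vertices`, `max_dist` and `outline_vertices` are hypothetical inputs you would build from your grid. In particular, tracing the ordered outer boundary for the exact test is the fiddly part and is not shown.

```python
import numpy as np
from matplotlib.path import Path
from scipy.spatial import cKDTree


def inside_bounding_box(xy, vertices):
    """Bounding rectangle of all hexagon vertices (misses the edge 'gaps')."""
    xmin, ymin = vertices.min(axis=0)
    xmax, ymax = vertices.max(axis=0)
    x, y = xy[:, 0], xy[:, 1]
    return (x >= xmin) & (x <= xmax) & (y >= ymin) & (y <= ymax)


def within_cutoff(xy, centroids, max_dist):
    """Circular approximation: keep points close enough to their nearest center.

    With center spacing d, max_dist = d / sqrt(3) (the circumradius) keeps every
    point inside the outer hexagons but also some just outside them;
    max_dist = d / 2 (the inradius) drops some points near hexagon corners.
    """
    dist, _ = cKDTree(centroids).query(xy)
    return dist <= max_dist


def inside_outline(xy, outline_vertices):
    """Exact test against an ordered (K, 2) outline of the whole grid."""
    # Path.contains_points treats the path as closed
    return Path(outline_vertices).contains_points(xy)
```

Each function returns a boolean mask, so the filtered points are simply `xy[mask]` before counting.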