RTree: Count points in the neighbourhoods within each point of another set of points

Problem description

Why is this not returning a count of the number of points in each neighbourhood (bounding box)?

import geopandas as gpd

def radius(points_neighbour, points_center, new_field_name, r):
    """
    :param points_neighbour:
    :param points_center:
    :param new_field_name: new field_name attached to points_center
    :param r: radius around points_center
    :return:
    """
    sindex = points_neighbour.sindex
    pts_in_neighbour = []
    for i, pt_center in points_center.iterrows():
        nearest_index = list(sindex.intersection((pt_center.LATITUDE-r, pt_center.LONGITUDE-r, pt_center.LATITUDE+r, pt_center.LONGITUDE+r)))
        pts_in_this_neighbour = points_neighbour[nearest_index]
        pts_in_neighbour.append(len(pts_in_this_neighbour))
    points_center[new_field_name] = gpd.GeoSeries(pts_in_neighbour)

Every loop gives the same result.

Second question: how can I find the k-th nearest neighbour?

More information about the problem itself:

  • We are doing it at a very small scale, e.g. Washington State, US or British Columbia, Canada

  • We hope to use geopandas as much as possible, since it is similar to pandas and supports spatial indexing with RTree

For example, sindex here has methods such as nearest and intersection.

Please comment if you need more information. This is the code in class GeoPandasBase:

@property
def sindex(self):
    if not self._sindex_generated:
        self._generate_sindex()
    return self._sindex

I tried Richard's example, but it didn't work:

def radius(points_neighbour, points_center, new_field_name, r):
    """
    :param points_neighbour:
    :param points_center:
    :param new_field_name: new field_name attached to points_center
    :param r: radius around points_center
    :return:
    """
    sindex = points_neighbour.sindex
    pts_in_neighbour = []
    for i, pt_center in points_center.iterrows():
        pts_in_this_neighbour = 0
        for n in sindex.intersection(((pt_center.LATITUDE-r, pt_center.LONGITUDE-r, pt_center.LATITUDE+r, pt_center.LONGITUDE+r))):
            dist = pt_center.distance(points_neighbour['geometry'][n])
            if dist < radius:
                pts_in_this_neighbour = pts_in_this_neighbour + 1
        pts_in_neighbour.append(pts_in_this_neighbour)
    points_center[new_field_name] = gpd.GeoSeries(pts_in_neighbour)

To download the shapefile, go to https://catalogue.data.gov.bc.ca/dataset/hellobc-activities-and-attractions-listing and choose ArcView to download.

Recommended answer

I've attached code which should, with some minor modifications, do what you want.

I think your problem arose for one of two reasons:

  1. You were not correctly constructing the spatial index. Your responses to my comments suggested that you weren't wholly aware of how the spatial index was getting made.

  2. The bounding box for your spatial query was not built correctly.

I'll discuss both possibilities below.

As it turns out, the spatial index is constructed simply by typing:

sindex = gpd_df.sindex

Magic.

But where does gpd_df.sindex get its data from? It assumes that the data is stored as shapely geometries in a column called geometry. If you have not added data to such a column, geopandas raises an error.

A correct initialization of the data frame would look like so:

import numpy as np
import geopandas as gpd
import shapely.geometry

#Bounding box of Oregon (x is longitude, y is latitude)
oregon_xmin = -124.5664
oregon_xmax = -116.4633
oregon_ymin = 41.9920
oregon_ymax = 46.2938

#Generate random points throughout Oregon
x = np.random.uniform(low=oregon_xmin, high=oregon_xmax, size=10000)
y = np.random.uniform(low=oregon_ymin, high=oregon_ymax, size=10000)

#Turn the lat-long points into a geodataframe
gpd_df = gpd.GeoDataFrame(data={'x':x, 'y':y})
#Set up point geometries so that we can index the data frame
#Note that I am using x-y points!
gpd_df['geometry'] = gpd_df.apply(lambda row: shapely.geometry.Point((row['x'], row['y'])), axis=1)

#Automagically constructs a spatial index from the `geometry` column
gpd_df.sindex

Seeing the foregoing sort of example code in your question would have been helpful in diagnosing your problem and getting going on solving it.

Since you did not get the extremely obvious warning geopandas raises when a geometry column is missing:

AttributeError: No geometry data set yet (expected in column 'geometry'.

I think you've probably done this part right.

In your question, you form a bounding box like so:

nearest_index = list(sindex.intersection((pt_center.LATITUDE-r, pt_center.LONGITUDE-r, pt_center.LATITUDE+r, pt_center.LONGITUDE+r)))

As it turns out, bounding boxes have the form:

(West, South, East, North)

At least, they do for X-Y style points, e.g. shapely.geometry.Point(Lon, Lat), where longitude is x and latitude is y.

In my code, I use the following:

bbox = (cpt.x-radius, cpt.y-radius, cpt.x+radius, cpt.y+radius)
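
To make the ordering concrete, here is a minimal pure-Python sketch that does not need geopandas. The helper names `bbox_around`, `in_bbox` and `count_within_radius` are hypothetical, and the points are plain `(lon, lat)` tuples rather than shapely geometries. It builds the box in (West, South, East, North) order and then applies an exact distance check, because the square box also covers corners that lie outside the circle.

```python
import math

def bbox_around(lon, lat, r):
    """Return (west, south, east, north) for a square of half-width r."""
    return (lon - r, lat - r, lon + r, lat + r)

def in_bbox(pt, bbox):
    """True if the (lon, lat) point lies inside or on the edge of bbox."""
    west, south, east, north = bbox
    return west <= pt[0] <= east and south <= pt[1] <= north

def count_within_radius(points, centre, r):
    """Count points strictly closer than r to centre (all as (lon, lat))."""
    bbox = bbox_around(centre[0], centre[1], r)
    # Cheap box test first; this is the step a spatial index speeds up.
    candidates = [p for p in points if in_bbox(p, bbox)]
    # Exact distance test second; it drops candidates in the box corners.
    return sum(
        1 for p in candidates
        if math.hypot(p[0] - centre[0], p[1] - centre[1]) < r
    )

centre = (-120.0, 44.0)
points = [(-120.1, 44.1), (-119.6, 44.4), (-121.0, 44.0)]
print(bbox_around(centre[0], centre[1], 0.5))
print(count_within_radius(points, centre, 0.5))
```

The second point falls inside the box but is more than 0.5 away from the centre, so only the distance check excludes it.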

Working example

Putting the above together leads me to this working example. Note that I also demonstrate how to sort points by distance, answering your second question.

#!/usr/bin/env python3

import numpy as np
import geopandas as gpd
import shapely.geometry

oregon_xmin = -124.5664
oregon_xmax = -116.4633
oregon_ymin = 41.9920
oregon_ymax = 46.2938

def radius(gpd_df, cpt, radius):
  """
  :param gpd_df: Geopandas dataframe in which to search for points
  :param cpt:    Point about which to search for neighbouring points
  :param radius: Radius about which to search for neighbours
  :return:       List of point indices around the central point, sorted by
                 distance in ascending order
  """
  #Spatial index
  sindex = gpd_df.sindex
  #Bounding box of rtree search (West, South, East, North)
  bbox = (cpt.x-radius, cpt.y-radius, cpt.x+radius, cpt.y+radius)
  #Potential neighbours
  good = []
  for n in sindex.intersection(bbox):
    #The spatial index returns positional indices, so look them up with iloc
    dist = cpt.distance(gpd_df['geometry'].iloc[n])
    if dist<radius:
      good.append((dist,n))
  #Sort list in ascending order by `dist`, then `n`
  good.sort() 
  #Return only the neighbour indices, sorted by distance in ascending order
  return [x[1] for x in good]

#Generate random points throughout Oregon
x = np.random.uniform(low=oregon_xmin, high=oregon_xmax, size=10000)
y = np.random.uniform(low=oregon_ymin, high=oregon_ymax, size=10000)

#Turn the lat-long points into a geodataframe
gpd_df = gpd.GeoDataFrame(data={'x':x, 'y':y})
#Set up point geometries so that we can index the data frame
gpd_df['geometry'] = gpd_df.apply(lambda row: shapely.geometry.Point((row['x'], row['y'])), axis=1)

#The 'x' and 'y' columns are now stored as part of the geometry, so we remove
#their columns in order to save space
del gpd_df['x']
del gpd_df['y']

for i, row in gpd_df.iterrows():
  neighbours = radius(gpd_df,row['geometry'],0.5)
  print(neighbours)
  #Use len(neighbours) here to construct a new row for the data frame

(What I had been requesting in the comments is code that looks like the foregoing, but which exemplifies your problem. Note the use of random to succinctly generate a dataset for experimentation.)
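
For the k-th nearest neighbour, one option is to reuse the sorted list that `radius` returns. The sketch below is a hypothetical helper (`kth_nearest` is not part of geopandas) and assumes its input is already sorted by ascending distance, as in the working example. Because that example searches the same data frame the centre point comes from, the centre itself shows up at distance 0, so the helper can drop it when given its index.

```python
def kth_nearest(sorted_neighbours, k, centre_index=None):
    """Return the index of the k-th nearest neighbour (k=1 is the closest).

    sorted_neighbours: indices sorted by ascending distance.
    centre_index: index of the centre point itself, removed if present.
    Returns None when fewer than k neighbours lie inside the search radius.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Drop the centre point so it is not counted as its own neighbour.
    others = [n for n in sorted_neighbours if n != centre_index]
    return others[k - 1] if len(others) >= k else None

neighbours = [7, 42, 3, 19]  # hypothetical output of radius() for point 7
print(kth_nearest(neighbours, 2, centre_index=7))
```

A `None` result means the search radius held fewer than k other points, in which case the radius would need to be enlarged and the search repeated.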
