How can I set the radius in query_radius with a BallTree (sklearn, radians, km)?


Problem description

I'm working with latitude and longitude data. I use a BallTree because the dataset has many rows (32,000). I build the tree with the haversine metric:

model_BTree = BallTree(np.array(points_sec_rad), metric='haversine')

I convert the latitude and longitude to radians. How should I choose the radius (max_dist_rad) passed to query_radius for the points I want to locate? I've used 0.150 meters as the radius, but I'm not sure whether I should convert it to an equivalent value in radians.

ind_BTree, dist_BTree = model_BTree.query_radius(np.array(points_loc_rad), r=max_dist_rad, return_distance=True, sort_results=True)

Also, how can I limit the number of neighbors inside the radius? Thank you.

Recommended answer

Edit: example with working code and explanation

The best way to picture what happens when you apply the haversine distance is to imagine that all great-circle distances are measured on a small ping-pong ball, that is, on a unit sphere.

If you want to apply query_radius() to a larger sphere, such as the Earth, you need to convert the Earth distance in km or miles to the unit ping-pong sphere. Say you want 100 miles: divide by the Earth's radius in miles. The distances returned by query_radius() then need to be converted back to miles or km by multiplying by the same radius.
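The conversion described above can be sketched as a pair of small helpers. This is only an illustration: the function names are hypothetical, the mile radius is the value used later in this answer, and the kilometre radius (about 6371 km, the Earth's mean radius) is an assumption added here.

```python
# Hypothetical helpers: convert between surface distances and unit-sphere
# distances (radians), as expected by BallTree with metric='haversine'.
EARTH_RADIUS_MILES = 3958.8  # value used in this answer
EARTH_RADIUS_KM = 6371.0     # assumed mean Earth radius in km


def to_unit_sphere(distance, earth_radius):
    """Surface distance -> radians; use the result as r in query_radius()."""
    return distance / earth_radius


def from_unit_sphere(radians, earth_radius):
    """Radians returned by query_radius() -> surface distance."""
    return radians * earth_radius


r_100_miles = to_unit_sphere(100, EARTH_RADIUS_MILES)
r_150_m = to_unit_sphere(0.150, EARTH_RADIUS_KM)  # 150 m expressed in km

print(r_100_miles)  # roughly 0.0253 radians
print(r_150_m)      # a very small radius, on the order of 1e-5 radians
```

Whatever unit you pick, the search radius and the Earth radius must be in the same unit before dividing.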

Say we have the following town and museum data in pandas:

import pandas as pd
import numpy as np

from sklearn.neighbors import BallTree

towns = pd.DataFrame({
    "name" : ["Merry Hill", "Spring Valley", "Nesconset"],
    "lat" : [36.01, 41.32, 40.84],
    "long" : [-76.7, -89.20, -73.15]
})

museum = pd.DataFrame({
    "name" : ["Motte Historical Car Museum, Menifee", "Crocker Art Museum, Sacramento", "World Chess Hall Of Fame, St.Louis", "National Atomic Testing Museum, Las", "National Air and Space Museum, Washington", "The Metropolitan Museum of Art", "Museum of the American Military Family & Learning Center"],
    "lat" : [33.743511, 38.576942, 38.644302, 36.114269, 38.887806, 40.778965, 35.083359],
    "long" : [-117.165161, -121.504997, -90.261154, -115.148315, -77.019844, -73.962311, -106.381531]
})

First we need to extract the latitude and longitude columns as NumPy arrays:

places_gps = towns[["lat", "long"]].values
museum_gps = museum[["lat", "long"]].values

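Now we can convert the coordinates to radians and build the tree (the haversine metric expects [lat, long] order, in radians):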

places_radians = np.radians(places_gps)
museum_radians = np.radians(museum_gps)

tree = BallTree(museum_radians, leaf_size=15, metric='haversine')

Again, imagine this ball is only the size of a ping-pong ball. To use the distances on a larger or smaller sphere, we need to multiply or divide by that sphere's radius.

Say I want all museums within 100 miles:

distance_in_miles = 100
earth_radius_in_miles = 3958.8
    
radius = distance_in_miles / earth_radius_in_miles

Now I can apply query_radius(), remembering that the returned distances need to be converted back to miles. The distances here are great-circle distances on the unit sphere, our ping-pong ball.

is_within, distances = tree.query_radius(places_radians, r=radius, count_only=False, return_distance=True) 

So we multiply by the Earth's radius:

distances_in_miles = distances * earth_radius_in_miles

Let's check the output. distances_in_miles is:

array([array([], dtype=float64), array([], dtype=float64),
       array([42.68960475])], dtype=object)

This means that 'Nesconset' is less than 100 miles from 'The Metropolitan Museum of Art', and that the distance is about 42.689 miles. Note that a distance is returned only in the last array (Nesconset). With the help of is_within we find that the index of the matching museum is 5, so museum.name[5] is 'The Metropolitan Museum of Art'.
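To make the link between is_within and the museum names explicit, here is a self-contained sketch that repeats the query and pairs each town with the museums found. It uses plain lists instead of the DataFrames above, and the `matches` dictionary is a name introduced here for illustration only.

```python
import numpy as np
from sklearn.neighbors import BallTree

earth_radius_in_miles = 3958.8

town_names = ["Merry Hill", "Spring Valley", "Nesconset"]
places_gps = np.array([[36.01, -76.7], [41.32, -89.20], [40.84, -73.15]])

museum_names = [
    "Motte Historical Car Museum, Menifee",
    "Crocker Art Museum, Sacramento",
    "World Chess Hall Of Fame, St.Louis",
    "National Atomic Testing Museum, Las",
    "National Air and Space Museum, Washington",
    "The Metropolitan Museum of Art",
    "Museum of the American Military Family & Learning Center",
]
museum_gps = np.array([
    [33.743511, -117.165161], [38.576942, -121.504997],
    [38.644302, -90.261154], [36.114269, -115.148315],
    [38.887806, -77.019844], [40.778965, -73.962311],
    [35.083359, -106.381531],
])

tree = BallTree(np.radians(museum_gps), leaf_size=15, metric="haversine")

# One array of indices and one array of distances per town.
is_within, distances = tree.query_radius(
    np.radians(places_gps),
    r=100 / earth_radius_in_miles,
    return_distance=True,
    sort_results=True,
)

# Pair every town with (museum name, distance in miles) for each hit.
matches = {
    town: [(museum_names[i], float(d) * earth_radius_in_miles)
           for i, d in zip(idx, dist)]
    for town, idx, dist in zip(town_names, is_within, distances)
}

for town, found in matches.items():
    print(town, found)
```

Towns with no museum inside the radius simply get an empty list.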

Depending on how you check it, the result won't be exactly 42.689 miles, but a quick check with Google Maps confirms it is in that range. The Earth is not a perfect sphere, so there will always be some error.
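Another way to sanity-check the number, without sklearn, is to evaluate the haversine formula directly. The sketch below is a plain-Python version written for this purpose; the function name `haversine_miles` is hypothetical, and the Earth radius is the same 3958.8 miles used above.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # same value as in the answer


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # 2 * asin(sqrt(a)) is the distance on the unit sphere, in radians.
    return 2 * math.asin(math.sqrt(a)) * EARTH_RADIUS_MILES


# Nesconset -> The Metropolitan Museum of Art
check = haversine_miles(40.84, -73.15, 40.778965, -73.962311)
print(round(check, 2))  # about 42.69, in line with the BallTree result
```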

As in my original post, mistakes are easy to make: forgetting to apply the conversion factor, swapping the lat/long values, or mixing up km and meters.
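For the second part of the question, limiting the number of neighbors inside the radius, one possible approach (not part of the original answer) is to ask the tree for the k nearest neighbors with query() and then discard those beyond the radius. The sketch below uses a three-museum subset of the data above; `max_neighbors`, the 300-mile radius, and `limited` are illustrative choices made here.

```python
import numpy as np
from sklearn.neighbors import BallTree

earth_radius_in_miles = 3958.8

# [lat, long] in degrees; a subset of the museums and towns used above.
museum_gps = np.array([
    [38.644302, -90.261154],  # 0: World Chess Hall Of Fame, St.Louis
    [38.887806, -77.019844],  # 1: National Air and Space Museum, Washington
    [40.778965, -73.962311],  # 2: The Metropolitan Museum of Art
])
places_gps = np.array([
    [36.01, -76.70],  # Merry Hill
    [40.84, -73.15],  # Nesconset
])

tree = BallTree(np.radians(museum_gps), metric="haversine")

max_neighbors = 2                     # illustrative cap on neighbors per point
radius = 300 / earth_radius_in_miles  # 300 miles on the unit sphere

# query() returns the k nearest neighbors of each point, nearest first.
dist, ind = tree.query(np.radians(places_gps), k=max_neighbors)

# Keep only the neighbors that also fall inside the radius.
limited = [
    (i[d <= radius], d[d <= radius] * earth_radius_in_miles)
    for i, d in zip(ind, dist)
]

for indices, miles in limited:
    print(indices, miles)
```

An alternative is to call query_radius() with sort_results=True and slice each result to its first max_neighbors entries. That gives the same neighbors but first collects every point inside the radius.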
