How to measure pairwise distances between two sets of points?
Problem description
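I have two datasets (CSV files). Each contains the latitude and longitude of a set of points (220 and 4400 points respectively). I want to measure the pairwise distances in miles between the two sets (a 220 x 4400 matrix). How can I do that in Python? Similar to this problem: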
https://gist.github.com/rochacbruno/2883505
The best option is sklearn, which has exactly what you ask for.
Say we have some sample data:
import numpy as np
import pandas as pd
towns = pd.DataFrame({
"name" : ["Merry Hill", "Spring Valley", "Nesconset"],
"lat" : [36.01, 41.32, 40.84],
"long" : [-76.7, -89.20, -73.15]
})
museum = pd.DataFrame({
"name" : ["Motte Historical Car Museum, Menifee", "Crocker Art Museum, Sacramento", "World Chess Hall Of Fame, St.Louis", "National Atomic Testing Museum, Las", "National Air and Space Museum, Washington", "The Metropolitan Museum of Art", "Museum of the American Military Family & Learning Center"],
"lat" : [33.743511, 38.576942, 38.644302, 36.114269, 38.887806, 40.778965, 35.083359],
"long" : [-117.165161, -121.504997, -90.261154, -115.148315, -77.019844, -73.962311, -106.381531]
})
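You can use sklearn's distance metrics, which include an implementation of the haversine formula:
# scikit-learn >= 1.0; in older versions use: from sklearn.neighbors import DistanceMetric
from sklearn.metrics import DistanceMetric
dist = DistanceMetric.get_metric('haversine')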
After you extract the numpy array values (the haversine metric expects [latitude, longitude] pairs) with
places_gps = towns[["lat", "long"]].values
museum_gps = museum[["lat", "long"]].values
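you simply convert the coordinates to radians, compute the pairwise matrix, and scale by the Earth's radius:
EARTH_RADIUS = 6371.009  # mean Earth radius in km
haversine_distances = dist.pairwise(np.radians(places_gps), np.radians(museum_gps))
haversine_distances *= EARTH_RADIUS
to get the distances in km. If you need miles, multiply by 0.621371 (miles per kilometre).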
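As a cross-check on the sklearn result, the same haversine computation can be written directly in NumPy and returned in miles. This is only a minimal sketch: the function name `haversine_miles` and the constant names are made up for illustration, and it assumes both inputs are arrays of [latitude, longitude] rows in degrees.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.009   # same mean radius as used above
KM_TO_MILES = 0.621371       # miles per kilometre

def haversine_miles(a_deg, b_deg):
    """Pairwise great-circle distances in miles (hypothetical helper).

    a_deg has shape (n, 2), b_deg has shape (m, 2); rows are [lat, long] in degrees.
    Returns an (n, m) array.
    """
    a = np.radians(np.asarray(a_deg, dtype=float))
    b = np.radians(np.asarray(b_deg, dtype=float))
    # Column vectors for the first set, row vectors for the second,
    # so that broadcasting yields the full n x m grid.
    lat1, lon1 = a[:, 0][:, None], a[:, 1][:, None]
    lat2, lon2 = b[:, 0][None, :], b[:, 1][None, :]
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clip guards against tiny floating-point overshoot before the sqrt/arcsin.
    central_angle = 2 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))
    return central_angle * EARTH_RADIUS_KM * KM_TO_MILES

# A quarter of the equator, plus a point compared with itself.
d = haversine_miles([[0.0, 0.0]], [[0.0, 90.0], [0.0, 0.0]])
```

For 220 x 4400 points the broadcasted grid has under a million entries, so it fits comfortably in memory.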
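If you are only interested in the closest few points, or in all points within a radius, check out sklearn's BallTree, which also implements the haversine metric. It is typically much faster for such queries because it does not need to compute the full distance matrix.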
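A BallTree query along these lines might look as follows. It is a sketch rather than a tuned solution: the coordinates are copied from the sample data above, while `k=2` and the 500 km radius are arbitrary values chosen for illustration.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.009

# [lat, long] in degrees, copied from the sample towns and museums.
places_gps = np.array([[36.01, -76.7], [41.32, -89.20], [40.84, -73.15]])
museum_gps = np.array([
    [33.743511, -117.165161], [38.576942, -121.504997], [38.644302, -90.261154],
    [36.114269, -115.148315], [38.887806, -77.019844], [40.778965, -73.962311],
    [35.083359, -106.381531],
])

# Build the tree once on the larger set; the haversine metric works in radians.
tree = BallTree(np.radians(museum_gps), metric="haversine")

# The two nearest museums for each town: distances come back in radians,
# sorted from nearest to farthest, so scale by the Earth's radius.
dist_rad, nearest_idx = tree.query(np.radians(places_gps), k=2)
nearest_km = dist_rad * EARTH_RADIUS_KM

# All museums within an (arbitrary) 500 km of each town.
radius_km = 500.0
within_idx = tree.query_radius(np.radians(places_gps), r=radius_km / EARTH_RADIUS_KM)
```

`nearest_idx` and `within_idx` hold row positions in `museum_gps`, so they can be mapped back to names with `museum.name.iloc[...]`.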
Edit: To convert the output to a dataframe, use for instance
pd_distances = pd.DataFrame(haversine_distances, columns=museum.name, index=towns.name)
pd_distances
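Once the distances are in a labelled dataframe, picking the nearest museum per town or flattening the matrix into one row per pair (handy for writing back to CSV) takes a line each. The sketch below assumes a subset of three of the sample museums and uses illustrative axis names `town` and `museum`; the import fallback covers older scikit-learn versions.

```python
import numpy as np
import pandas as pd

try:
    from sklearn.metrics import DistanceMetric      # scikit-learn >= 1.0
except ImportError:
    from sklearn.neighbors import DistanceMetric    # older versions

EARTH_RADIUS_KM = 6371.009
KM_TO_MILES = 0.621371

towns = pd.DataFrame({
    "name": ["Merry Hill", "Spring Valley", "Nesconset"],
    "lat": [36.01, 41.32, 40.84],
    "long": [-76.7, -89.20, -73.15],
})
museum = pd.DataFrame({
    "name": ["World Chess Hall Of Fame, St.Louis",
             "National Air and Space Museum, Washington",
             "The Metropolitan Museum of Art"],
    "lat": [38.644302, 38.887806, 40.778965],
    "long": [-90.261154, -77.019844, -73.962311],
})

dist = DistanceMetric.get_metric("haversine")
miles = dist.pairwise(
    np.radians(towns[["lat", "long"]].values),
    np.radians(museum[["lat", "long"]].values),
) * EARTH_RADIUS_KM * KM_TO_MILES

# Give the two axes distinct names so they do not clash when reshaping.
pd_miles = pd.DataFrame(
    miles,
    index=pd.Index(towns["name"], name="town"),
    columns=pd.Index(museum["name"], name="museum"),
)

nearest = pd_miles.idxmin(axis=1)   # closest museum name for each town
# One row per (town, museum) pair.
long_form = pd_miles.reset_index().melt(
    id_vars="town", var_name="museum", value_name="miles"
)
```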