How to find the closest match based on 2 keys from one dataframe to another?
I have 2 dataframes I'm working with. One has a bunch of locations and their coordinates (longitude, latitude). The other is a weather data set with data from weather stations all over the world and their respective coordinates. I am trying to link each location in my data set to its nearest weather station. The weather station names and my location names do not match.
I am left trying to link them by closest match in coordinates and have no idea where to begin.
I was thinking of some use of:
np.abs((location['latitude']-weather['latitude'])+(location['longitude']-weather['longitude']))
Examples of each:
location:
Location  Latitude   Longitude   Component
A         39.463744  -76.119411  Active
B         39.029252  -76.964251  Active
C         33.626946  -85.969576  Active
D         49.286337   10.567013  Active
E         37.071777  -76.360785  Active
weather:
Station Code  Station Name           Latitude  Longitude
US1FLSL0019   PORT ST. LUCIE 4.0 NE  27.3237   -80.3111
US1TXTV0133   LAKEWAY 2.8 W          30.3597   -98.0252
USC00178998   WALTHAM                44.6917   -68.3475
USC00178998   WALTHAM                44.6917   -68.3475
USC00178998   WALTHAM                44.6917   -68.3475
The output would be a new column in the location dataframe containing the name of the closest station.
However, I am not sure how to loop through both to accomplish this. Any help would be greatly appreciated.
Thanks, Scott
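Let's say you have a distance function dist that you want to minimize. Here the absolute value is taken of each coordinate difference separately; in the expression from the question, a positive latitude difference and a negative longitude difference can cancel each other out:
import numpy as np

def dist(lat1, long1, lat2, long2):
    return np.abs(lat1 - lat2) + np.abs(long1 - long2)  # Manhattan distance in degrees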
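To see why the absolute value has to be applied to each coordinate difference separately, compare the expression from the question with a per-coordinate version on a pair of made-up points, one lying 5 degrees north and 5 degrees west of the other. `dist_question` is only a hypothetical name for the question's formula:

```python
import numpy as np


def dist_question(lat1, long1, lat2, long2):
    # Expression from the question: the differences are summed *before* abs(),
    # so +5 degrees of latitude and -5 degrees of longitude cancel to 0.
    return np.abs((lat1 - lat2) + (long1 - long2))


def dist(lat1, long1, lat2, long2):
    # Manhattan distance in degrees: abs() is applied to each difference.
    return np.abs(lat1 - lat2) + np.abs(long1 - long2)


station = (40.0, -75.0)  # made-up coordinates
point = (45.0, -80.0)    # 5 degrees north and 5 degrees west of the station

print(dist_question(*point, *station))  # 0.0, a false "exact match"
print(dist(*point, *station))           # 10.0
```

Any point along that diagonal would look like a perfect match under the first formula, which is why the per-coordinate form is used for `dist`.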
For a given location, you can find the nearest station as follows:
lat = 39.463744
long = -76.119411
weather.apply(
    lambda row: dist(lat, long, row['Latitude'], row['Longitude']),
    axis=1)
This calculates the distance to every weather station. Using idxmin, you can then look up the name of the closest station:
distances = weather.apply(
    lambda row: dist(lat, long, row['Latitude'], row['Longitude']),
    axis=1)
weather.loc[distances.idxmin(), 'StationName']
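Because a per-coordinate `dist` uses only NumPy arithmetic, it also accepts whole columns, so the distances to every station can be computed without a row-wise `apply`. `idxmin` then returns the index *label* of the smallest value; when several rows tie, as the three identical WALTHAM rows in the question do, it returns the first label. A sketch using the question's station rows; the `StationName` column name follows the answer's code and is an assumption about how the frame is named:

```python
import numpy as np
import pandas as pd


def dist(lat1, long1, lat2, long2):
    # Manhattan distance in degrees (abs() taken per coordinate)
    return np.abs(lat1 - lat2) + np.abs(long1 - long2)


# Station rows from the question, including the three identical WALTHAM rows
weather = pd.DataFrame({
    'StationName': ['PORT ST. LUCIE 4.0 NE', 'LAKEWAY 2.8 W',
                    'WALTHAM', 'WALTHAM', 'WALTHAM'],
    'Latitude': [27.3237, 30.3597, 44.6917, 44.6917, 44.6917],
    'Longitude': [-80.3111, -98.0252, -68.3475, -68.3475, -68.3475],
})

lat = 39.463744
long = -76.119411

# Scalar-minus-column arithmetic broadcasts over every station at once and
# returns a Series indexed like `weather`, the same result as apply(axis=1)
distances = dist(lat, long, weather['Latitude'], weather['Longitude'])

# idxmin gives the label of the first minimum (rows 2-4 tie), which is what
# .loc expects; argmin would give an integer position for use with .iloc
best = distances.idxmin()
print(best, weather.loc[best, 'StationName'])
```

With a default `RangeIndex`, labels and positions coincide, which can hide the difference; after filtering or re-indexing `weather` they no longer do, so `idxmin` pairs with `.loc` and `argmin` with `.iloc`.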
Let's put all this in a function:
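def find_station(lat, long):
    # Distance from (lat, long) to every station
    distances = weather.apply(
        lambda row: dist(lat, long, row['Latitude'], row['Longitude']),
        axis=1)
    # Name of the closest station ('StationName' here; the question's
    # frame calls this column 'Station Name')
    return weather.loc[distances.idxmin(), 'StationName']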
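Applying `find_station` row by row evaluates the distance once per location-station pair in Python. A vectorized sketch of the same search, assuming a Manhattan distance in degrees and the column names `Latitude`, `Longitude` and `StationName`, builds the whole distance matrix with NumPy broadcasting and takes the row-wise `argmin`. `find_stations` and the `Nearest Station` column are hypothetical names; the result is stored as a new column, which is what the question asked for:

```python
import numpy as np
import pandas as pd


def find_stations(locations, weather):
    """Closest station name for every row of `locations` (hypothetical helper)."""
    # Shape (n_locations, 1) on one side, (1, n_stations) on the other
    loc_lat = locations['Latitude'].to_numpy()[:, None]
    loc_long = locations['Longitude'].to_numpy()[:, None]
    st_lat = weather['Latitude'].to_numpy()[None, :]
    st_long = weather['Longitude'].to_numpy()[None, :]
    # Broadcasting yields an (n_locations, n_stations) matrix of distances
    d = np.abs(loc_lat - st_lat) + np.abs(loc_long - st_long)
    # Column position of the smallest distance in each row (first one on ties)
    nearest = d.argmin(axis=1)
    names = weather['StationName'].to_numpy()[nearest]
    return pd.Series(names, index=locations.index)


# Sample rows from the question (the 'Component' column is omitted)
locations = pd.DataFrame({
    'Location': ['A', 'B', 'C', 'D', 'E'],
    'Latitude': [39.463744, 39.029252, 33.626946, 49.286337, 37.071777],
    'Longitude': [-76.119411, -76.964251, -85.969576, 10.567013, -76.360785],
})
weather = pd.DataFrame({
    'StationName': ['PORT ST. LUCIE 4.0 NE', 'LAKEWAY 2.8 W',
                    'WALTHAM', 'WALTHAM', 'WALTHAM'],
    'Latitude': [27.3237, 30.3597, 44.6917, 44.6917, 44.6917],
    'Longitude': [-80.3111, -98.0252, -68.3475, -68.3475, -68.3475],
})

locations['Nearest Station'] = find_stations(locations, weather)
print(locations)
```

The intermediate matrix has one entry per location-station pair, so memory grows with the product of the two table sizes; for very large tables a spatial index such as a k-d tree is the usual next step.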
You can now get the nearest station for every location by applying it to the locations dataframe:
locations.apply(
    lambda row: find_station(row['Latitude'], row['Longitude']),
    axis=1)
Output:
0 WALTHAM
1 WALTHAM
2 PORTST.LUCIE
3 WALTHAM
4 PORTST.LUCIE
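Degree differences are only a rough proxy for real distance: a degree of longitude spans about 111 km at the equator but only about 79 km at 45° latitude. If geographic nearness matters, the great-circle (haversine) distance is a common replacement for `dist`. This is a different metric from the one used above, so results can change. A sketch assuming a spherical Earth with a mean radius of 6371 km; `haversine_km` and the `Nearest Station` column are hypothetical names:

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the Earth is treated as a sphere


def haversine_km(lat1, long1, lat2, long2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, long1, lat2, long2 = map(np.radians, (lat1, long1, lat2, long2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((long2 - long1) / 2) ** 2)
    # Guard against rounding pushing the arcsin argument slightly above 1
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.minimum(1.0, np.sqrt(a)))


# Sample rows from the question (the 'Component' column is omitted)
locations = pd.DataFrame({
    'Location': ['A', 'B', 'C', 'D', 'E'],
    'Latitude': [39.463744, 39.029252, 33.626946, 49.286337, 37.071777],
    'Longitude': [-76.119411, -76.964251, -85.969576, 10.567013, -76.360785],
})
weather = pd.DataFrame({
    'StationName': ['PORT ST. LUCIE 4.0 NE', 'LAKEWAY 2.8 W',
                    'WALTHAM', 'WALTHAM', 'WALTHAM'],
    'Latitude': [27.3237, 30.3597, 44.6917, 44.6917, 44.6917],
    'Longitude': [-80.3111, -98.0252, -68.3475, -68.3475, -68.3475],
})


def find_station(lat, long):
    # Works on whole columns, so no inner apply is needed
    distances = haversine_km(lat, long, weather['Latitude'], weather['Longitude'])
    return weather.loc[distances.idxmin(), 'StationName']


locations['Nearest Station'] = locations.apply(
    lambda row: find_station(row['Latitude'], row['Longitude']), axis=1)
print(locations)
```

With the question's sample rows this changes the match for location E: by this formula it is roughly 1,080 km from WALTHAM and roughly 1,150 km from PORT ST. LUCIE 4.0 NE, whereas the Manhattan distance in degrees ranks the two stations the other way round.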