使用geopy pandas具有坐标的新列 [英] new column with coordinates using geopy pandas
问题描述
我有一个df:
import pandas as pd
import numpy as np
import datetime as DT
import hmac
from geopy.geocoders import Nominatim
from geopy.distance import vincenty
df
city_name state_name county_name
0 WASHINGTON DC DIST OF COLUMBIA
1 WASHINGTON DC DIST OF COLUMBIA
2 WASHINGTON DC DIST OF COLUMBIA
3 WASHINGTON DC DIST OF COLUMBIA
4 WASHINGTON DC DIST OF COLUMBIA
5 WASHINGTON DC DIST OF COLUMBIA
6 WASHINGTON DC DIST OF COLUMBIA
7 WASHINGTON DC DIST OF COLUMBIA
8 WASHINGTON DC DIST OF COLUMBIA
9 WASHINGTON DC DIST OF COLUMBIA
我想获取下面数据框中任何一列的纬度和经度坐标.该文档( http://geopy.readthedocs.org/en/latest/#data )非常简单明了处理各个位置的文档时.
>>> from geopy.geocoders import Nominatim
>>> geolocator = Nominatim()
>>> location = geolocator.geocode("175 5th Avenue NYC")
>>> print(location.address)
Flatiron Building, 175, 5th Avenue, Flatiron, New York, NYC, New York, ...
>>> print((location.latitude, location.longitude))
(40.7410861, -73.9896297241625)
>>> print(location.raw)
{'place_id': '9167009604', 'type': 'attraction', ...}
但是我想将该函数应用于df中的每一行并创建一个新列.我尝试了以下
df['city_coord'] = geolocator.geocode(lambda row: 'state_name' (row))
但是我认为我的代码中丢失了一些东西,因为我得到了以下内容:
city_name state_name county_name coordinates
0 WASHINGTON DC DIST OF COLUMBIA None
1 WASHINGTON DC DIST OF COLUMBIA None
2 WASHINGTON DC DIST OF COLUMBIA None
3 WASHINGTON DC DIST OF COLUMBIA None
4 WASHINGTON DC DIST OF COLUMBIA None
5 WASHINGTON DC DIST OF COLUMBIA None
6 WASHINGTON DC DIST OF COLUMBIA None
7 WASHINGTON DC DIST OF COLUMBIA None
8 WASHINGTON DC DIST OF COLUMBIA None
9 WASHINGTON DC DIST OF COLUMBIA None
我希望可以使用Lambda函数来实现以下目的:
city_name state_name county_name city_coord
0 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
1 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
2 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
3 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
4 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
5 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
6 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
7 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
8 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
9 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
10 GLYNCO GA GLYNN 31.2224512, -81.5101023
感谢您的帮助.得到坐标后,我想将它们映射.任何建议的用于映射坐标的资源也将不胜感激.谢谢
您可以调用apply
并在每一行上传递要执行的函数,如下所示:
In [9]:
geolocator = Nominatim()
df['city_coord'] = df['state_name'].apply(geolocator.geocode)
df
Out[9]:
city_name state_name county_name \
0 WASHINGTON DC DIST OF COLUMBIA
1 WASHINGTON DC DIST OF COLUMBIA
city_coord
0 (District of Columbia, United States of Americ...
1 (District of Columbia, United States of Americ...
然后您可以访问纬度和经度属性:
In [16]:
df['city_coord'] = df['city_coord'].apply(lambda x: (x.latitude, x.longitude))
df
Out[16]:
city_name state_name county_name city_coord
0 WASHINGTON DC DIST OF COLUMBIA (38.8937154, -76.9877934586326)
1 WASHINGTON DC DIST OF COLUMBIA (38.8937154, -76.9877934586326)
或通过两次调用apply
在一个衬套中完成此操作:
In [17]:
df['city_coord'] = df['state_name'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
df
Out[17]:
city_name state_name county_name city_coord
0 WASHINGTON DC DIST OF COLUMBIA (38.8937154, -76.9877934586326)
1 WASHINGTON DC DIST OF COLUMBIA (38.8937154, -76.9877934586326)
您的尝试geolocator.geocode(lambda row: 'state_name' (row))
也没有执行任何操作,因此为什么您的列中充满了None
值
编辑
@leb在这里提出了一个有趣的观点,如果您有很多重复的值,则对每个唯一值进行地理编码然后再添加以下内容会更有成效:
In [38]:
states = df['state_name'].unique()
d = dict(zip(states, pd.Series(states).apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))))
d
Out[38]:
{'DC': (38.8937154, -76.9877934586326)}
In [40]:
df['city_coord'] = df['state_name'].map(d)
df
Out[40]:
city_name state_name county_name city_coord
0 WASHINGTON DC DIST OF COLUMBIA (38.8937154, -76.9877934586326)
1 WASHINGTON DC DIST OF COLUMBIA (38.8937154, -76.9877934586326)
因此,以上代码使用unique
获取了所有唯一值,并从中构造了一个字典,然后调用map
进行查找并添加了坐标,这比尝试按行对地址进行地理编码更为有效>
I have a df:
import pandas as pd
import numpy as np
import datetime as DT
import hmac
from geopy.geocoders import Nominatim
from geopy.distance import vincenty
df
city_name state_name county_name
0 WASHINGTON DC DIST OF COLUMBIA
1 WASHINGTON DC DIST OF COLUMBIA
2 WASHINGTON DC DIST OF COLUMBIA
3 WASHINGTON DC DIST OF COLUMBIA
4 WASHINGTON DC DIST OF COLUMBIA
5 WASHINGTON DC DIST OF COLUMBIA
6 WASHINGTON DC DIST OF COLUMBIA
7 WASHINGTON DC DIST OF COLUMBIA
8 WASHINGTON DC DIST OF COLUMBIA
9 WASHINGTON DC DIST OF COLUMBIA
I want to get the latitude and longitude coordinates for any one of the columns in the data frame below. The documentation (http://geopy.readthedocs.org/en/latest/#data) is pretty straightforward when working with the documentation for individual locations.
>>> from geopy.geocoders import Nominatim
>>> geolocator = Nominatim()
>>> location = geolocator.geocode("175 5th Avenue NYC")
>>> print(location.address)
Flatiron Building, 175, 5th Avenue, Flatiron, New York, NYC, New York, ...
>>> print((location.latitude, location.longitude))
(40.7410861, -73.9896297241625)
>>> print(location.raw)
{'place_id': '9167009604', 'type': 'attraction', ...}
However I want to apply the function to each row in the df and make a new column. I've tried the following
df['city_coord'] = geolocator.geocode(lambda row: 'state_name' (row))
but I think I'm missing something in my code because I get the following:
city_name state_name county_name coordinates
0 WASHINGTON DC DIST OF COLUMBIA None
1 WASHINGTON DC DIST OF COLUMBIA None
2 WASHINGTON DC DIST OF COLUMBIA None
3 WASHINGTON DC DIST OF COLUMBIA None
4 WASHINGTON DC DIST OF COLUMBIA None
5 WASHINGTON DC DIST OF COLUMBIA None
6 WASHINGTON DC DIST OF COLUMBIA None
7 WASHINGTON DC DIST OF COLUMBIA None
8 WASHINGTON DC DIST OF COLUMBIA None
9 WASHINGTON DC DIST OF COLUMBIA None
I would like something like this hopefully using the Lambda function:
city_name state_name county_name city_coord
0 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
1 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
2 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
3 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
4 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
5 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
6 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
7 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
8 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
9 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
10 GLYNCO GA GLYNN 31.2224512, -81.5101023
I appreciate any help. After I get the coordinates I'd like to map them. Any recommended resources for mapping coordinates is greatly appreciated too. thanks
You can call apply
and pass the function you want to execute on every row like the following:
In [9]:
geolocator = Nominatim()
df['city_coord'] = df['state_name'].apply(geolocator.geocode)
df
Out[9]:
city_name state_name county_name \
0 WASHINGTON DC DIST OF COLUMBIA
1 WASHINGTON DC DIST OF COLUMBIA
city_coord
0 (District of Columbia, United States of Americ...
1 (District of Columbia, United States of Americ...
You can then access the latitude and longitude attributes:
In [16]:
df['city_coord'] = df['city_coord'].apply(lambda x: (x.latitude, x.longitude))
df
Out[16]:
city_name state_name county_name city_coord
0 WASHINGTON DC DIST OF COLUMBIA (38.8937154, -76.9877934586326)
1 WASHINGTON DC DIST OF COLUMBIA (38.8937154, -76.9877934586326)
Or do it in a one liner by calling apply
twice:
In [17]:
df['city_coord'] = df['state_name'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
df
Out[17]:
city_name state_name county_name city_coord
0 WASHINGTON DC DIST OF COLUMBIA (38.8937154, -76.9877934586326)
1 WASHINGTON DC DIST OF COLUMBIA (38.8937154, -76.9877934586326)
Also your attempt geolocator.geocode(lambda row: 'state_name' (row))
did nothing hence why you have a column full of None
values
EDIT
@leb makes an interesting point here, if you have many duplicate values then it'll be more performant to geocode for each unique value and then add this:
In [38]:
states = df['state_name'].unique()
d = dict(zip(states, pd.Series(states).apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))))
d
Out[38]:
{'DC': (38.8937154, -76.9877934586326)}
In [40]:
df['city_coord'] = df['state_name'].map(d)
df
Out[40]:
city_name state_name county_name city_coord
0 WASHINGTON DC DIST OF COLUMBIA (38.8937154, -76.9877934586326)
1 WASHINGTON DC DIST OF COLUMBIA (38.8937154, -76.9877934586326)
So the above gets all the unique values using unique
, constructs a dict from them and then calls map
to perform the lookup and add the coords, this will be more efficient than trying to geocode row-wise
这篇关于使用geopy pandas具有坐标的新列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!