Python Pandas 'apply' returns series; can't convert to dataframe
Question
OK, I'm at half-wit's end. I'm geocoding a dataframe with geopy. I've written a simple function that takes an input (a country name) and returns the latitude and longitude. I use apply to run the function and it returns a Pandas Series object. I can't seem to convert it to a dataframe. I'm sure I'm missing something obvious, but I'm new to Python and still RTFMing. BTW, the geocoder function works great.
# Import libraries
import os
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim
def locate(x):
    geolocator = Nominatim()
    # print(x) # debug
    try:
        # Get geocode
        location = geolocator.geocode(x, timeout=8, exactly_one=True)
        lat = location.latitude
        lon = location.longitude
    except:
        # didn't work for some reason that I really don't care about
        lat = np.nan
        lon = np.nan
    # print(lat,lon) # debug
    return lat, lon # Note: also tried return { 'LAT': lat, 'LON': lon }
df_geo_in = df_addr.drop_duplicates(['COUNTRY']).reset_index() #works perfectly
df_geo_in['LAT'], df_geo_in['LON'] = df_geo_in.applymap(locate)
# error: returns more than 2 values - default index + column with results
I've also tried:
df_geo_in['LAT','LON'] = df_geo_in.applymap(locate)
I get a single dataframe with no index and a single column with the series in it.
I've tried a number of other methods, including 'applymap':
source_cols = ['LAT','LON']
new_cols = [str(x) for x in source_cols]
df_geo_in = df_addr.drop_duplicates(['COUNTRY']).set_index(['COUNTRY'])
df_geo_in[new_cols] = df_geo_in.applymap(locate)
which returned an error after a long time:
ValueError: Columns must be same length as key
I've also tried manually converting the series to a dataframe using the df.from_dict(df_geo_in) method, without success.
The goal is to geocode 166 unique countries, then join the result back to the 188K addresses in df_addr. I'm trying to be pandas-y in my code and not write loops if possible. But I haven't found the magic to convert a series into a dataframe, and this is the first time I've tried to use apply.
Thanks in advance - ancient C programmer
Recommended answer
I'm assuming that df_geo_in is a df with a single column, so I believe the following should work:
Change:
return lat, lon
to
return pd.Series([lat, lon])
then you should be able to assign like so:
df_geo_in[['LAT', 'LON']] = df_geo_in['COUNTRY'].apply(locate)
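As a minimal sketch of why returning a pd.Series helps: when the function passed to Series.apply returns a Series, pandas expands the results into a DataFrame with one column per element. The fake_locate function and its small coordinate table are hypothetical stand-ins for the geopy lookup (with rough, illustrative coordinates), used only so the example runs offline.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the geocoder: rough coordinates, illustration only.
_COORDS = {"France": (46.0, 2.0), "Japan": (36.0, 138.0)}


def fake_locate(country):
    """Return latitude and longitude as a Series; NaN for unknown names."""
    lat, lon = _COORDS.get(country, (np.nan, np.nan))
    return pd.Series([lat, lon])


countries = pd.DataFrame({"COUNTRY": ["France", "Japan", "Atlantis"]})

# Series.apply with a Series-returning function yields a DataFrame
# whose columns are labelled 0 and 1.
coords = countries["COUNTRY"].apply(fake_locate)
coords.columns = ["LAT", "LON"]

# Both frames share the same index, so join attaches the new columns row by row.
result = countries.join(coords)
print(result)
```

Naming the columns explicitly and joining on the index is one way to attach the result; assigning to a list of two new column names, as in the answer, relies on the same expansion.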
What you tried to do was assign the result of applymap to 2 new columns, which is incorrect here: applymap is designed to work on every element in a df, so unless the left-hand side has the same shape as the result, this won't give the desired result.
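The shape issue can be seen in a small sketch. The to_pair function and the two-column frame are hypothetical; the point is only that an element-wise call returns one value per cell, so the result has the shape of the input rather than two new columns. Note that DataFrame.applymap was renamed to DataFrame.map in pandas 2.1, so the sketch picks whichever method is available.

```python
import pandas as pd


def to_pair(value):
    """Hypothetical element-wise function returning a 2-tuple, like locate does."""
    text = str(value)
    return (len(text), text.upper())


df = pd.DataFrame({"COUNTRY": ["France", "Japan"], "CITY": ["Paris", "Tokyo"]})

# Use DataFrame.map where it exists (pandas >= 2.1), otherwise applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
out = elementwise(to_pair)

# Same shape as the input: every cell now holds a tuple.
print(out.shape)
print(out)

# Iterating (or tuple-unpacking) a DataFrame yields its column labels,
# not the latitude and longitude values.
print(list(out))
```

This is also why unpacking the result into two names fails or misbehaves: the unpacking walks over the column labels of the whole frame.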
Your latter method is also incorrect because you drop the duplicate countries and then expect this to assign every country's geolocation back, but the shape is different.
For large dfs it is probably quicker to create a de-duplicated geolocation df and then merge it back into your larger df, like so:
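# One row per country; .copy() avoids assigning into a slice of df_addr
geo_lookup = df_addr[['COUNTRY']].drop_duplicates().copy()
geo_lookup[['LAT', 'LON']] = geo_lookup['COUNTRY'].apply(locate)
# Left merge keeps every address row and attaches LAT/LON by country
df_addr = df_addr.merge(geo_lookup, on='COUNTRY', how='left')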
This creates a df of non-duplicated countries with their geolocation coordinates, and then we perform a left merge back to the master df.
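A minimal end-to-end sketch of the lookup-then-merge idea, under stated assumptions: fake_locate is a hypothetical offline replacement for the geopy call (with rough, illustrative coordinates), and the four-row df_addr stands in for the real 188K-row table.

```python
import numpy as np
import pandas as pd

# Hypothetical offline stand-in for the geocoder; coordinates are rough illustrations.
_COORDS = {"France": (46.0, 2.0), "Japan": (36.0, 138.0)}


def fake_locate(country):
    """Return a labelled Series so apply produces LAT and LON columns directly."""
    lat, lon = _COORDS.get(country, (np.nan, np.nan))
    return pd.Series({"LAT": lat, "LON": lon})


# Small stand-in for the full address table.
df_addr = pd.DataFrame({
    "ADDRESS": ["1 Rue A", "2 Rue B", "3 Chome C", "4 Nowhere"],
    "COUNTRY": ["France", "France", "Japan", "Atlantis"],
})

# Geocode each distinct country once.
geo_lookup = df_addr[["COUNTRY"]].drop_duplicates().reset_index(drop=True)
geo_lookup = geo_lookup.join(geo_lookup["COUNTRY"].apply(fake_locate))

# Left merge keeps every address row and copies LAT/LON from the lookup.
merged = df_addr.merge(geo_lookup, on="COUNTRY", how="left")
print(merged)
```

With this layout the geocoder is called once per distinct country rather than once per address, and countries that could not be located simply carry NaN into the merged rows.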