Python Pandas 'apply' returns series; can't convert to dataframe

Question

OK, I'm at half-wit's end. I'm geocoding a dataframe with geopy. I've written a simple function to take an input - country name - and return the latitude and longitude. I use apply to run the function and it returns a Pandas series object. I can't seem to convert it to a dataframe. I'm sure I'm missing something obvious, but I'm new to python and still RTFMing. BTW, the geocoder function works great.

# Import libraries 
import os 
import pandas as pd 
import numpy as np
from geopy.geocoders import Nominatim

def locate(x):
    geolocator = Nominatim()
    # print(x) # debug
    try:
        #Get geocode
        location = geolocator.geocode(x, timeout=8, exactly_one=True)
        lat = location.latitude
        lon = location.longitude
    except:
        #didn't work for some reason that I really don't care about
        lat = np.nan
        lon = np.nan
    # print(lat, lon)  # debug
    return lat,  lon # Note: also tried return { 'LAT': lat, 'LON': lon }

df_geo_in = df_addr.drop_duplicates(['COUNTRY']).reset_index()    #works perfectly
df_geo_in['LAT'], df_geo_in['LON']  = df_geo_in.applymap(locate) 
# error: returns more than 2 values - default index + column with results
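A minimal sketch of why the tuple unpacking on the left-hand side misbehaves, plus one alternative that keeps `locate()` returning a plain tuple. Unpacking a DataFrame iterates over its column labels, not its rows or cell values, which is why the error counts the index column along with the results. The geocoder is replaced by a hypothetical offline lookup table (`FAKE_COORDS`, illustrative coordinates only) so the example runs without network access, and the `zip(*...)` split is a swapped-in idiom, not something from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical offline stand-in for the geopy lookup (illustrative coordinates).
FAKE_COORDS = {"France": (46.6, 2.2), "Japan": (36.6, 138.3)}

def locate(country):
    # Same contract as the question's locate(): a plain (lat, lon) tuple.
    return FAKE_COORDS.get(country, (np.nan, np.nan))

df_geo_in = pd.DataFrame({"COUNTRY": ["France", "Japan", "Atlantis"]}).reset_index()

# Iterating (and therefore unpacking) a DataFrame yields its column labels.
print(list(df_geo_in))  # ['index', 'COUNTRY']

# Map over the COUNTRY column only, then split the (lat, lon) pairs in two.
pairs = df_geo_in["COUNTRY"].map(locate)
df_geo_in["LAT"], df_geo_in["LON"] = zip(*pairs)
print(df_geo_in)
```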

I also tried:

df_geo_in['LAT','LON'] = df_geo_in.applymap(locate)

I get a single dataframe with no index and a single column with the series in it.

I've tried a number of other methods, including 'applymap':

source_cols = ['LAT','LON'] 
new_cols = [str(x) for x in source_cols]

df_geo_in = df_addr.drop_duplicates(['COUNTRY']).set_index(['COUNTRY']) 
df_geo_in[new_cols] = df_geo_in.applymap(locate)

which returned an error after a long time:

ValueError: Columns must be same length as key

I've also tried manually converting the series to a dataframe using the df.from_dict(df_geo_in) method without success.

The goal is to geocode 166 unique countries, then join it back to the 188K addresses in df_addr. I'm trying to be pandas-y in my code and not write loops if possible. But I haven't found the magic to convert series into dataframes and this is the first time I've tried to use apply.

Thanks in advance - ancient C programmer

Answer

I'm assuming that df_geo_in is a df with a single column, so I believe the following should work:

Change:

return lat, lon

to:

return pd.Series([lat, lon])

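Then you should be able to assign both new columns like so, applying locate to the COUNTRY column so it receives one country name per call:

df_geo_in[['LAT', 'LON']] = df_geo_in['COUNTRY'].apply(locate)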
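A small offline sketch of the difference the change makes: with a tuple return, `Series.apply` produces a Series whose values are tuples; with a `pd.Series` return, it expands into a DataFrame with one column per element, which can then be assigned to two columns. `FAKE_COORDS` and `locate_tuple` are hypothetical stand-ins for the geopy-based function, and the coordinates are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical offline stand-in for the geopy lookup (illustrative coordinates).
FAKE_COORDS = {"France": (46.6, 2.2), "Japan": (36.6, 138.3)}

def locate_tuple(country):
    # Original style: return a plain (lat, lon) tuple.
    return FAKE_COORDS.get(country, (np.nan, np.nan))

def locate(country):
    # Suggested fix: wrap the pair in a Series so apply() expands it.
    lat, lon = FAKE_COORDS.get(country, (np.nan, np.nan))
    return pd.Series([lat, lon])

df_geo_in = pd.DataFrame({"COUNTRY": ["France", "Japan", "Atlantis"]})

as_tuples = df_geo_in["COUNTRY"].apply(locate_tuple)  # Series of tuples
as_frame = df_geo_in["COUNTRY"].apply(locate)         # DataFrame with columns 0 and 1

# The two result columns are assigned to LAT and LON by position.
df_geo_in[["LAT", "LON"]] = as_frame
print(df_geo_in)
```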

What you tried to do was assign the result of applymap to two new columns, which is incorrect here: applymap calls the function on every element of the df and returns a df of the same shape, so unless the left-hand side has that same shape, this won't give the desired result.
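A minimal sketch of the shape problem, assuming a one-column frame and a hypothetical offline lookup (`FAKE_COORDS`, illustrative coordinates). The element-wise call returns a frame with the same shape as its input, each cell holding a tuple, so one result column cannot fill two target columns and pandas raises a `ValueError`. `DataFrame.applymap` was renamed `DataFrame.map` in pandas 2.1, so the sketch uses whichever one is available.

```python
import numpy as np
import pandas as pd

# Hypothetical offline stand-in for the geopy lookup (illustrative coordinates).
FAKE_COORDS = {"France": (46.6, 2.2), "Japan": (36.6, 138.3)}

def locate(country):
    # Question-style locate(): returns a plain (lat, lon) tuple.
    return FAKE_COORDS.get(country, (np.nan, np.nan))

df = pd.DataFrame({"COUNTRY": ["France", "Japan", "Atlantis"]})

# Element-wise call: DataFrame.map on pandas >= 2.1, applymap on older versions.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
result = elementwise(locate)
print(result.shape)  # (3, 1): same shape as df, each cell a tuple

assign_error = None
try:
    # One result column cannot be spread over two target columns.
    df[["LAT", "LON"]] = result
except ValueError as exc:
    assign_error = exc
    print("ValueError:", exc)
```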

Your latter method is also incorrect, because you drop the duplicate countries and then expect the result to assign a geolocation back to every row, but the shapes are different.

For large dfs it is probably quicker to geocode only the de-duplicated countries and then merge the result back into your larger df, like so:

geo_lookup = df_addr[['COUNTRY']].drop_duplicates().copy()        # one row per country
geo_lookup[['LAT', 'LON']] = geo_lookup['COUNTRY'].apply(locate)   # locate returns pd.Series([lat, lon])
df_geo = df_addr.merge(geo_lookup, on='COUNTRY', how='left')      # attach LAT/LON to every address

This creates a df of the unique countries with their geolocations and then performs a left merge back onto the master df.
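An offline sketch of the geocode-once-then-merge pattern, using a tiny hypothetical `df_addr` and a hypothetical lookup table (`FAKE_COORDS`, illustrative coordinates) in place of geopy. The `calls` list is only there to show that each distinct country is looked up once, while the left merge keeps every address row (including the one whose country is not found, which gets NaN).

```python
import numpy as np
import pandas as pd

# Hypothetical offline stand-in for the geopy lookup (illustrative coordinates).
FAKE_COORDS = {"France": (46.6, 2.2), "Japan": (36.6, 138.3)}
calls = []  # records every lookup, to show each country is geocoded once

def locate(country):
    calls.append(country)
    lat, lon = FAKE_COORDS.get(country, (np.nan, np.nan))
    return pd.Series([lat, lon])

# Hypothetical sample of the address table.
df_addr = pd.DataFrame({
    "ADDRESS": ["1 Rue A", "2 Rue B", "3 Chome C", "4 Nowhere"],
    "COUNTRY": ["France", "France", "Japan", "Atlantis"],
})

# Keep only COUNTRY so the merge adds nothing but LAT and LON.
geo_lookup = df_addr[["COUNTRY"]].drop_duplicates().copy()
geo_lookup[["LAT", "LON"]] = geo_lookup["COUNTRY"].apply(locate)

# Left merge: every address row survives, matched to its country's coordinates.
df_geo = df_addr.merge(geo_lookup, on="COUNTRY", how="left")
print(df_geo)
```

Geocoding each distinct country once (166 lookups in the question) rather than once per address (188K) is where the speed-up comes from; the merge itself is a cheap key join.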
