Python Pandas 'apply' returns series; can't convert to dataframe


Question

OK, I'm at half-wit's end. I'm geocoding a dataframe with geopy. I've written a simple function that takes an input (a country name) and returns the latitude and longitude. I use apply to run the function, and it returns a Pandas Series object that I can't seem to convert to a dataframe. I'm sure I'm missing something obvious, but I'm new to Python and still RTFMing. BTW, the geocoder function works great.

# Import libraries 
import os 
import pandas as pd 
import numpy as np
from geopy.geocoders import Nominatim

def locate(x):
    geolocator = Nominatim()
    # print(x) # debug
    try:
        #Get geocode
        location = geolocator.geocode(x, timeout=8, exactly_one=True)
        lat = location.latitude
        lon = location.longitude
    except:
        #didn't work for some reason that I really don't care about
        lat = np.nan
        lon = np.nan
    # print(lat, lon) # debug
    return lat,  lon # Note: also tried return { 'LAT': lat, 'LON': lon }

df_geo_in = df_addr.drop_duplicates(['COUNTRY']).reset_index()    #works perfectly
df_geo_in['LAT'], df_geo_in['LON']  = df_geo_in.applymap(locate) 
# error: returns more than 2 values - default index + column with results

I also tried:

df_geo_in['LAT','LON'] = df_geo_in.applymap(locate)

I get a single dataframe with no index and a single column with the series in it.

I've tried a number of other methods, including 'applymap':

source_cols = ['LAT','LON'] 
new_cols = [str(x) for x in source_cols]

df_geo_in = df_addr.drop_duplicates(['COUNTRY']).set_index(['COUNTRY']) 
df_geo_in[new_cols] = df_geo_in.applymap(locate)

After a long time, this returns an error:

ValueError: Columns must be same length as key

I've also tried manually converting the series to a dataframe using the df.from_dict(df_geo_in) method, without success.

The goal is to geocode 166 unique countries, then join the result back to the 188K addresses in df_addr. I'm trying to be pandas-y in my code and not write loops if possible. But I haven't found the magic to convert a series into a dataframe, and this is the first time I've tried to use apply.

Thanks in advance - ancient C programmer

Recommended answer

I'm assuming that df_geo is a df with a single column, so I believe the following should work:

Change:

return lat,  lon

to:

return pd.Series([lat,  lon])

Then you should be able to assign like so:

df_geo_in[['LAT', 'LON']] = df_geo_in['COUNTRY'].apply(locate)
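
As a minimal offline sketch of why returning a pd.Series helps: when the function passed to apply on a single column returns a pd.Series, pandas expands the results into a DataFrame with one column per element, which can then be assigned to two new columns. The fake_locate function and its COORDS table are hypothetical stand-ins for the geopy-based locate so that the example runs without network access, and the coordinates are rough illustrative values.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table standing in for the geocoder (approximate values).
COORDS = {"France": (46.0, 2.0), "Japan": (36.0, 138.0)}

def fake_locate(country):
    # Return a Series so that Series.apply expands it into columns.
    lat, lon = COORDS.get(country, (np.nan, np.nan))
    return pd.Series([lat, lon])

df_geo_in = pd.DataFrame({"COUNTRY": ["France", "Japan", "Atlantis"]})

# Applying to the COUNTRY column yields a DataFrame with columns 0 and 1.
expanded = df_geo_in["COUNTRY"].apply(fake_locate)
df_geo_in[["LAT", "LON"]] = expanded
print(df_geo_in)
```

Unknown countries end up with NaN in both columns, mirroring the except branch of the original function.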

What you tried to do was assign the result of applymap to two new columns, which is incorrect here: applymap is designed to work on every element in a df, so unless the left-hand side has the same shape as the result, this won't give the desired result.
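
To illustrate the element-wise behaviour, here is a small sketch with a hypothetical pair function that, like locate, returns two values per input. The result keeps the shape of the input df, with a tuple in every cell, so it cannot be unpacked into two columns. Note that applymap was renamed to DataFrame.map in pandas 2.1, so the sketch picks whichever the installed version provides.

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "y"], "B": ["z", "w"]})

def pair(value):
    # Hypothetical function returning two values per input, like locate.
    return (value, value.upper())

# Use DataFrame.map where available (pandas >= 2.1), else applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
result = elementwise(pair)

# Same shape as df: one tuple per cell, not two new columns.
print(result.shape)
print(result)
```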

Your latter method is also incorrect: you drop the duplicate countries and then expect the result to assign a geolocation back to every country, but the shapes differ.

For large dfs it is probably quicker to build a de-duplicated geolocation df and then merge it back into your larger df, like so:

# Copy so that adding columns does not modify a slice of df_addr
geo_lookup = df_addr[['COUNTRY']].drop_duplicates().copy()
geo_lookup[['LAT', 'LON']] = geo_lookup['COUNTRY'].apply(locate)
df_addr = df_addr.merge(geo_lookup, on='COUNTRY', how='left')

This creates a df of non-duplicated countries with their geolocations, which we then left-merge back to the master df.
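
The merge step can be sketched on its own with made-up data. The addresses and the pre-filled geo_lookup table below are hypothetical (in the real workflow geo_lookup would come from the geocoder), and the coordinates are approximate.

```python
import pandas as pd

# Hypothetical data: many addresses, few distinct countries.
df_addr = pd.DataFrame({
    "ADDRESS": ["1 Rue A", "2 Rue B", "3 Chome C"],
    "COUNTRY": ["France", "France", "Japan"],
})

# Stand-in for the geocoded lookup table, one row per country.
geo_lookup = pd.DataFrame({
    "COUNTRY": ["France", "Japan"],
    "LAT": [46.0, 36.0],
    "LON": [2.0, 138.0],
})

# A left merge keeps every address row and copies the coordinates onto it.
merged = df_addr.merge(geo_lookup, on="COUNTRY", how="left")
print(merged)
```

Because the lookup has one row per country, the merged df has the same number of rows as df_addr, and each country is geocoded only once.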
