如何使用 pandas 的多索引数据框使用地图功能? [英] how to use map function for multiindex dataframe using pandas?

查看：53 发布时间：2021/4/30 20:01:34 python pandas dataframe dictionary series

本文介绍了如何使用 pandas 的多索引数据框使用地图功能?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个如下所示的数据框

I have a data frame like as shown below

df = pd.DataFrame({'source_code':['11','11','12','13','14',np.nan],
                   'source_description':['test1', 'test1','test2','test3',np.nan,'test5'],
                   'key_id':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]})

我也有如下所示的hash_file数据帧

I also have a hash_file data frame like as shown below

hash_file = pd.DataFrame({'source_id':['11','12','13','14','15'],
                          'source_code':['test1','test2','test3','test4','test5'],
                          'hash_id':[911,512,713,814,616]})
id_file =  hash_file.set_index(['source_id','source_code'])['hash_id']

id_file 中将没有重复项(source_id，source_code)将始终是唯一的

There will be no duplicates in the id_file (source_id, source_code) will always be unique

现在，我想根据 source_code ， source_description 的匹配条目填写 df 中的 key_id 列.code>，其中包含 hash_file 中的 source_id 和 source_code 列.


Now, I would like to fill in the key_id column in df based on matching entries of source_code, source_description with source_id and source_code columns from hash_file.
所以，我尝试了以下
df['key_id'] = df['source_code','source_description'].map(id_file) 

它抛出了错误
 KeyError :(源代码"，源描述")

KeyError: ('source_code', 'source_description')
所以，我在下面尝试了另一种方法
So, I tried another approach below
df['key_id'] = df[['source_code','source_description']].map(id_file)

它又抛出了一个错误
 AttributeError:"DataFrame"对象没有属性"map" 

AttributeError: 'DataFrame' object has no attribute 'map'
因此，我希望我的输出如下所示.请注意，两者之间可能会有 NA ，并且必须区分大小写.这意味着 id_file 中的索引与 df 中的列的比较必须不区分大小写.
So, I expect my output to be like as shown below. Please note that there might be NA in between and it has to be case-insensitive. Meaning the comparison of indices in the id_file with the columns in the df has to be case-insensitive.
我只想使用 map 方法.也欢迎任何其他优雅的方法
I would like to do only with map approach. Any other elegant approach is also welcome
source_code source_description  key_id
11            test1              911
11            test1              911
12            test2              512
13            test3              713
14             NaN               814
NaN           test5              616


推荐答案
这似乎是相当标准的 merge ，但有一些重命名:
This seems to be a fairly standard merge with some renaming:
(df.merge(hash_file, left_on = ['source_code','source_description'], right_on = ['source_id','source_code'])
    .drop(columns = ['key_id','source_id','source_code_y'])
    .rename(columns = {'source_code_x':'source_code','hash_id':'key_id'})
)

输出

    source_code source_description  key_id
0   11          test1               911
1   11          test1               911
2   12          test2               512
3   13          test3               713

使用 map (用于问题中的更新输入值)


Using map (for updated input values in the question)

df['key_id'] = df.set_index(['source_code','source_description']).index.map(id_file)

输出
    source_code source_description  key_id
0   11          test1               911.0
1   11          test1               911.0
2   12          test2               512.0
3   13          test3               713.0
4   14          NaN                 NaN
5   NaN         test5               NaN


                        这篇关于如何使用 pandas 的多索引数据框使用地图功能?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

如何使用 pandas 的多索引数据框使用地图功能? [英] how to use map function for multiindex dataframe using pandas?

问题描述

推荐答案

使用 `map` (用于问题中的更新输入值)

Using `map` (for updated input values in the question)

相关文章

Python最新文章

热门教程

热门工具

登录关闭

如何使用 pandas 的多索引数据框使用地图功能? [英] how to use map function for multiindex dataframe using pandas?

问题描述

推荐答案

使用 map (用于问题中的更新输入值)

Using map (for updated input values in the question)

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

使用 `map` (用于问题中的更新输入值)

Using `map` (for updated input values in the question)

登录关闭