Pandas Vectorized Lookup of Dictionary


Question

This seems like it should be a common use case, but I'm not finding any good guidance on it. I have a solution that works, but I would prefer a vectorized lookup to using the pandas apply() function.

Here is an example of what I am doing:

import pandas as pd


example_dict = {
        "category1":{
                "field1": 0.0,
                "filed2": 5.0},
        "category2":{
                "field1": 5.0,
                "field2": 8.0}}

d = {"ids": range(10),
     "category": ["category1" if x % 2 == 0 else "category2" for x in range(10)]}

df = pd.DataFrame(d)
# The operation I am trying to vectorize
df['category_data'] = df.apply(lambda row: example_dict[row['category']], axis=1)

On the last line you can see where I use apply() to perform the dictionary lookup. My gut tells me there should be a way to vectorize this. I may be wrong, but if so I would like to know that as well. I often run into scenarios where I need to look up information in a dictionary and add it as a column to a DataFrame.

Recommended answer

Use map, which accepts a dictionary and looks up each value of the column in it:

df['map']=df.category.map(example_dict)
df
Out[839]: 
    category  ids                   category_data  \
0  category1    0  {'field1': 0.0, 'filed2': 5.0}   
1  category2    1  {'field1': 5.0, 'field2': 8.0}   
2  category1    2  {'field1': 0.0, 'filed2': 5.0}   
3  category2    3  {'field1': 5.0, 'field2': 8.0}   
4  category1    4  {'field1': 0.0, 'filed2': 5.0}   
5  category2    5  {'field1': 5.0, 'field2': 8.0}   
6  category1    6  {'field1': 0.0, 'filed2': 5.0}   
7  category2    7  {'field1': 5.0, 'field2': 8.0}   
8  category1    8  {'field1': 0.0, 'filed2': 5.0}   
9  category2    9  {'field1': 5.0, 'field2': 8.0}   
                              map  
0  {'field1': 0.0, 'filed2': 5.0}  
1  {'field1': 5.0, 'field2': 8.0}  
2  {'field1': 0.0, 'filed2': 5.0}  
3  {'field1': 5.0, 'field2': 8.0}  
4  {'field1': 0.0, 'filed2': 5.0}  
5  {'field1': 5.0, 'field2': 8.0}  
6  {'field1': 0.0, 'filed2': 5.0}  
7  {'field1': 5.0, 'field2': 8.0}  
8  {'field1': 0.0, 'filed2': 5.0}  
9  {'field1': 5.0, 'field2': 8.0}  
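One thing the output above does not show is what happens when a column value has no matching key. The following is a minimal sketch, assuming a small hypothetical dictionary named lookup and a made-up "category3" value that is not in it; Series.map leaves NaN for such rows rather than raising a KeyError as the apply() version would.

```python
import pandas as pd

# Hypothetical lookup table, similar in shape to example_dict in the question.
lookup = {
    "category1": {"field1": 0.0, "field2": 5.0},
    "category2": {"field1": 5.0, "field2": 8.0},
}

df = pd.DataFrame({
    "ids": range(4),
    # "category3" is deliberately absent from the lookup table.
    "category": ["category1", "category2", "category3", "category1"],
})

# Values without a matching key become NaN instead of raising an error.
mapped = df["category"].map(lookup)

# Boolean mask marking the rows whose category was not found.
missing_mask = mapped.isna()
print(df[missing_mask])
```

Checking the mask after mapping is one way to catch unexpected categories before the column is used further.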

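If you need the fields in separate columns (the separate filed2 column and the NaN values below result from the misspelled key "filed2" in the question's dictionary):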

pd.DataFrame(df['map'].tolist())
Out[843]: 
   field1  field2  filed2
0     0.0     NaN     5.0
1     5.0     8.0     NaN
2     0.0     NaN     5.0
3     5.0     8.0     NaN
4     0.0     NaN     5.0
5     5.0     8.0     NaN
6     0.0     NaN     5.0
7     5.0     8.0     NaN
8     0.0     NaN     5.0
9     5.0     8.0     NaN

df['map'].apply(pd.Series)
Out[844]: 
   field1  field2  filed2
0     0.0     NaN     5.0
1     5.0     8.0     NaN
2     0.0     NaN     5.0
3     5.0     8.0     NaN
4     0.0     NaN     5.0
5     5.0     8.0     NaN
6     0.0     NaN     5.0
7     5.0     8.0     NaN
8     0.0     NaN     5.0
9     5.0     8.0     NaN
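If the separate columns are the end goal, the intermediate column of dictionaries can possibly be skipped altogether. The sketch below is an alternative to the answer's approach, not part of it: it assumes a hypothetical dictionary named lookup with consistently spelled keys, turns it into a small table with DataFrame.from_dict, and joins that table to the data on the category column.

```python
import pandas as pd

# Hypothetical lookup table with consistently spelled field names.
lookup = {
    "category1": {"field1": 0.0, "field2": 5.0},
    "category2": {"field1": 5.0, "field2": 8.0},
}

df = pd.DataFrame({
    "ids": range(4),
    "category": ["category1", "category2", "category1", "category2"],
})

# One row per category, one column per field.
lookup_df = pd.DataFrame.from_dict(lookup, orient="index")

# Left join: match df's "category" column against lookup_df's index.
result = df.join(lookup_df, on="category")
print(result)
```

Because this is a left join, rows whose category is missing from the lookup table are kept and receive NaN in the added columns.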
