Pandas Vectorized Lookup of a Dictionary
Question
This seems like it should be a common use case, but I'm not finding any good guidance on it. I have a solution that works, but I would prefer a vectorized lookup over using the Pandas apply() function.
Here is an example of what I am doing:
import pandas as pd

# Nested lookup table: category name -> dict of field values.
# Note: "filed2" (sic) is spelled as in the original post; the outputs in the answer reflect this key.
example_dict = {
    "category1": {
        "field1": 0.0,
        "filed2": 5.0},
    "category2": {
        "field1": 5.0,
        "field2": 8.0}}

d = {"ids": range(10),
     "category": ["category1" if x % 2 == 0 else "category2" for x in range(10)]}
df = pd.DataFrame(d)

# The operation I am trying to vectorize
df['category_data'] = df.apply(lambda row: example_dict[row['category']], axis=1)
On the last line you can see where I use the apply() function to perform the dictionary lookup. My gut tells me there should be a way to vectorize this. I may be wrong, but I would like to know that as well. I often run into scenarios where I need to look up information in a dictionary and add it as a column to a DataFrame.
Recommended answer
Use map on the category column:
df['map']=df.category.map(example_dict)
df
Out[839]:
category ids category_data \
0 category1 0 {'field1': 0.0, 'filed2': 5.0}
1 category2 1 {'field1': 5.0, 'field2': 8.0}
2 category1 2 {'field1': 0.0, 'filed2': 5.0}
3 category2 3 {'field1': 5.0, 'field2': 8.0}
4 category1 4 {'field1': 0.0, 'filed2': 5.0}
5 category2 5 {'field1': 5.0, 'field2': 8.0}
6 category1 6 {'field1': 0.0, 'filed2': 5.0}
7 category2 7 {'field1': 5.0, 'field2': 8.0}
8 category1 8 {'field1': 0.0, 'filed2': 5.0}
9 category2 9 {'field1': 5.0, 'field2': 8.0}
map
0 {'field1': 0.0, 'filed2': 5.0}
1 {'field1': 5.0, 'field2': 8.0}
2 {'field1': 0.0, 'filed2': 5.0}
3 {'field1': 5.0, 'field2': 8.0}
4 {'field1': 0.0, 'filed2': 5.0}
5 {'field1': 5.0, 'field2': 8.0}
6 {'field1': 0.0, 'filed2': 5.0}
7 {'field1': 5.0, 'field2': 8.0}
8 {'field1': 0.0, 'filed2': 5.0}
9 {'field1': 5.0, 'field2': 8.0}
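One difference from the apply() version may be worth checking before you switch: indexing the dictionary directly raises KeyError for an unknown category, whereas Series.map with a plain dict yields NaN for values that are not keys. A minimal sketch, using a hypothetical flat dictionary (`prices`) and Series (`codes`) that are not part of the question:

```python
import pandas as pd

# Hypothetical lookup table and data, for illustration only.
prices = {"a": 1.0, "b": 2.0}
codes = pd.Series(["a", "b", "c"])

# map() looks each value up in the dict; "c" is not a key,
# so its result is NaN rather than a KeyError.
mapped = codes.map(prices)

# Rows whose key was missing can be found afterwards.
missing = mapped.isna()
print(mapped)
print(codes[missing].tolist())
```

If a missing category should be treated as an error, checking `mapped.isna()` right after the lookup is one way to catch it.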
If you need the fields in separate columns:
pd.DataFrame(df['map'].tolist())
Out[843]:
field1 field2 filed2
0 0.0 NaN 5.0
1 5.0 8.0 NaN
2 0.0 NaN 5.0
3 5.0 8.0 NaN
4 0.0 NaN 5.0
5 5.0 8.0 NaN
6 0.0 NaN 5.0
7 5.0 8.0 NaN
8 0.0 NaN 5.0
9 5.0 8.0 NaN
Or:
df['map'].apply(pd.Series)
Out[844]:
field1 field2 filed2
0 0.0 NaN 5.0
1 5.0 8.0 NaN
2 0.0 NaN 5.0
3 5.0 8.0 NaN
4 0.0 NaN 5.0
5 5.0 8.0 NaN
6 0.0 NaN 5.0
7 5.0 8.0 NaN
8 0.0 NaN 5.0
9 5.0 8.0 NaN
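If the end goal is the separate columns, another option (not from the answer above, just a sketch) is to skip the column of dicts entirely: turn the dictionary into a small lookup DataFrame once and join it on the category column. The names `lookup` and `result` are illustrative, and the key is written "field2" in both categories on the assumption that "filed2" in the question is a typo.

```python
import pandas as pd

# Same shape as the question's dictionary; "field2" is used for both
# categories on the assumption that "filed2" in the question is a typo.
example_dict = {
    "category1": {"field1": 0.0, "field2": 5.0},
    "category2": {"field1": 5.0, "field2": 8.0},
}

df = pd.DataFrame({
    "ids": range(10),
    "category": ["category1" if x % 2 == 0 else "category2" for x in range(10)],
})

# One row per category (the index), one column per inner key.
lookup = pd.DataFrame.from_dict(example_dict, orient="index")

# Match df["category"] against lookup's index. This is a left join,
# so a category missing from the dictionary would get NaN in the
# new columns instead of raising an error.
result = df.join(lookup, on="category")
print(result.head())
```

Because the lookup table is built once and the matching is done by the join, this avoids creating one Python dict (or one pd.Series) per row, which can matter on large frames.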