Selecting data from Pandas dataframe based on criteria stored in a dict
Problem description
I have a Pandas dataframe that contains a large number of variables. This can be simplified as:
tempDF = pd.DataFrame({ 'var1': [12,12,12,12,45,45,45,51,51,51],
'var2': ['a','a','b','b','b','b','b','c','c','d'],
'var3': ['e','f','f','f','f','g','g','g','g','g'],
'var4': [1,2,3,3,4,5,6,6,6,7]})
If I wanted to select a subset of the dataframe (e.g. var2='b' and var4=3), I would use:
tempDF.loc[(tempDF['var2']=='b') & (tempDF['var4']==3),:]
However, is it possible to select a subset of the dataframe if the matching criteria are stored within a dict, such as:
tempDict = {'var2': 'b','var4': 3}
It's important that the variable names are not predefined and the number of variables included in the dict is changeable.
I've been puzzling over this for a while and so any suggestions would be greatly appreciated.
You could create a mask for each condition using a list comprehension, then combine them by converting the list to a dataframe and calling all across the columns:
In [23]: pd.DataFrame([tempDF[key] == val for key, val in tempDict.items()]).T.all(axis=1)
Out[23]:
0    False
1    False
2     True
3     True
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool
Then you could slice your dataframe with that mask:
mask = pd.DataFrame([tempDF[key] == val for key, val in tempDict.items()]).T.all(axis=1)
In [25]: tempDF[mask]
Out[25]:
   var1 var2 var3  var4
2    12    b    f     3
3    12    b    f     3
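As a possible variation on the approach above, the per-column masks can also be combined directly with the & operator instead of going through an intermediate dataframe. The following is only a sketch: the helper name select_by_dict is hypothetical (it is not part of pandas), and returning every row when the dict is empty is an assumption about the desired behaviour, not something stated in the question.

```python
from functools import reduce
import operator

import pandas as pd

tempDF = pd.DataFrame({'var1': [12, 12, 12, 12, 45, 45, 45, 51, 51, 51],
                       'var2': ['a', 'a', 'b', 'b', 'b', 'b', 'b', 'c', 'c', 'd'],
                       'var3': ['e', 'f', 'f', 'f', 'f', 'g', 'g', 'g', 'g', 'g'],
                       'var4': [1, 2, 3, 3, 4, 5, 6, 6, 6, 7]})
tempDict = {'var2': 'b', 'var4': 3}


def select_by_dict(df, criteria):
    """Hypothetical helper: keep rows where df[col] == val for every item in criteria."""
    # One boolean Series per (column, value) pair in the dict.
    masks = [df[col] == val for col, val in criteria.items()]
    if not masks:
        # Assumption: an empty dict means "no filter", so all rows are kept.
        return df
    # AND the masks together element-wise, the same as mask1 & mask2 & ...
    combined = reduce(operator.and_, masks)
    return df[combined]


result = select_by_dict(tempDF, tempDict)
print(result)
```

With the example data this should select the same two rows (index 2 and 3) as the mask built above. A key that is not a column of the dataframe raises a KeyError, which is probably what you want when the dict is built dynamically.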