Selecting data from Pandas dataframe based on criteria stored in a dict

Question

I have a Pandas dataframe that contains a large number of variables. This can be simplified as:

import pandas as pd

tempDF = pd.DataFrame({'var1': [12,12,12,12,45,45,45,51,51,51],
                       'var2': ['a','a','b','b','b','b','b','c','c','d'],
                       'var3': ['e','f','f','f','f','g','g','g','g','g'],
                       'var4': [1,2,3,3,4,5,6,6,6,7]})

If I wanted to select a subset of the dataframe (e.g. var2='b' and var4=3), I would use:

tempDF.loc[(tempDF['var2']=='b') & (tempDF['var4']==3),:]

However, is it possible to select a subset of the dataframe if the matching criteria are stored within a dict, such as:

tempDict = {'var2': 'b','var4': 3}

It's important that the variable names are not predefined and the number of variables included in the dict is changeable.

I've been puzzling over this for a while and so any suggestions would be greatly appreciated.

Solution

You could create a mask for each condition using a list comprehension, then combine the masks by converting the list to a dataframe and calling all(axis=1), which is True only for rows where every condition holds:

In [23]: pd.DataFrame([tempDF[key] == val for key, val in tempDict.items()]).T.all(axis=1)
Out[23]:
0    False
1    False
2     True
3     True
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool
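
To make the intermediate step visible, here is a minimal sketch of the same computation written out in stages. It is a variant for illustration, not the answer's exact code: it uses pd.concat with axis=1 instead of the transpose, and the names conditions and condition_table are made up for this example.

```python
import pandas as pd

tempDF = pd.DataFrame({'var1': [12, 12, 12, 12, 45, 45, 45, 51, 51, 51],
                       'var2': ['a', 'a', 'b', 'b', 'b', 'b', 'b', 'c', 'c', 'd'],
                       'var3': ['e', 'f', 'f', 'f', 'f', 'g', 'g', 'g', 'g', 'g'],
                       'var4': [1, 2, 3, 3, 4, 5, 6, 6, 6, 7]})
tempDict = {'var2': 'b', 'var4': 3}

# One boolean Series per condition, keyed by the column it tests.
conditions = {key: tempDF[key] == val for key, val in tempDict.items()}

# Place the Series side by side: one column per condition,
# one row per row of tempDF.
condition_table = pd.concat(conditions, axis=1)

# A row passes only if every condition in that row is True.
mask = condition_table.all(axis=1)
```

Printing condition_table shows which individual condition rejects a given row, which can help when a dict with many keys returns fewer rows than expected.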

Then you could slice your dataframe with that mask:

mask = pd.DataFrame([tempDF[key] == val for key, val in tempDict.items()]).T.all(axis=1)

In [25]: tempDF[mask]
Out[25]:
   var1 var2 var3  var4
2    12    b    f     3
3    12    b    f     3
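
If this selection is needed in several places, it could be wrapped in a small helper. The sketch below is one possible variant and not part of the original answer: the function name select_by_dict is hypothetical, and it combines the masks with numpy.logical_and.reduce rather than building an intermediate dataframe. It assumes every key in the dict is a column of the dataframe, and it treats an empty dict as "no filtering".

```python
import numpy as np
import pandas as pd


def select_by_dict(df, criteria):
    """Return the rows of df where each column named in criteria equals its value.

    Hypothetical helper for illustration; assumes every key is a column of df.
    """
    if not criteria:
        # No conditions given: keep every row.
        return df
    # One boolean array per condition.
    masks = [(df[key] == val).to_numpy(dtype=bool) for key, val in criteria.items()]
    # Element-wise AND across all conditions.
    mask = np.logical_and.reduce(masks)
    return df[mask]


tempDF = pd.DataFrame({'var1': [12, 12, 12, 12, 45, 45, 45, 51, 51, 51],
                       'var2': ['a', 'a', 'b', 'b', 'b', 'b', 'b', 'c', 'c', 'd'],
                       'var3': ['e', 'f', 'f', 'f', 'f', 'g', 'g', 'g', 'g', 'g'],
                       'var4': [1, 2, 3, 3, 4, 5, 6, 6, 6, 7]})
tempDict = {'var2': 'b', 'var4': 3}

result = select_by_dict(tempDF, tempDict)
```

Because the helper only iterates over the dict's items, the number and names of the columns can change between calls without any change to the code.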
