Dynamically filtering a pandas dataframe


Question

I am trying to filter a pandas DataFrame using thresholds for three columns:

import pandas as pd
df = pd.DataFrame({"A" : [6, 2, 10, -5, 3],
                   "B" : [2, 5, 3, 2, 6],
                   "C" : [-5, 2, 1, 8, 2]})
df = df.loc[(df.A > 0) & (df.B > 2) & (df.C > -1)].reset_index(drop = True)

df
    A  B  C
0   2  5  2
1  10  3  1
2   3  6  2

However, I want to do this inside a function where the column names and their thresholds are given to me in a dictionary. Here's my first try, which works OK. Essentially, I build the filter as a string in the cond variable and then run it:

df = pd.DataFrame({"A" : [6, 2, 10, -5, 3],
                   "B" : [2, 5, 3, 2, 6],
                   "C" : [-5, 2, 1, 8, 2]})
limits_dic = {"A" : 0, "B" : 2, "C" : -1}
cond = "df = df.loc["
for key in limits_dic.keys():
    cond += "(df." + key + " > " + str(limits_dic[key])+ ") & "
cond = cond[:-2] + "].reset_index(drop = True)"
exec(cond)
df
    A  B  C
0   2  5  2
1  10  3  1
2   3  6  2

Finally, I put everything inside a function and it stops working (perhaps exec does not like being used inside a function!):

df = pd.DataFrame({"A" : [6, 2, 10, -5, 3],
                   "B" : [2, 5, 3, 2, 6],
                   "C" : [-5, 2, 1, 8, 2]})
limits_dic = {"A" : 0, "B" : 2, "C" : -1}
def filtering(df, limits_dic):
    cond = "df = df.loc["
    for key in limits_dic.keys():
        cond += "(df." + key + " > " + str(limits_dic[key])+ ") & "
    cond = cond[:-2] + "].reset_index(drop = True)"
    exec(cond)
    return(df)

df = filtering(df, limits_dic)
df
    A  B  C
0   6  2 -5
1   2  5  2
2  10  3  1
3  -5  2  8
4   3  6  2

I know that exec behaves differently when used inside a function, but I am not sure how to address the problem. I also suspect there must be a more elegant way to define a function that does the filtering given two inputs: 1) df and 2) limits_dic = {"A" : 0, "B" : 2, "C" : -1}. I would appreciate any thoughts on this.

Recommended answer

If you're trying to build a dynamic query, there are easier ways. Here's one using a list comprehension and str.join:

query = ' & '.join(['{}>{}'.format(k, v) for k, v in limits_dic.items()])

Or, using f-strings with Python 3.6+:

query = ' & '.join([f'{k}>{v}' for k, v in limits_dic.items()])

print(query)

A>0 & B>2 & C>-1

Pass the query string to df.query, which is meant for this very purpose:

out = df.query(query)
print(out)

    A  B  C
1   2  5  2
2  10  3  1
4   3  6  2
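
Putting the pieces together, the function asked for in the question could be sketched as follows. The name `filtering` and the `reset_index(drop=True)` call are taken from the question. The sketch assumes that every key in `limits_dic` is a column name that is also a valid Python identifier, that every threshold is numeric, and that the dictionary is not empty.

```python
import pandas as pd


def filtering(df, limits_dic):
    """Keep the rows where every column in limits_dic exceeds its threshold."""
    # Build a query string such as "A>0 & B>2 & C>-1" from the dictionary.
    query = " & ".join(f"{col}>{limit}" for col, limit in limits_dic.items())
    # df.query returns a new frame; reset the index as in the question.
    return df.query(query).reset_index(drop=True)


df = pd.DataFrame({"A": [6, 2, 10, -5, 3],
                   "B": [2, 5, 3, 2, 6],
                   "C": [-5, 2, 1, 8, 2]})
limits_dic = {"A": 0, "B": 2, "C": -1}

result = filtering(df, limits_dic)
print(result)
```

Because the function returns the filtered frame instead of trying to rebind `df` through exec, it behaves the same inside and outside a function.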


You could also use df.eval if you want to obtain a boolean mask for your query; indexing with the mask is then straightforward:

mask = df.eval(query)
print(mask)

0    False
1     True
2     True
3    False
4     True
dtype: bool

out = df[mask]
print(out)

    A  B  C
1   2  5  2
2  10  3  1
4   3  6  2
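
If building a query string is not desirable, the same mask can probably be assembled without any string evaluation. The sketch below is an alternative to df.eval and not part of the original answer: it compares each column to its threshold and combines the results with `functools.reduce` and `operator.and_`. It assumes `limits_dic` is not empty, and it also works for column names that are not valid Python identifiers.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({"A": [6, 2, 10, -5, 3],
                   "B": [2, 5, 3, 2, 6],
                   "C": [-5, 2, 1, 8, 2]})
limits_dic = {"A": 0, "B": 2, "C": -1}

# One boolean Series per column, e.g. df["A"] > 0.
conditions = [df[col] > limit for col, limit in limits_dic.items()]
# Combine them element-wise with & into a single boolean mask.
mask = reduce(operator.and_, conditions)

out = df[mask]
print(out)
```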


String Data

If you need to query columns that use string data, the code above will need a slight modification.

Consider (data from this answer):

df = pd.DataFrame({'gender':list('MMMFFF'),
                   'height':[4,5,4,5,5,4],
                   'age':[70,80,90,40,2,3]})

print (df)
  gender  height  age
0      M       4   70
1      M       5   80
2      M       4   90
3      F       5   40
4      F       5    2
5      F       4    3

And a list of columns, operators, and values:

column = ['height', 'age', 'gender']
equal = ['>', '>', '==']
condition = [1.68, 20, 'F']

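The appropriate modification here is to wrap each value in repr, so that string values are quoted inside the query: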

query = ' & '.join(f'{i} {j} {repr(k)}' for i, j, k in zip(column, equal, condition))
df.query(query)

  gender  height  age
3      F       5   40
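
To see why repr matters, it can help to print the generated query string. The sketch below reuses the data and lists from above and only adds the `print` call; `{k!r}` is the f-string shorthand for `repr(k)`. Without repr the last clause would read `gender == F`, and `F` would be looked up as a column name rather than compared as a string.

```python
import pandas as pd

df = pd.DataFrame({'gender': list('MMMFFF'),
                   'height': [4, 5, 4, 5, 5, 4],
                   'age': [70, 80, 90, 40, 2, 3]})

column = ['height', 'age', 'gender']
equal = ['>', '>', '==']
condition = [1.68, 20, 'F']

# repr leaves numbers unchanged and wraps strings in quotes.
query = ' & '.join(f'{i} {j} {k!r}' for i, j, k in zip(column, equal, condition))
print(query)
# height > 1.68 & age > 20 & gender == 'F'

out = df.query(query)
print(out)
```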


For information on the pd.eval() family of functions, their features and use cases, see Dynamic Expression Evaluation in pandas using pd.eval().
