pandas DataFrame.query expression that returns all rows by default

Question

I have discovered the pandas DataFrame.query method, and it does almost exactly what I need. (I had implemented my own parser for this because I hadn't realized query existed, but really I should be using the standard method.)

I would like my users to be able to specify the query in a configuration file. The syntax seems intuitive enough that I can expect my non-programmer (but engineer) users to figure it out.

There's just one thing missing: a way to select everything in the dataframe. Sometimes what my users want to use is every row, so they would put 'All' or something into that configuration option. In fact, that will be the default option.

I tried df.query('True') but that raised a KeyError. I tried df.query('1') but that returned the row with index 1. The empty string raised a ValueError.

The only things I can think of are 1) put an if clause every time I need to do this type of query (probably 3 or 4 times in the code) or 2) subclass DataFrame and either reimplement query, or add a query_with_all method:

import pandas as pd

class MyDataFrame(pd.DataFrame):
    def query_with_all(self, query_string):
        if query_string.lower() == 'all':
            return self
        else:
            return self.query(query_string)

And then use my own class every time instead of the pandas one. Is this the only way to do this?

Recommended answer

Keep things simple, and use a function:

def query_with_all(data_frame, query_string):
    if query_string == "all":
        return data_frame
    return data_frame.query(query_string)

Whenever you need this type of query, just call the function with the data frame and the query string. There's no need for extra if statements or for subclassing pd.DataFrame.
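As a rough sketch of how this could fit the configuration-file use case from the question: the example below reads the query string with the standard-library configparser and falls back to "all" when no query is set. The column names, section names, and config contents are hypothetical and only there for illustration.

```python
import configparser

import pandas as pd


def query_with_all(data_frame, query_string):
    # "all" is the sentinel meaning "no filtering"; the comparison is exact.
    if query_string == "all":
        return data_frame
    return data_frame.query(query_string)


# Hypothetical sample data and config contents, for illustration only.
df = pd.DataFrame({"voltage": [1.0, 2.5, 4.0], "channel": ["a", "b", "a"]})

config = configparser.ConfigParser()
config.read_string("""
[filtered]
query = voltage > 2

[unfiltered]
""")

# fallback="all" makes "every row" the default when no query is configured.
filtered_query = config.get("filtered", "query", fallback="all")
default_query = config.get("unfiltered", "query", fallback="all")

filtered = query_with_all(df, filtered_query)    # rows where voltage > 2
everything = query_with_all(df, default_query)   # the unfiltered frame
```

Note that in the "all" case the function returns the original object rather than a copy, so call .copy() on the result if the caller is going to modify it.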

If you're restricted to using df.query, you can use a global variable:

ALL = slice(None)
df.query('@ALL', engine='python')

If you're not allowed to use global variables, and your DataFrame isn't MultiIndexed, you can use:

df.query('tuple()')

All of these will properly handle NaN values.
