Best way to subset a pandas dataframe

Views: 65 Published: 2020/5/24 1:52:56 Tags: python pandas dataframe data-science


Question

Hey, I'm new to pandas and I just came across df.query().

Why would people use df.query() when you can filter a DataFrame directly with bracket notation? The official pandas tutorial also seems to prefer the latter approach.

With bracket notation:

df[df['age'] <= 21]

With the pandas query method:

df.query('age <= 21')

Besides the stylistic and flexibility differences that have been mentioned, is one of them canonically preferred, in particular for performance on large DataFrames?

Recommended answer

Consider the following sample DataFrame:

In [307]: df
Out[307]:
  sex  age     name
0   M   40      Max
1   F   35     Anna
2   M   29      Joe
3   F   18    Maria
4   F   23  Natalie
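
To follow along, the sample frame can be rebuilt as sketched below. The values are copied from the printed output above; the `pd` alias and the variable names `by_brackets` and `by_query` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Values copied from the printed sample frame above.
df = pd.DataFrame({
    "sex": ["M", "F", "M", "F", "F"],
    "age": [40, 35, 29, 18, 23],
    "name": ["Max", "Anna", "Joe", "Maria", "Natalie"],
})

# The two filters from the question, applied to the sample frame.
by_brackets = df[df["age"] <= 21]
by_query = df.query("age <= 21")

# True when both approaches return identical rows.
print(by_brackets.equals(by_query))
```

On this frame both forms should pick the same rows, so the choice between them is about readability and, on large frames, the evaluation engine.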

There are quite a few good reasons to prefer the .query() method.

It can be much shorter and cleaner than boolean indexing:

In [308]: df.query("20 <= age <= 30 and sex=='F'")
Out[308]:
  sex  age     name
4   F   23  Natalie

In [309]: df[(df['age']>=20) & (df['age']<=30) & (df['sex']=='F')]
Out[309]:
  sex  age     name
4   F   23  Natalie
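
A related convenience is that a query string can refer to local Python variables with the `@` prefix, which keeps the bounds out of the string itself. The sketch below assumes the sample frame from above; the helper name `filter_people` and its parameters are hypothetical and exist only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "sex": ["M", "F", "M", "F", "F"],
    "age": [40, 35, 29, 18, 23],
    "name": ["Max", "Anna", "Joe", "Maria", "Natalie"],
})

def filter_people(frame, lo, hi, wanted_sex):
    """Hypothetical helper: rows with lo <= age <= hi and the given sex."""
    # '@name' refers to a local Python variable rather than a column.
    return frame.query("age >= @lo and age <= @hi and sex == @wanted_sex")

via_query = filter_people(df, 20, 30, "F")

# The same filter with boolean indexing; between() is inclusive on both ends by default.
via_mask = df[df["age"].between(20, 30) & (df["sex"] == "F")]

print(via_query.equals(via_mask))
```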

You can prepare conditions (queries) programmatically:

In [315]: conditions = {'name':'Joe', 'sex':'M'}

In [316]: q = ' and '.join(['{}=="{}"'.format(k,v) for k,v in conditions.items()])

In [317]: q
Out[317]: 'name=="Joe" and sex=="M"'

In [318]: df.query(q)
Out[318]:
  sex  age name
2   M   29  Joe
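
The string formatting above wraps every value in double quotes, so it only suits string columns. A slightly more general sketch, under the assumption that the values are plain strings or numbers from a trusted source, uses `repr()` formatting so numbers stay unquoted. It is shown next to an equivalent boolean mask built with `functools.reduce`, which avoids string parsing altogether. The extra `age` condition is added here for illustration only.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({
    "sex": ["M", "F", "M", "F", "F"],
    "age": [40, 35, 29, 18, 23],
    "name": ["Max", "Anna", "Joe", "Maria", "Natalie"],
})

# Illustrative conditions; 'age' is added to show a non-string value.
conditions = {"name": "Joe", "sex": "M", "age": 29}

# {val!r} inserts repr(val): strings are quoted, numbers are left bare.
q = " and ".join(f"{col} == {val!r}" for col, val in conditions.items())
by_query = df.query(q)

# The same conditions combined into one boolean mask with '&'.
mask = reduce(operator.and_, (df[col] == val for col, val in conditions.items()))
by_mask = df[mask]

print(q)
print(by_query.equals(by_mask))
```

Interpolating values into a query string is only reasonable for trusted input; for anything user-supplied, the mask form or `@` variables are the safer choice.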

PS: there are also some disadvantages:

we can't refer directly in .query() to columns whose names contain spaces or consist only of digits (pandas 0.25 and later accept names containing spaces when they are wrapped in backticks)
not all functions can be applied, and in some cases we have to use engine='python' instead of the default engine='numexpr' (which is faster)
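
Both caveats can be illustrated with a small sketch. It assumes a reasonably recent pandas (backtick quoting was introduced in 0.25); the frame `people` and its column names are made up for this example.

```python
import pandas as pd

# Hypothetical frame with a column name that contains a space.
people = pd.DataFrame({
    "full name": ["Max Power", "Anna Smith", "Joe Bloggs"],
    "city": ["Berlin", "Boston", "Oslo"],
    "age": [40, 35, 29],
})

# Backticks let query() refer to a column whose name contains a space.
over_30 = people.query("`full name` != 'Joe Bloggs' and age > 30")

# String methods are not something numexpr can evaluate,
# so the Python engine is requested explicitly here.
b_cities = people.query("city.str.startswith('B')", engine="python")

print(over_30)
print(b_cities)
```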

NOTE: Jeff (one of the main pandas contributors and a member of the pandas core team) once said:

Note that in reality .query is just a nice-to-have interface, in fact it has very specific guarantees, meaning it's meant to parse like a query language, and not a fully general interface.
