Selecting across multiple columns with Python pandas?

Question

I have a DataFrame df in pandas that was built with pandas.read_table from a CSV file. The DataFrame has several columns and is indexed by one of them (a unique column: each row has a distinct value in it).

How can I select rows of my DataFrame based on a "complex" filter applied to multiple columns? I can easily select the slice of the DataFrame where column colA is greater than 10, for example:

df_greater_than10 = df[df["colA"] > 10]

But what if I wanted a filter like: select the slice of df where any of the columns are greater than 10?

Or where the value for colA is greater than 10 but the value for colB is less than 5?

How are these implemented in pandas? Thanks.

Answer

I encourage you to ask these questions on the mailing list. Either way, this is still a fairly low-level affair that works with the underlying NumPy arrays. For example, to select rows where the value in any column exceeds, say, 1.5:

In [11]: df
Out[11]: 
            A        B        C        D      
2000-01-03 -0.59885 -0.18141 -0.68828 -0.77572
2000-01-04  0.83935  0.15993  0.95911 -1.12959
2000-01-05  2.80215 -0.10858 -1.62114 -0.20170
2000-01-06  0.71670 -0.26707  1.36029  1.74254
2000-01-07 -0.45749  0.22750  0.46291 -0.58431
2000-01-10 -0.78702  0.44006 -0.36881 -0.13884
2000-01-11  0.79577 -0.09198  0.14119  0.02668
2000-01-12 -0.32297  0.62332  1.93595  0.78024
2000-01-13  1.74683 -1.57738 -0.02134  0.11596
2000-01-14 -0.55613  0.92145 -0.22832  1.56631
2000-01-17 -0.55233 -0.28859 -1.18190 -0.80723
2000-01-18  0.73274  0.24387  0.88146 -0.94490
2000-01-19  0.56644 -0.49321  1.17584 -0.17585
2000-01-20  1.56441  0.62331 -0.26904  0.11952
2000-01-21  0.61834  0.17463 -1.62439  0.99103
2000-01-24  0.86378 -0.68111 -0.15788 -0.16670
2000-01-25 -1.12230 -0.16128  1.20401  1.08945
2000-01-26 -0.63115  0.76077 -0.92795 -2.17118
2000-01-27  1.37620 -1.10618 -0.37411  0.73780
2000-01-28 -1.40276  1.98372  1.47096 -1.38043
2000-01-31  0.54769  0.44100 -0.52775  0.84497
2000-02-01  0.12443  0.32880 -0.71361  1.31778
2000-02-02 -0.28986 -0.63931  0.88333 -2.58943
2000-02-03  0.54408  1.17928 -0.26795 -0.51681
2000-02-04 -0.07068 -1.29168 -0.59877 -1.45639
2000-02-07 -0.65483 -0.29584 -0.02722  0.31270
2000-02-08 -0.18529 -0.18701 -0.59132 -1.15239
2000-02-09 -2.28496  0.36352  1.11596  0.02293
2000-02-10  0.51054  0.97249  1.74501  0.20525
2000-02-11  0.10100  0.27722  0.65843  1.73591

In [12]: df[(df.values > 1.5).any(1)]
Out[12]: 
            A       B       C        D     
2000-01-05  2.8021 -0.1086 -1.62114 -0.2017
2000-01-06  0.7167 -0.2671  1.36029  1.7425
2000-01-12 -0.3230  0.6233  1.93595  0.7802
2000-01-13  1.7468 -1.5774 -0.02134  0.1160
2000-01-14 -0.5561  0.9215 -0.22832  1.5663
2000-01-20  1.5644  0.6233 -0.26904  0.1195
2000-01-28 -1.4028  1.9837  1.47096 -1.3804
2000-02-10  0.5105  0.9725  1.74501  0.2052
2000-02-11  0.1010  0.2772  0.65843  1.7359
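
The same selection can also be written without dropping down to NumPy. Comparing the whole DataFrame to a scalar gives a boolean DataFrame of the same shape, and `.any(axis=1)` reduces it to one boolean per row. Here is a minimal sketch on a small made-up frame. The values are hypothetical and are not the data shown above:

```python
import pandas as pd

# Hypothetical small frame standing in for df above
df = pd.DataFrame(
    {
        "A": [-0.6, 2.8, 0.7],
        "B": [0.2, -0.1, -0.3],
        "C": [0.9, -1.6, 1.4],
        "D": [-1.1, -0.2, 1.7],
    },
    index=pd.to_datetime(["2000-01-04", "2000-01-05", "2000-01-06"]),
)

# (df > 1.5) is a boolean DataFrame; .any(axis=1) is True for a row
# if at least one of its columns is True
mask = (df > 1.5).any(axis=1)

selected = df[mask]  # keeps only the rows where mask is True
print(selected)
```

In this sample, the first row is dropped and the other two are kept. The second row is kept because of `A`, and the third because of `D`.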

Multiple conditions have to be combined using & or | (and parentheses!):

In [13]: df[(df['A'] > 1) | (df['B'] < -1)]
Out[13]: 
            A        B       C        D     
2000-01-05  2.80215 -0.1086 -1.62114 -0.2017
2000-01-13  1.74683 -1.5774 -0.02134  0.1160
2000-01-20  1.56441  0.6233 -0.26904  0.1195
2000-01-27  1.37620 -1.1062 -0.37411  0.7378
2000-02-04 -0.07068 -1.2917 -0.59877 -1.4564
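
The question's second filter ("colA greater than 10 but colB less than 5") follows the same pattern, with `&` in place of `|`. The parentheses matter because `&` and `|` bind more tightly than `>` and `<` in Python. Without them, `df['colA'] > 10 & df['colB'] < 5` is parsed as the chained comparison `df['colA'] > (10 & df['colB']) < 5`, which does not filter as intended and usually fails with a "truth value of a Series is ambiguous" error. Here is a sketch with made-up values, reusing the question's column names:

```python
import pandas as pd

# Hypothetical data using the column names from the question
df = pd.DataFrame(
    {"colA": [5, 12, 15, 20], "colB": [1, 3, 8, 4]},
    index=["w", "x", "y", "z"],
)

# Each parenthesized comparison yields a boolean Series;
# & combines them element-wise (both must be True)
both = df[(df["colA"] > 10) & (df["colB"] < 5)]
print(both)
```

Only rows `x` and `z` satisfy both conditions. Row `y` passes the `colA` test but fails the `colB` one.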

I'd be very interested in having some kind of query API to make these kinds of things easier.
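
Later pandas versions did add such an API. `DataFrame.query` takes the filter as a string expression in which column names are used directly and conditions are combined with `and`/`or` (or `&`/`|`). Local Python variables can be referenced with an `@` prefix. Here is a minimal sketch on the same kind of hypothetical `colA`/`colB` data:

```python
import pandas as pd

# Hypothetical data using the column names from the question
df = pd.DataFrame(
    {"colA": [5, 12, 15, 20], "colB": [1, 3, 8, 4]},
    index=["w", "x", "y", "z"],
)

# The expression string is evaluated against df's columns
both = df.query("colA > 10 and colB < 5")

# @threshold refers to the local Python variable below
threshold = 14
either = df.query("colA > @threshold or colB < 2")

print(both)
print(either)
```

`both` selects the same rows as the boolean-mask version (`x` and `z`). `either` keeps `w` (because `colB` is 1) plus `y` and `z` (because `colA` exceeds 14).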
