使用python pandas选择跨多个列? [英] selecting across multiple columns with python pandas?
问题描述
我在pandas中有一个数据框 df
,它是使用csv文件中的 pandas.read_table
构建的。数据框架有多个列,它由一个列索引(这是唯一的,因为每一行都有用于索引的列的唯一值)。
I have a dataframe df
in pandas that was built using pandas.read_table
from a csv file. The dataframe has several columns and it is indexed by one of the columns (which is unique, in that each row has a unique value for that column used for indexing.)
如何基于应用于多个列的复杂过滤器来选择数据帧的行?我可以很容易地选择数据帧的切片,其中列 colA
大于10,例如:
How can I select rows of my dataframe based on a "complex" filter applied to multiple columns? I can easily select out the slice of the dataframe where column colA
is greater than 10 for example:
df_greater_than10 = df[df["colA"] > 10]
但是如果我想要一个过滤器如:选择 df
其中任何列大于10?
But what if I wanted a filter like: select the slice of df
where any of the columns are greater than 10?
或 colA
的值大于10,但 colB
小于5?
Or where the value for colA
is greater than 10 but the value for colB
is less than 5?
这些是如何在大熊猫中实现的?
感谢。
How are these implemented in pandas? Thanks.
推荐答案
我鼓励您在邮件列表,但在任何情况下,它仍然是一个非常低级别的事情使用底层NumPy数组。例如,要选择任何列中的值超过例如1.5的行:
I encourage you to pose these questions on the mailing list, but in any case, it's still a very much low level affair working with the underlying NumPy arrays. For example, to select rows where the value in any column exceed, say, 1.5 in this example:
In [11]: df
Out[11]:
A B C D
2000-01-03 -0.59885 -0.18141 -0.68828 -0.77572
2000-01-04 0.83935 0.15993 0.95911 -1.12959
2000-01-05 2.80215 -0.10858 -1.62114 -0.20170
2000-01-06 0.71670 -0.26707 1.36029 1.74254
2000-01-07 -0.45749 0.22750 0.46291 -0.58431
2000-01-10 -0.78702 0.44006 -0.36881 -0.13884
2000-01-11 0.79577 -0.09198 0.14119 0.02668
2000-01-12 -0.32297 0.62332 1.93595 0.78024
2000-01-13 1.74683 -1.57738 -0.02134 0.11596
2000-01-14 -0.55613 0.92145 -0.22832 1.56631
2000-01-17 -0.55233 -0.28859 -1.18190 -0.80723
2000-01-18 0.73274 0.24387 0.88146 -0.94490
2000-01-19 0.56644 -0.49321 1.17584 -0.17585
2000-01-20 1.56441 0.62331 -0.26904 0.11952
2000-01-21 0.61834 0.17463 -1.62439 0.99103
2000-01-24 0.86378 -0.68111 -0.15788 -0.16670
2000-01-25 -1.12230 -0.16128 1.20401 1.08945
2000-01-26 -0.63115 0.76077 -0.92795 -2.17118
2000-01-27 1.37620 -1.10618 -0.37411 0.73780
2000-01-28 -1.40276 1.98372 1.47096 -1.38043
2000-01-31 0.54769 0.44100 -0.52775 0.84497
2000-02-01 0.12443 0.32880 -0.71361 1.31778
2000-02-02 -0.28986 -0.63931 0.88333 -2.58943
2000-02-03 0.54408 1.17928 -0.26795 -0.51681
2000-02-04 -0.07068 -1.29168 -0.59877 -1.45639
2000-02-07 -0.65483 -0.29584 -0.02722 0.31270
2000-02-08 -0.18529 -0.18701 -0.59132 -1.15239
2000-02-09 -2.28496 0.36352 1.11596 0.02293
2000-02-10 0.51054 0.97249 1.74501 0.20525
2000-02-11 0.10100 0.27722 0.65843 1.73591
In [12]: df[(df.values > 1.5).any(1)]
Out[12]:
A B C D
2000-01-05 2.8021 -0.1086 -1.62114 -0.2017
2000-01-06 0.7167 -0.2671 1.36029 1.7425
2000-01-12 -0.3230 0.6233 1.93595 0.7802
2000-01-13 1.7468 -1.5774 -0.02134 0.1160
2000-01-14 -0.5561 0.9215 -0.22832 1.5663
2000-01-20 1.5644 0.6233 -0.26904 0.1195
2000-01-28 -1.4028 1.9837 1.47096 -1.3804
2000-02-10 0.5105 0.9725 1.74501 0.2052
2000-02-11 0.1010 0.2772 0.65843 1.7359
多个条件必须使用&
或 |
(和括号!):
Multiple conditions have to be combined using &
or |
(and parentheses!):
In [13]: df[(df['A'] > 1) | (df['B'] < -1)]
Out[13]:
A B C D
2000-01-05 2.80215 -0.1086 -1.62114 -0.2017
2000-01-13 1.74683 -1.5774 -0.02134 0.1160
2000-01-20 1.56441 0.6233 -0.26904 0.1195
2000-01-27 1.37620 -1.1062 -0.37411 0.7378
2000-02-04 -0.07068 -1.2917 -0.59877 -1.4564
我会非常感兴趣的有一些类型的查询API来做这些的事情更容易
I'd be very interested to have some kind of query API to make these kinds of things easier
这篇关于使用python pandas选择跨多个列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!