Pandas DataFrame get rows where index matches a certain condition


Question

I have a Pandas dataframe from which I need all of the rows where the index matches a certain condition. The dataframe has a MultiIndex, and I need the rows where the first index level, the TimeStamp, is in a specific range. That first level (level 0) of the MultiIndex holds datetime values. The following line of code works to check whether the month is equal to 5:

compare[compare.index.get_level_values(0).month == 5]

But when I modify the code to select the rows where the month is in a given list:

compare[compare.index.get_level_values(0).month in [5, 6, 7]]

I get the error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I also tried using df.loc to get the values:

compare.loc[compare.index.get_level_values(0).month in [5, 6, 7]]

But this results in the same error.

I also tried using the isin method:

compare[compare.index.get_level_values(0).month.isin([5, 6, 7])]

But this results in the following attribute error:

AttributeError: 'numpy.ndarray' object has no attribute 'isin'

How do I get the rows of the DataFrame where the index meets a specific condition?

Recommended answer

Try this:

compare[compare.index.get_level_values(0).month.isin([5, 6, 7])]

PS: this works for Pandas version 0.18.1+
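To make the difference between `in` and `isin` concrete, here is a minimal sketch on synthetic data. The frame, its `value` column, and the dates are made up for illustration (only the name `compare` and the index level names come from this page), and it assumes a pandas version where `.month` on a DatetimeIndex returns an Index rather than a plain NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical data: timestamps on level 0, a running counter on level 1.
dates = pd.date_range("2017-04-28", periods=8, freq="15D", name="Date")
index = pd.MultiIndex.from_arrays(
    [dates, np.arange(len(dates))], names=["Date", "i2"]
)
compare = pd.DataFrame({"value": np.arange(len(dates)) * 10}, index=index)

# One month number per row: 4, 5, 5, 6, 6, 7, 7, 8
months = compare.index.get_level_values(0).month

# `in` needs a single True/False. Comparing a list element with the whole
# array gives one boolean per row, which Python cannot reduce to a single
# truth value, so it raises ValueError.
try:
    months in [5, 6, 7]
    in_operator_failed = False
except ValueError:
    in_operator_failed = True

# isin tests membership element by element and returns a boolean mask
# that can be used directly to select rows.
mask = months.isin([5, 6, 7])
summer = compare[mask]
print(summer)
```

The mask has one entry per row of `compare`, which is why it can be passed straight to `compare[...]` or `compare.loc[...]`.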

Demo:

In [45]: import pandas_datareader.data as web

In [46]: df = web.DataReader('AAPL', 'google', '2017-06-01')

In [48]: df = df.assign(i2=np.arange(len(df))).set_index('i2', append=True)

In [49]: df
Out[49]:
                 Open    High     Low   Close    Volume
Date       i2
2017-06-01 0   153.17  153.33  152.22  153.18  16404088
2017-06-02 1   153.58  155.45  152.89  155.45  27770715
2017-06-05 2   154.34  154.45  153.46  153.93  25331662
2017-06-06 3   153.90  155.81  153.78  154.45  26624926
2017-06-07 4   155.02  155.98  154.48  155.37  21069647
2017-06-08 5   155.25  155.54  154.40  154.99  21250798
2017-06-09 6   155.19  155.19  146.02  148.98  64882657
2017-06-12 7   145.74  146.09  142.51  145.42  72307330
2017-06-13 8   147.16  147.45  145.15  146.59  34165445
2017-06-14 9   147.50  147.50  143.84  145.16  31531232
...               ...     ...     ...     ...       ...
2017-07-31 41  149.90  150.33  148.13  148.73  19845920
2017-08-01 42  149.10  150.22  148.41  150.05  35368645
2017-08-02 43  159.28  159.75  156.16  157.14  69936800
2017-08-03 44  157.05  157.21  155.02  155.57  27097296
2017-08-04 45  156.07  157.40  155.69  156.39  20559852
2017-08-07 46  157.06  158.92  156.67  158.81  21870321
2017-08-08 47  158.60  161.83  158.27  160.08  36205896
2017-08-09 48  159.26  161.27  159.11  161.06  26131530
2017-08-10 49  159.90  160.00  154.63  155.32  40804273
2017-08-11 50  156.60  158.57  156.07  157.48  26180743

[51 rows x 5 columns]

In [50]: df[df.index.get_level_values(0).month.isin([5,8])]
Out[50]:
                 Open    High     Low   Close    Volume
Date       i2
2017-08-01 42  149.10  150.22  148.41  150.05  35368645
2017-08-02 43  159.28  159.75  156.16  157.14  69936800
2017-08-03 44  157.05  157.21  155.02  155.57  27097296
2017-08-04 45  156.07  157.40  155.69  156.39  20559852
2017-08-07 46  157.06  158.92  156.67  158.81  21870321
2017-08-08 47  158.60  161.83  158.27  160.08  36205896
2017-08-09 48  159.26  161.27  159.11  161.06  26131530
2017-08-10 49  159.90  160.00  154.63  155.32  40804273
2017-08-11 50  156.60  158.57  156.07  157.48  26180743

UPDATE: test with your index values:

In [56]: i = pd.DatetimeIndex(['2016-01-04 01:40:00', '2016-01-04 02:00:00', '2016-01-04 02:10:00',
    ...:                       '2016-01-04 02:30:00', '2016-01-04 02:40:00'],
    ...:                      dtype='datetime64[ns]', name=u'TTimeStamp', freq=None)

In [57]: i
Out[57]: DatetimeIndex(['2016-01-04 01:40:00', '2016-01-04 02:00:00', '2016-01-04 02:10:00', '2016-01-04 02:30:00', '2016-01-04 02:40:00'],
dtype='datetime64[ns]', name='TTimeStamp', freq=None)

In [58]: i.month
Out[58]: Int64Index([1, 1, 1, 1, 1], dtype='int64', name='TTimeStamp')

In [59]: i.month.isin([2,3])
Out[59]: array([False, False, False, False, False], dtype=bool)

In [60]: i.month.isin([1,2,3])
Out[60]: array([ True,  True,  True,  True,  True], dtype=bool)

UPDATE2: try the following workaround:

compare[pd.Series(compare.index.get_level_values(0).month).isin([5, 6, 7]).values]
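The workaround helps because a Series always has an `isin` method, whatever type `.month` returns. A small sketch of it follows, together with one alternative that is not part of the original answer: `numpy.isin`, which accepts any array-like and assumes NumPy 1.13 or later. The frame contents and the `key` level are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with datetimes on level 0 of a MultiIndex.
dates = pd.to_datetime(
    ["2016-04-30", "2016-05-01", "2016-06-15", "2016-07-31", "2016-08-01"]
)
index = pd.MultiIndex.from_arrays(
    [dates, list("abcde")], names=["TTimeStamp", "key"]
)
compare = pd.DataFrame({"value": [1, 2, 3, 4, 5]}, index=index)

months = compare.index.get_level_values(0).month

# Workaround from the answer: wrap the months in a Series, call isin,
# then take .values so the mask is positional rather than label-aligned.
mask_series = pd.Series(months).isin([5, 6, 7]).values

# Alternative (an assumption, not from the answer): numpy.isin works on
# a plain ndarray as well as on a pandas Index.
mask_numpy = np.isin(months, [5, 6, 7])

result = compare[mask_numpy]
print(result)
```

Taking `.values` matters in the Series variant: the temporary Series has a fresh integer index, so passing it without `.values` would make pandas try to align it against the MultiIndex of `compare`.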
