Select from pandas dataframe using boolean series/array

Question

I have a dataframe:

             High    Low  Close
Date                           
2009-02-11  30.20  29.41  29.87
2009-02-12  30.28  29.32  30.24
2009-02-13  30.45  29.96  30.10
2009-02-17  29.35  28.74  28.90
2009-02-18  29.35  28.56  28.92

and a boolean series:

     bools
1    True
2    False
3    False
4    True
5    False

How can I select from the dataframe using the boolean array to obtain a result like this:

             High   
Date                           
2009-02-11  30.20  
2009-02-17  29.35  


Recommended answer

For the indexing to work with two DataFrames, they must have comparable indexes. In this case it won't work, because one DataFrame has an integer index while the other is indexed by dates.
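As a minimal sketch of what "comparable indexes" means here: if the boolean Series is rebuilt on the DataFrame's own index, pandas can align the two and `df[mask]` works directly. The name `aligned` is only illustrative, the frame is cut down to the High column for brevity, and the sketch assumes the booleans are already in the same row order as the DataFrame.

```python
import pandas as pd

# DataFrame indexed by date, as in the question (High column only, for brevity)
df = pd.DataFrame(
    {"High": [30.20, 30.28, 30.45, 29.35, 29.35]},
    index=pd.to_datetime(
        ["2009-02-11", "2009-02-12", "2009-02-13", "2009-02-17", "2009-02-18"]
    ),
)
df.index.name = "Date"

# Boolean Series with an unrelated integer index (1..5)
s = pd.Series([True, False, False, True, False], index=[1, 2, 3, 4, 5], name="bools")

# Assumption: the booleans are in the same row order as df.
# Rebuild the mask on df's index so the labels line up.
aligned = pd.Series(s.to_numpy(), index=df.index, name=s.name)

result = df[aligned]  # label-aligned boolean selection
print(result)
```

This only makes sense when the positions of the two objects really do correspond; if they do not, re-labelling the mask would silently select the wrong rows.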

However, as you say, you can filter using a bool array. You can access the underlying array of a Series via .values, and then apply it as a filter as follows:

df # pandas.DataFrame
s  # pandas.Series 

df[s.values] # df, filtered by the bool array in s

For example, using your data:

import pandas as pd

df = pd.DataFrame([
            [30.20,  29.41,  29.87],
            [30.28,  29.32,  30.24],
            [30.45,  29.96,  30.10],
            [29.35,  28.74,  28.90],
            [29.35,  28.56,  28.92],
        ],
        columns=['High','Low','Close'], 
        index=['2009-02-11','2009-02-12','2009-02-13','2009-02-17','2009-02-18']
        )

s = pd.Series([True, False, False, True, False], name='bools')

df[s.values]

This returns:

            High    Low     Close
2009-02-11  30.20   29.41   29.87
2009-02-17  29.35   28.74   28.90

If you just want the High column, you can select it as normal, either before or after applying the bool filter:

df['High'][s.values]
# Or: df[s.values]['High']

This gives your target output (as a Series):

 2009-02-11    30.20
 2009-02-17    29.35
 Name: High, dtype: float64
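As an alternative sketch (not part of the original answer), the row filter and the column selection can be combined in a single `.loc` call, which avoids chaining two indexing operations. Passing the column as a list, `['High']`, keeps the result a DataFrame, which is closer to the layout shown in the question. Here `s.to_numpy()` is used in place of `.values`, and the names `mask`, `high_series` and `high_frame` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [30.20, 29.41, 29.87],
        [30.28, 29.32, 30.24],
        [30.45, 29.96, 30.10],
        [29.35, 28.74, 28.90],
        [29.35, 28.56, 28.92],
    ],
    columns=["High", "Low", "Close"],
    index=["2009-02-11", "2009-02-12", "2009-02-13", "2009-02-17", "2009-02-18"],
)
df.index.name = "Date"

s = pd.Series([True, False, False, True, False], name="bools")

# Plain NumPy bool array: applied by position, so the Series index is ignored
mask = s.to_numpy()

high_series = df.loc[mask, "High"]    # one column label -> Series
high_frame = df.loc[mask, ["High"]]   # list of labels  -> DataFrame

print(high_frame)
```

Because the mask is applied by position, it must have exactly one entry per row of the DataFrame.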
