How to select rows within a pandas dataframe based on time only when index is date and time


Problem description

I have a dataframe that looks like this:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2016910 entries, 2009-01-02 04:51:00 to 2012-11-02 20:00:00
Freq: T
Data columns:
X1    2016910  non-null values
X2    2016910  non-null values
X3    2016910  non-null values
X4    2016910  non-null values
X5    2016910  non-null values
dtypes: float64(5)

I would like to "filter" it by accessing only certain times across the whole range of dates. For example, I'd like to return a dataframe that contains all rows where the time is between 13:00:00 and 14:00:00, but for all of the dates. I am reading the data from a CSV file and the datetime is one column, but I could just as easily make the input CSV file contain separate date and time columns. I tried the separate date and time route and created a multiindex, but when I did, I ended up with two index columns: the first contained the proper date with an incorrect time attached instead of just a date, and the second contained an incorrect date followed by a correct time instead of just a time. The input data for my multiindex attempt looked like this:

 20090102,04:51:00,89.9900,89.9900,89.9900,89.9900,100
 20090102,05:36:00,90.0100,90.0100,90.0100,90.0100,200
 20090102,05:44:00,90.1400,90.1400,90.1400,90.1400,100
 20090102,05:50:00,90.0500,90.0500,90.0500,90.0500,500
 20090102,05:56:00,90.1000,90.1000,90.1000,90.1000,300
 20090102,05:57:00,90.1000,90.1000,90.1000,90.1000,200

which I tried to read using this code:

 singledf = pd.DataFrame.from_csv("inputfile",header=None,index_col=[0,1],parse_dates=True)

which resulted in a dataframe that looks like this:

singledf.sort()
singledf

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 716244 entries, (<Timestamp: 2009-01-02 00:00:00>, <Timestamp: 2012-11-04 04:51:00>) to (<Timestamp: 2012-11-02 00:00:00>, <Timestamp: 2012-11-04 20:00:00>)
Data columns:
X2    716244  non-null values
X3    716244  non-null values
X4    716244  non-null values
X5    716244  non-null values
X6    716244  non-null values
dtypes: float64(4), int64(1)

Maybe the multiindex approach is totally wrong, but it's one thing I tried. It seems to be stuck on using datetime objects and wants to force each index column to hold a full datetime instead of just a date or just a time. The source CSV file for my non-multiindex attempt looks like this:

20090102 04:51:00,89.9900,89.9900,89.9900,89.9900,100
20090102 05:36:00,90.0100,90.0100,90.0100,90.0100,200
20090102 05:44:00,90.1400,90.1400,90.1400,90.1400,100
20090102 05:50:00,90.0500,90.0500,90.0500,90.0500,500
20090102 05:56:00,90.1000,90.1000,90.1000,90.1000,300

I am using pandas.

Recommended answer

A regular DatetimeIndex lets you use the between_time method.
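
To illustrate with the time window from the question, here is a minimal sketch on synthetic data. The minute-frequency index and the single column X1 below are made up for illustration and are not the asker's data. between_time matches on the time of day only, so rows from every date come back:

```python
import numpy as np
import pandas as pd

# Synthetic minute-by-minute index spanning three days (illustrative only).
idx = pd.date_range("2009-01-02 00:00", "2009-01-04 23:59", freq="min")
df = pd.DataFrame({"X1": np.arange(len(idx), dtype=float)}, index=idx)

# Keep rows whose time of day falls between 13:00:00 and 14:00:00, on every date.
# Both endpoints are included by default.
afternoon = df.between_time("13:00:00", "14:00:00")

# Number of distinct dates represented, and total number of selected rows.
print(afternoon.index.normalize().nunique())
print(len(afternoon))
```

The date part of the index is ignored by the selection, so there is no need to split date and time into separate index levels.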

In [12]: data = """\
 20090102,04:51:00,89.9900,89.9900,89.9900,89.9900,100
 20090102,05:36:00,90.0100,90.0100,90.0100,90.0100,200
 20090102,05:44:00,90.1400,90.1400,90.1400,90.1400,100
 20090102,05:50:00,90.0500,90.0500,90.0500,90.0500,500
 20090102,05:56:00,90.1000,90.1000,90.1000,90.1000,300
 20090102,05:57:00,90.1000,90.1000,90.1000,90.1000,200
"""

In [13]: singledf = pd.DataFrame.from_csv(StringIO(data), header=None, parse_dates=[[0,1]])

In [14]: singledf
Out[14]:
                        X2     X3     X4     X5   X6
X0_X1
2009-01-02 04:51:00  89.99  89.99  89.99  89.99  100
2009-01-02 05:36:00  90.01  90.01  90.01  90.01  200
2009-01-02 05:44:00  90.14  90.14  90.14  90.14  100
2009-01-02 05:50:00  90.05  90.05  90.05  90.05  500
2009-01-02 05:56:00  90.10  90.10  90.10  90.10  300
2009-01-02 05:57:00  90.10  90.10  90.10  90.10  200

In [15]: singledf.between_time('5:30:00', '5:45:00')
Out[15]:
                        X2     X3     X4     X5   X6
X0_X1
2009-01-02 05:36:00  90.01  90.01  90.01  90.01  200
2009-01-02 05:44:00  90.14  90.14  90.14  90.14  100
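
Note that DataFrame.from_csv was deprecated and later removed from pandas (it is gone as of pandas 1.0), so the session above will not run on current versions. A rough equivalent, offered as a sketch rather than the answerer's own code, is to read the file with pd.read_csv, combine the date and time columns with pd.to_datetime, and use the result as the index. The column names date and time are assigned here for illustration, and X2 to X6 simply mirror the output above:

```python
from io import StringIO

import pandas as pd

data = """\
20090102,04:51:00,89.9900,89.9900,89.9900,89.9900,100
20090102,05:36:00,90.0100,90.0100,90.0100,90.0100,200
20090102,05:44:00,90.1400,90.1400,90.1400,90.1400,100
20090102,05:50:00,90.0500,90.0500,90.0500,90.0500,500
20090102,05:56:00,90.1000,90.1000,90.1000,90.1000,300
20090102,05:57:00,90.1000,90.1000,90.1000,90.1000,200
"""

# Illustrative column names; the file itself has no header row.
names = ["date", "time", "X2", "X3", "X4", "X5", "X6"]

# Read date and time as strings so they can be concatenated before parsing.
raw = pd.read_csv(StringIO(data), header=None, names=names,
                  dtype={"date": str, "time": str})

# Build one timestamp per row from the two text columns.
stamps = pd.to_datetime(raw["date"] + " " + raw["time"],
                        format="%Y%m%d %H:%M:%S")

# Use the timestamps as a regular DatetimeIndex and drop the text columns.
singledf = raw.drop(columns=["date", "time"])
singledf.index = pd.DatetimeIndex(stamps)

# Select rows by time of day, regardless of date.
selected = singledf.between_time("05:30:00", "05:45:00")
print(selected)
```

The same approach should work for the single-column layout from the question ("20090102 04:51:00,..."): parse that one column with the same format string and skip the concatenation step.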
