只有当索引是日期和时间时,如何基于时间选择 pandas 数据帧中的行 [英] How to select rows within a pandas dataframe based on time only when index is date and time
问题描述
我有一个数据框如下所示:
I have a dataframe that looks like this:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2016910 entries, 2009-01-02 04:51:00 to 2012-11-02 20:00:00
Freq: T
Data columns:
X1 2016910 non-null values
X2 2016910 non-null values
X3 2016910 non-null values
X4 2016910 non-null values
X5 2016910 non-null values
dtypes: float64(5)
我想通过访问整个过程只日期范围。例如,我想返回一个包含所有行的数据帧,时间在13:00:00和14:00:00之间,但是所有的日期。我正在从CSV文件读取数据,而datetime是一列,但我可以很方便地使输入CSV文件包含单独的日期和时间。我尝试了单独的日期和时间路线,并创建了一个多指数,但是当我这样做的时候,我结束了两个索引列 - 其中一个索引列包含正确的日期,时间不正确而不是日期,第二个包含一个不正确的日期,然后一个正确的时间,而不是一个时间。我的多指数尝试的输入数据如下所示:
and I would like to "filter" it by accessing only certain times across the whole range of dates. For example, I'd like to return a dataframe that contains all rows where the time is between 13:00:00 and 14:00:00, but for all of the dates. I am reading the data from a CSV file and the datetime is one column, but I could just as easily make the input CSV file contain a separate date and time. I tried the separate date and time route, and created a multiindex, but when I did, I ended up with two index columns -- one of them containing the proper date with an incorrect time instead of just a date, and the second one containing an incorrect date, and then a correct time, instead of just a time. The input data for my multiindex attempt looked like this:
20090102,04:51:00,89.9900,89.9900,89.9900,89.9900,100
20090102,05:36:00,90.0100,90.0100,90.0100,90.0100,200
20090102,05:44:00,90.1400,90.1400,90.1400,90.1400,100
20090102,05:50:00,90.0500,90.0500,90.0500,90.0500,500
20090102,05:56:00,90.1000,90.1000,90.1000,90.1000,300
20090102,05:57:00,90.1000,90.1000,90.1000,90.1000,200
我试图用这段代码阅读:
which I tried to read using this code:
singledf = pd.DataFrame.from_csv("inputfile",header=None,index_col=[0,1],parse_dates=True)
导致数据框如下所示:
singledf.sort()
singledf
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 716244 entries, (<Timestamp: 2009-01-02 00:00:00>, <Timestamp: 2012-11-04 04:51:00>) to (<Timestamp: 2012-11-02 00:00:00>, <Timestamp: 2012-11-04 20:00:00>)
Data columns:
X2 716244 non-null values
X3 716244 non-null values
X4 716244 non-null values
X5 716244 non-null values
X6 716244 non-null values
dtypes: float64(4), int64(1)
也许多指标方法是完全错误的,但这是我尝试的一件事。看起来它是停留在使用datetime对象,并希望强制索引列有一个datetime而不是一个日期或一个时间。我的非多重索引尝试的源CSV文件如下所示:
Maybe the multiindex approach is totally wrong, but it's one thing I tried. It seems like it is stuck on using a datetime object, and wants to force the index columns to have a datetime instead of just a date or a time. My source CSV files for the my non-multiindex attempt looks like this:
20090102 04:51:00,89.9900,89.9900,89.9900,89.9900,100
20090102 05:36:00,90.0100,90.0100,90.0100,90.0100,200
20090102 05:44:00,90.1400,90.1400,90.1400,90.1400,100
20090102 05:50:00,90.0500,90.0500,90.0500,90.0500,500
20090102 05:56:00,90.1000,90.1000,90.1000,90.1000,300
我正在使用大熊猫。
推荐答案
常规DatetimeIndex允许使用between_time方法。
A regular DatetimeIndex allows to use between_time method.
In [12]: data = """\
20090102,04:51:00,89.9900,89.9900,89.9900,89.9900,100
20090102,05:36:00,90.0100,90.0100,90.0100,90.0100,200
20090102,05:44:00,90.1400,90.1400,90.1400,90.1400,100
20090102,05:50:00,90.0500,90.0500,90.0500,90.0500,500
20090102,05:56:00,90.1000,90.1000,90.1000,90.1000,300
20090102,05:57:00,90.1000,90.1000,90.1000,90.1000,200
"""
In [13]: singledf = pd.DataFrame.from_csv(StringIO(data), header=None, parse_dates=[[0,1]])
In [14]: singledf
Out[14]:
X2 X3 X4 X5 X6
X0_X1
2009-01-02 04:51:00 89.99 89.99 89.99 89.99 100
2009-01-02 05:36:00 90.01 90.01 90.01 90.01 200
2009-01-02 05:44:00 90.14 90.14 90.14 90.14 100
2009-01-02 05:50:00 90.05 90.05 90.05 90.05 500
2009-01-02 05:56:00 90.10 90.10 90.10 90.10 300
2009-01-02 05:57:00 90.10 90.10 90.10 90.10 200
In [15]: singledf.between_time('5:30:00', '5:45:00')
Out[15]:
X2 X3 X4 X5 X6
X0_X1
2009-01-02 05:36:00 90.01 90.01 90.01 90.01 200
2009-01-02 05:44:00 90.14 90.14 90.14 90.14 100
这篇关于只有当索引是日期和时间时,如何基于时间选择 pandas 数据帧中的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!