Fast selection of a time interval in a pandas DataFrame/Series


Question

My problem is that I want to filter a DataFrame to include only times within the interval [start, end). If I do not care about the day, I would like to filter on the start and end time of each day. I have a solution for this, but it is slow. So my question is whether there is a faster way to do the time-based filtering.

Example

import pandas as pd
import time


index = pd.date_range(start='2012-11-05 01:00:00', end='2012-11-05 23:00:00', freq='1s').tz_localize('UTC')
df = pd.DataFrame(range(len(index)), index=index, columns=['Number'])

# select from 1 to 2 am, include day
# (.loc replaces the .ix indexer, which has been removed from pandas)
now = time.time()
df2 = df.loc['2012-11-05 01:00:00':'2012-11-05 02:00:00']
print('Took %s seconds' % (time.time() - now))  # originally reported: 0.0368609428406

# select from 1 to 2 am, for every day
now = time.time()
selector = (df.index.hour >= 1) & (df.index.hour < 2)
df3 = df[selector]
print('Took %s seconds' % (time.time() - now))  # originally reported: 0.0699911117554

As you can see, if I drop the day (second case), it takes almost twice as long. The computation time increases rapidly when the index covers several days, e.g. from 5 to 7 Nov:

index = pd.date_range(start='2012-11-05 01:00:00', end='2012-11-07 23:00:00', freq='1s').tz_localize('UTC')

So, to summarize: is there a faster method to filter by time of day across many days?

Thx

Recommended answer

You need the between_time method.

In [14]: %timeit df.between_time(start_time='01:00', end_time='02:00')
100 loops, best of 3: 10.2 ms per loop

In [15]: %timeit selector=(df.index.hour>=1) & (df.index.hour<2); df[selector]
100 loops, best of 3: 18.2 ms per loop

I ran these tests with an index spanning 5 to 7 November.
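Outside IPython, the %timeit comparison above can be reproduced with the standard timeit module. The following is a minimal sketch rather than the answerer's actual benchmark: the helper names with_between_time and with_hour_mask and the repeat count of 20 are assumptions made for illustration, and absolute timings will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# Index spanning 5 to 7 November at one-second resolution, as in the question.
index = pd.date_range(
    start="2012-11-05 01:00:00", end="2012-11-07 23:00:00", freq="1s"
).tz_localize("UTC")
df = pd.DataFrame(range(len(index)), index=index, columns=["Number"])


def with_between_time():
    # Time-of-day selection using the method from the answer.
    return df.between_time(start_time="01:00", end_time="02:00")


def with_hour_mask():
    # Boolean-mask selection from the question.
    selector = (df.index.hour >= 1) & (df.index.hour < 2)
    return df[selector]


# Assumed repeat count; report the average time per call in milliseconds.
repeats = 20
for func in (with_between_time, with_hour_mask):
    seconds = timeit.timeit(func, number=repeats) / repeats
    print(f"{func.__name__}: {seconds * 1000:.2f} ms per call")

# The two selections are not identical: with default arguments between_time
# also keeps the rows stamped exactly 02:00:00, which the hour mask excludes.
extra = with_between_time().index.difference(with_hour_mask().index)
print(extra)
```

Note that the two approaches differ at the upper boundary, so a strict [start, end) selection needs the end point excluded explicitly.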


Definition: df.between_time(self, start_time, end_time, include_start=True, include_end=True)
Docstring:
Select values between particular times of the day (e.g., 9:00-9:30 AM)

Parameters
----------
start_time : datetime.time or string
end_time : datetime.time or string
include_start : boolean, default True
include_end : boolean, default True

Returns
-------
values_between_time : type of caller
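
The signature above reflects the pandas version in use when the answer was written. In later releases (pandas 1.4 onward) the include_start and include_end flags were replaced by a single inclusive argument. A minimal sketch, assuming such a version and a small made-up one-minute index, of selecting the half-open interval [start, end) asked for in the question:

```python
import pandas as pd

# Small illustrative index (assumed data): one row per minute over two days.
index = pd.date_range(
    start="2012-11-05 00:00:00", end="2012-11-06 23:59:00", freq="min"
).tz_localize("UTC")
df = pd.DataFrame(range(len(index)), index=index, columns=["Number"])

# inclusive="left" keeps 01:00 and drops 02:00, i.e. the interval [01:00, 02:00).
half_open = df.between_time("01:00", "02:00", inclusive="left")

# The same selection written as a boolean mask on the hour, for comparison.
mask = (df.index.hour >= 1) & (df.index.hour < 2)

print(len(half_open))  # 60 rows per day over two days
print(half_open.equals(df[mask]))
```

On versions that still use the older signature, passing include_end=False should give the same half-open behaviour.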
