Extract Business Days in Time Series using Python/Pandas

Problem description

I am working with high-frequency time series data and would like to extract all the business days from it. The observations are one second apart, so there are 86,400 per day, and the data set spans 31 days (2,678,400 observations in total).

Here is part of my data:

In[1]: ts
Out[1]: 
2013-01-01 00:00:00    0.480928
2013-01-01 00:00:01    0.480928
2013-01-01 00:00:02    0.483977
2013-01-01 00:00:03    0.486725
2013-01-01 00:00:04    0.486725
...
2013-01-31 23:59:56    0.451630
2013-01-31 23:59:57    0.451630
2013-01-31 23:59:58    0.451630
2013-01-31 23:59:59    0.454683
Freq: S, Length: 2678400

What I would like to do is create a new time series that consists only of the business days of this month, while keeping every per-second observation for those days. For example, if 2013-01-02 (Wed) through 2013-01-04 (Fri) are the first business days of the first week in January, the result would start like this:

2013-01-02 00:00:00    0.507477
2013-01-02 00:00:01    0.501373
...
2013-01-03 00:00:00    0.489778
2013-01-03 00:00:01    0.489778
...
2013-01-04 23:59:58    0.598115
2013-01-04 23:59:59    0.598115
Freq: S, Length: 259200

This would of course exclude all data from Saturday 2013-01-05 and Sunday 2013-01-06, since these are weekend days, and so on for the rest of the month.

I tried some built-in pandas commands but couldn't find the right one: they aggregate by day and ignore the fact that each day contains many sub-daily observations. There is a value for every second, and these values should not be averaged, just collected into a new series.

For example, I tried:

  1. ts.asfreq(BDay()) --> finds the business days but returns a single value per day
  2. ts.resample() --> you must define 'how' (mean, max, min, ...)
  3. ts.groupby(lambda x: x.weekday) --> doesn't work either
  4. ts = pd.Series(df, index = pd.bdate_range(start = '2013/01/01 00:00:00', end = '2013/01/31 23:59:59' , freq = 'S')) --> (df, because the original data is a DataFrame) using pd.bdate_range didn't help, since df and the index must have the same length
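
A small sketch of the first attempt, run on synthetic hourly data (an assumption made to keep the example small, not the data shown above), suggests why it falls short: `asfreq` with a business-day offset reindexes the series to one timestamp per business day, so every other observation of that day is dropped rather than kept.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import BDay

# Synthetic stand-in: one value per hour from Tue 2013-01-01 to Mon 2013-01-07.
# Each value is simply the number of hours elapsed since the start.
index = pd.date_range("2013-01-01", "2013-01-07 23:00",
                      freq=pd.Timedelta(hours=1))
ts = pd.Series(np.arange(len(index), dtype=float), index=index)

# asfreq builds a business-day index between the first and last timestamp
# and looks up the value at exactly those timestamps (midnight here).
daily = ts.asfreq(BDay())

print(len(ts))     # 168 hourly observations
print(len(daily))  # 5: one row each for Jan 1, 2, 3, 4 and 7
print(daily)
```

The weekend is skipped as intended, but only the midnight observation of each business day survives, which is not what is wanted here.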

I searched the pandas documentation and googled, but could not find a clue.
Does anybody have an idea?

Any help would be much appreciated. Thanks!

P.S. I would rather not use loops for this, since my data set is very large (and I have other months to analyse as well).

Recommended answer

Unfortunately this is a little slow, but it should at least give the result you are looking for.

import pandas as pd

# Create an index holding just the date portion of each timestamp (this is the slow step)
ts_days = pd.to_datetime(ts.index.date)

# Create a range of business days over that period
bdays = pd.bdate_range(start=ts.index[0].date(), end=ts.index[-1].date())

# Filter the series to just those days contained in the business day range
ts = ts[ts_days.isin(bdays)]
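
A possible variant, offered as a sketch rather than as part of the original answer: the weekday can be read directly from the `DatetimeIndex`, which avoids building one Python `date` object per row. The helper name `keep_business_days` and its `holidays` parameter are hypothetical, and the series below is synthetic. The optional holiday list is included because the expected output in the question starts on 2013-01-02, whereas a plain Monday-to-Friday filter also keeps Tuesday 2013-01-01.

```python
import numpy as np
import pandas as pd


def keep_business_days(ts, holidays=()):
    """Return the rows of `ts` that fall on Monday-Friday, minus `holidays`.

    Hypothetical helper for illustration; `holidays` is any sequence of
    dates that pd.to_datetime can parse.
    """
    # dayofweek is 0 for Monday through 6 for Sunday.
    mask = np.asarray(ts.index.dayofweek < 5)
    if len(holidays) > 0:
        holiday_days = pd.to_datetime(list(holidays))
        # normalize() floors every timestamp to midnight of its own day,
        # so it can be compared against whole-day holiday dates.
        mask = mask & ~ts.index.normalize().isin(holiday_days)
    return ts[mask]


# Synthetic stand-in for the data in the question: one random value per
# second for January 2013 (the values are placeholders, not real data).
index = pd.date_range("2013-01-01", "2013-01-31 23:59:59",
                      freq=pd.Timedelta(seconds=1))
ts = pd.Series(np.random.default_rng(0).random(len(index)), index=index)

weekdays_only = keep_business_days(ts)
no_holiday = keep_business_days(ts, holidays=["2013-01-01"])

print(len(weekdays_only))   # 23 weekdays * 86400 = 1987200
print(len(no_holiday))      # 22 days * 86400 = 1900800
print(no_holiday.index[0])  # 2013-01-02 00:00:00
```

Every per-second observation on the retained days is kept, with no aggregation. Whether this is actually faster than the approach above depends on the pandas version and the data, so it is worth timing on the real series.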
