Converting a pandas DataFrame from monthly to daily
Problem description
I have a data frame with monthly data for 2014 for a series of 317 stock tickers (317 tickers x 12 months = 3,804 rows in the DataFrame). I would like to convert it to a daily DataFrame (317 tickers x 365 days = 115,705 rows). So I believe I need to upsample or reindex while spreading the monthly values over every day in the month, but I can't get it to work properly.
The DataFrame is currently in this format:
>>> df
month ticker b c
2014-1 AAU 10 .04 #different values every month for each ticker
2014-2 AAU 20 .03
2014-3 AAU 13 .06
.
2014-12 AAU 11 .03
.
.
.
2014-1 ZZY 11 .11
2014-2 ZZY 6 .03
.
2014-12 ZZY 17 .09
This is what I would like:
>>> df
day ticker b c
2014-01-01 AAU 10 .04 #same values every day in month for each ticker
2014-01-02 AAU 10 .04
2014-01-03 AAU 10 .04
.
2014-01-31 AAU 10 .04
2014-02-01 AAU 20 .03
2014-02-02 AAU 20 .03
.
2014-02-28 AAU 20 .03
.
.
.
2014-12-30 ZZY 17 .09
2014-12-31 ZZY 17 .09
I have tried a groupby combined with resampling by day, but the resulting DataFrame starts with the date '2014-01-13' rather than January 1st, and ends with '2014-12-01' rather than December 31st. I have also tried changing the month values from, for instance, '2014-1' to '2014-01-01', but the resampled DataFrame still ends on '2014-01-01'. There has to be an easier way to go about this, so I'd appreciate any help. I've been going around in circles all day on this.
Recommended answer
First, parse the month date strings into pandas Timestamps:
df['month'] = pd.to_datetime(df['month'], format='%Y-%m')
# month ticker b c
# 0 2014-01-01 AAU 10 0.04
# 1 2014-02-01 AAU 20 0.03
# 2 2014-03-01 AAU 13 0.06
# 3 2014-12-01 AAU 11 0.03
# 4 2014-01-01 ZZY 11 0.11
# 5 2014-02-01 ZZY 6 0.03
# 6 2014-12-01 ZZY 17 0.09
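As a small, self-contained check of the parsing step, the sketch below uses made-up month strings in the question's '2014-1' style (the month is not zero-padded) and shows that each one becomes a Timestamp on the first day of its month. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical month strings in the question's "YYYY-M" style.
months = pd.Series(['2014-1', '2014-2', '2014-12'])

# With no day in the format, every parsed value lands on day 1 of its month.
parsed = pd.to_datetime(months, format='%Y-%m')

print(parsed.dt.strftime('%Y-%m-%d').tolist())
# ['2014-01-01', '2014-02-01', '2014-12-01']
```

This is why the later steps only need to work out the last day of the final month: the first day of the first month is already the minimum of the index.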
Next, pivot the DataFrame, using the month as the index and the ticker as a column level:
df = df.pivot(index='month', columns='ticker')
# b c
# ticker AAU ZZY AAU ZZY
# month
# 2014-01-01 10 11 0.04 0.11
# 2014-02-01 20 6 0.03 0.03
# 2014-03-01 13 NaN 0.06 NaN
# 2014-12-01 11 17 0.03 0.09
By pivoting now, we will be able to forward-fill each column more easily later.
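To make the shape of the pivoted frame concrete, here is a minimal sketch on invented data (two tickers, two months). After the pivot, each (value column, ticker) pair is its own column, so a single date index serves all tickers at once.

```python
import pandas as pd

# Invented sample data: 2 tickers x 2 months.
long_df = pd.DataFrame({
    'month': pd.to_datetime(['2014-01-01', '2014-02-01',
                             '2014-01-01', '2014-02-01']),
    'ticker': ['AAU', 'AAU', 'ZZY', 'ZZY'],
    'b': [10, 20, 11, 6],
    'c': [0.04, 0.03, 0.11, 0.03],
})

wide = long_df.pivot(index='month', columns='ticker')

# The columns form a two-level MultiIndex: (value column, ticker).
print(wide.columns.tolist())
# [('b', 'AAU'), ('b', 'ZZY'), ('c', 'AAU'), ('c', 'ZZY')]

# One cell is addressed by date plus the (value, ticker) pair.
print(wide.loc[pd.Timestamp('2014-02-01'), ('b', 'ZZY')])
```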
Now find the start and end dates:
start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)
Interestingly, adding pd.DateOffset(day=31) will not always result in a date that falls on day 31. If the month is February, adding pd.DateOffset(day=31) returns the last day in February:
In [130]: pd.Timestamp('2014-2-28') + pd.DateOffset(day=31)
Out[130]: Timestamp('2014-02-28 00:00:00')
That's nice, since it means adding pd.DateOffset(day=31) will always give us the last valid day in the month.
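The clamping behaviour can be checked for a few months of different lengths. The sketch below uses arbitrary example dates, including a leap-year February, and cross-checks each result against pd.offsets.MonthEnd(0), which rolls a date forward to the end of its month.

```python
import pandas as pd

month_ends = {}
for month_start in ['2014-01-01', '2014-02-01', '2014-04-01', '2016-02-01']:
    ts = pd.Timestamp(month_start)
    # day=31 replaces the day-of-month, clamped to the month's real length.
    month_ends[month_start] = ts + pd.DateOffset(day=31)
    # Cross-check against the dedicated month-end offset.
    assert month_ends[month_start] == ts + pd.offsets.MonthEnd(0)

for first, last in month_ends.items():
    print(first, '->', last.date())
# 2014-01-01 -> 2014-01-31
# 2014-02-01 -> 2014-02-28
# 2014-04-01 -> 2014-04-30
# 2016-02-01 -> 2016-02-29
```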
Now we can reindex and forward-fill the DataFrame:
dates = pd.date_range(start_date, end_date, freq='D')
dates.name = 'date'
df = df.reindex(dates, method='ffill')
yields
In [160]: df.head()
Out[160]:
b c
ticker AAU ZZY AAU ZZY
date
2014-01-01 10 11 0.04 0.11
2014-01-02 10 11 0.04 0.11
2014-01-03 10 11 0.04 0.11
2014-01-04 10 11 0.04 0.11
2014-01-05 10 11 0.04 0.11
In [161]: df.tail()
Out[161]:
b c
ticker AAU ZZY AAU ZZY
date
2014-12-27 11 17 0.03 0.09
2014-12-28 11 17 0.03 0.09
2014-12-29 11 17 0.03 0.09
2014-12-30 11 17 0.03 0.09
2014-12-31 11 17 0.03 0.09
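One detail of reindex(..., method='ffill') is worth seeing on a tiny example: it fills only the newly created dates by copying the most recent existing row, and it does not fill NaN values that were already present in that row. The sketch below uses invented values, with ZZY deliberately missing its February figure.

```python
import numpy as np
import pandas as pd

# Invented monthly values; ZZY has no February value.
monthly = pd.DataFrame(
    {'AAU': [10.0, 20.0], 'ZZY': [11.0, np.nan]},
    index=pd.to_datetime(['2014-01-01', '2014-02-01']),
)

days = pd.date_range('2014-01-01', '2014-02-28', freq='D')
daily = monthly.reindex(days, method='ffill')

print(len(daily))                                    # 59 rows: 31 + 28
print(daily.loc[pd.Timestamp('2014-01-31'), 'AAU'])  # 10.0
print(daily.loc[pd.Timestamp('2014-02-15'), 'AAU'])  # 20.0
# The February NaN is copied to every February day, not replaced by 11.0.
print(int(daily['ZZY'].isna().sum()))                # 28
```

If a gap like this should instead carry the previous month's value forward, calling daily.ffill() after the reindex is one option; whether that is appropriate depends on what a missing month means in the data.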
To move the ticker out of the column index and back into a column:
df = df.stack('ticker')
# DataFrame.sortlevel was removed from pandas; sort_index(level=...) replaces it
df = df.sort_index(level=1)
df = df.reset_index()
So putting it all together:
import pandas as pd
df = pd.read_table('data', sep=r'\s+')
df['month'] = pd.to_datetime(df['month'], format='%Y-%m')
df = df.pivot(index='month', columns='ticker')
start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)
dates = pd.date_range(start_date, end_date, freq='D')
dates.name = 'date'
df = df.reindex(dates, method='ffill')
df = df.stack('ticker')
df = df.sort_index(level=1)
df = df.reset_index()
yields
In [163]: df.head()
Out[163]:
date ticker b c
0 2014-01-01 AAU 10 0.04
1 2014-01-02 AAU 10 0.04
2 2014-01-03 AAU 10 0.04
3 2014-01-04 AAU 10 0.04
4 2014-01-05 AAU 10 0.04
In [164]: df.tail()
Out[164]:
date ticker b c
450 2014-12-27 ZZY 17 0.09
451 2014-12-28 ZZY 17 0.09
452 2014-12-29 ZZY 17 0.09
453 2014-12-30 ZZY 17 0.09
454 2014-12-31 ZZY 17 0.09
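As a sanity check on the whole recipe, the sketch below runs the same steps on synthetic data (two made-up tickers with a value for every month of 2014, and a single value column b) and confirms that each ticker ends up with 365 daily rows. It is only an illustration under those assumptions: it uses sort_index in place of the removed sortlevel, and on some pandas 2.x releases stack may emit a FutureWarning about its newer implementation.

```python
import pandas as pd

# Synthetic input (made up for illustration): 2 tickers x 12 months of 2014.
months = [f'2014-{m}' for m in range(1, 13)]
monthly = pd.DataFrame({
    'month': months * 2,
    'ticker': ['AAU'] * 12 + ['ZZY'] * 12,
    'b': list(range(1, 13)) + list(range(101, 113)),
})

wide = monthly.assign(month=pd.to_datetime(monthly['month'], format='%Y-%m'))
wide = wide.pivot(index='month', columns='ticker')

start_date = wide.index.min()                        # already the 1st of the month
end_date = wide.index.max() + pd.DateOffset(day=31)  # last day of the final month
dates = pd.date_range(start_date, end_date, freq='D', name='date')

daily = (wide.reindex(dates, method='ffill')
             .stack('ticker')
             .sort_index(level='ticker')
             .reset_index())

print(len(daily))                             # 730 = 2 tickers x 365 days
print(daily.groupby('ticker').size().tolist())  # [365, 365]
```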