pandas re-indexing with missing dates
Problem description
import datetime as dt

import pandas as pd
from dateutil.rrule import rrule, MONTHLY

def fread_year_month(strt_dt, end_dt):
    # One datetime per month from strt_dt through end_dt, inclusive
    return list(rrule(MONTHLY, dtstart=strt_dt, until=end_dt))

df = pd.DataFrame({
    'value': [4, 2, 5, 6, 7, 8, 6, 5, 4, 1, 2, 4],
    'date': fread_year_month(dt.datetime(2015, 1, 1), dt.datetime(2015, 12, 1)),
    'stock': ['amzn'] * 12
}, columns=['value', 'date', 'stock'])

df2 = pd.DataFrame({
    'value': [1, 1, 1, 1, 1],
    'date': fread_year_month(dt.datetime(2015, 1, 1), dt.datetime(2015, 5, 1)),
    'stock': ['msft'] * 5
}, columns=['value', 'date', 'stock'])

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, df2])
df.set_index(['stock', 'date'], inplace=True)
I have the above pandas dataframe. As you can see, the number of available data points for amzn is not the same as for msft. In this example the dates are sequential, but that need not be the case (the dates can be any dates).
If the universe of available dates is the set of dates for which I have AMZN data, how can I add those exact dates for every other stock in my data frame, with NaN or NA as the value?
In the example given, I want to insert the missing dates for msft into the index, with NaN or NA as the value at those dates.
Recommended answer
If you want to work with your tickers as columns, you could do an unstack, like this:
In [71]: df.unstack(level=0)
Out[71]:
           value
stock       amzn msft
date
2015-01-01   4.0  1.0
2015-02-01   2.0  1.0
2015-03-01   5.0  1.0
2015-04-01   6.0  1.0
2015-05-01   7.0  1.0
2015-06-01   8.0  NaN
2015-07-01   6.0  NaN
2015-08-01   5.0  NaN
2015-09-01   4.0  NaN
2015-10-01   1.0  NaN
2015-11-01   2.0  NaN
2015-12-01   4.0  NaN
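If the extra `value` level in the column header is unwanted, one option is to select the column before unstacking. The following is a minimal sketch, assuming the same data as in the question. It rebuilds the frame with `pd.date_range` in place of the rrule helper and with `pd.concat` so that it also runs on pandas 2.x; the names `amzn`, `msft` and `wide` are only illustrative.

```python
import pandas as pd

# Rebuild the example data (pd.date_range stands in for the rrule helper).
amzn = pd.DataFrame({
    'value': [4, 2, 5, 6, 7, 8, 6, 5, 4, 1, 2, 4],
    'date': pd.date_range('2015-01-01', periods=12, freq='MS'),
    'stock': 'amzn',
})
msft = pd.DataFrame({
    'value': [1, 1, 1, 1, 1],
    'date': pd.date_range('2015-01-01', periods=5, freq='MS'),
    'stock': 'msft',
})
df = pd.concat([amzn, msft]).set_index(['stock', 'date'])

# Selecting the Series first gives plain ticker names as columns,
# with NaN wherever a ticker has no row for a date.
wide = df['value'].unstack(level='stock')
print(wide)
```

Here `wide` has one row per date and one column per ticker, so something like `wide['msft']` gives the msft series aligned to the full set of dates.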
To reindex into the same shape, the from_product below creates a new MultiIndex with all the combinations of dates / tickers.
In [75]: df.reindex(pd.MultiIndex.from_product(df.index.levels))
Out[75]:
                 value
amzn 2015-01-01    4.0
     2015-02-01    2.0
     2015-03-01    5.0
     2015-04-01    6.0
     2015-05-01    7.0
     2015-06-01    8.0
     2015-07-01    6.0
     2015-08-01    5.0
     2015-09-01    4.0
     2015-10-01    1.0
     2015-11-01    2.0
     2015-12-01    4.0
msft 2015-01-01    1.0
     2015-02-01    1.0
     2015-03-01    1.0
     2015-04-01    1.0
     2015-05-01    1.0
     2015-06-01    NaN
     2015-07-01    NaN
     2015-08-01    NaN
     2015-09-01    NaN
     2015-10-01    NaN
     2015-11-01    NaN
     2015-12-01    NaN
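Note that `df.index.levels` holds every date that appears for any ticker, so the product above matches the AMZN dates only because AMZN happens to cover all of them in this example. If the AMZN dates should be the universe regardless of what other tickers contain, a possible variant is to build the product from those dates explicitly. This is a sketch under the same assumptions as the question's data, rebuilt here with `pd.date_range` and `pd.concat`; `amzn_dates`, `tickers`, `full_index` and `filled` are illustrative names.

```python
import pandas as pd

# Rebuild the example data (pd.date_range stands in for the rrule helper).
amzn = pd.DataFrame({
    'value': [4, 2, 5, 6, 7, 8, 6, 5, 4, 1, 2, 4],
    'date': pd.date_range('2015-01-01', periods=12, freq='MS'),
    'stock': 'amzn',
})
msft = pd.DataFrame({
    'value': [1, 1, 1, 1, 1],
    'date': pd.date_range('2015-01-01', periods=5, freq='MS'),
    'stock': 'msft',
})
df = pd.concat([amzn, msft]).set_index(['stock', 'date'])

# The dates available for amzn define the universe of dates.
amzn_dates = df.loc['amzn'].index
tickers = df.index.get_level_values('stock').unique()

# Every (ticker, amzn date) pair; pairs missing from df become NaN rows.
full_index = pd.MultiIndex.from_product(
    [tickers, amzn_dates], names=['stock', 'date']
)
filled = df.reindex(full_index)
print(filled)
```

Rows of another ticker that fall on dates outside the AMZN set would be dropped by this reindex, which may or may not be what is wanted.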