pandas re-indexing with missing dates


Question

import datetime as dt

import pandas as pd
from dateutil.rrule import rrule, MONTHLY

def fread_year_month(strt_dt, end_dt):
    # One datetime per month from strt_dt through end_dt, inclusive
    return list(rrule(MONTHLY, dtstart=strt_dt, until=end_dt))

df = pd.DataFrame({
    'value': [4, 2, 5, 6, 7, 8, 6, 5, 4, 1, 2, 4],
    'date': fread_year_month(dt.datetime(2015, 1, 1), dt.datetime(2015, 12, 1)),
    'stock': ['amzn'] * 12
}, columns=['value', 'date', 'stock'])

df2 = pd.DataFrame({
    'value': [1, 1, 1, 1, 1],
    'date': fread_year_month(dt.datetime(2015, 1, 1), dt.datetime(2015, 5, 1)),
    'stock': ['msft'] * 5
}, columns=['value', 'date', 'stock'])

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df, df2])

df.set_index(['stock', 'date'], inplace=True)

I have the above pandas dataframe. As you can see, the number of available data points for amzn is not the same as for msft. In this example the dates are sequential, but that need not be the case (the dates can be any dates).

If the universe of available dates is the set of dates for which I have AMZN data, how can I add those exact dates for every other stock in my data frame, with NaN or NA as the value?

In the example given, I want to insert the missing dates for msft into the index and set the value to NaN or NA for those dates.

Recommended answer

If you want to work with your tickers as columns, you could do an unstack, like this:

In [71]: df.unstack(level=0)
Out[71]: 
           value     
stock       amzn msft
date                 
2015-01-01   4.0  1.0
2015-02-01   2.0  1.0
2015-03-01   5.0  1.0
2015-04-01   6.0  1.0
2015-05-01   7.0  1.0
2015-06-01   8.0  NaN
2015-07-01   6.0  NaN
2015-08-01   5.0  NaN
2015-09-01   4.0  NaN
2015-10-01   1.0  NaN
2015-11-01   2.0  NaN
2015-12-01   4.0  NaN
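
The output above carries a two-level column header (value / stock) because unstack was called on the whole frame. As a small sketch of one possible variation, selecting the value column first yields a wide table with one flat column per ticker. The three-month sample data here is made up for illustration and is not the data from the question.

```python
import pandas as pd

# Hypothetical miniature data set: three months for amzn, one month for msft.
dates = pd.date_range("2015-01-01", periods=3, freq="MS")
df = pd.DataFrame(
    {"value": [4, 2, 5, 1]},
    index=pd.MultiIndex.from_arrays(
        [["amzn", "amzn", "amzn", "msft"],
         [dates[0], dates[1], dates[2], dates[0]]],
        names=["stock", "date"],
    ),
)

# Unstacking the Series (not the DataFrame) gives flat ticker columns;
# dates that msft lacks show up as NaN.
wide = df["value"].unstack(level="stock")
print(wide)
```

Whether the flat or the two-level header is preferable depends on how many value columns the real frame has; with a single column the flat form is usually easier to work with.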

To reindex into the same shape, the from_product call below creates a new MultiIndex with all the combinations of tickers and dates.

In [75]: df.reindex(pd.MultiIndex.from_product(df.index.levels))
Out[75]: 
                 value
amzn 2015-01-01    4.0
     2015-02-01    2.0
     2015-03-01    5.0
     2015-04-01    6.0
     2015-05-01    7.0
     2015-06-01    8.0
     2015-07-01    6.0
     2015-08-01    5.0
     2015-09-01    4.0
     2015-10-01    1.0
     2015-11-01    2.0
     2015-12-01    4.0
msft 2015-01-01    1.0
     2015-02-01    1.0
     2015-03-01    1.0
     2015-04-01    1.0
     2015-05-01    1.0
     2015-06-01    NaN
     2015-07-01    NaN
     2015-08-01    NaN
     2015-09-01    NaN
     2015-10-01    NaN
     2015-11-01    NaN
     2015-12-01    NaN
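
Note that df.index.levels holds the union of dates across all tickers, which matches the AMZN dates here only because AMZN happens to cover every date. If the reference calendar should be strictly the AMZN dates, as the question asks, one possible approach is to build the product from those dates explicitly. The following is a sketch under that assumption; the names reference_dates, stocks, full_index and aligned are illustrative, and the sample data is rebuilt with pd.date_range rather than the question's helper.

```python
import pandas as pd

# Sample data mirroring the question: 12 months for amzn, 5 for msft.
amzn_dates = pd.date_range("2015-01-01", periods=12, freq="MS")
msft_dates = pd.date_range("2015-01-01", periods=5, freq="MS")

df = pd.concat([
    pd.DataFrame({"value": [4, 2, 5, 6, 7, 8, 6, 5, 4, 1, 2, 4],
                  "date": amzn_dates, "stock": "amzn"}),
    pd.DataFrame({"value": [1] * 5, "date": msft_dates, "stock": "msft"}),
]).set_index(["stock", "date"])

# Take only the dates that exist for amzn as the reference calendar.
reference_dates = df.xs("amzn", level="stock").index
stocks = df.index.get_level_values("stock").unique()

# Every (stock, date) pair; passing names keeps the index labels,
# which the unnamed from_product call above drops.
full_index = pd.MultiIndex.from_product(
    [stocks, reference_dates], names=["stock", "date"]
)

# Rows missing from df (msft after May) are filled with NaN.
aligned = df.reindex(full_index)
print(aligned)
```

With this variant, any date that another ticker has but AMZN lacks is dropped from the result, which may or may not be what is wanted.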
