Pandas: Add data for missing months
Problem description
I have a dataframe of sales information by customer and month that looks something like this, with multiple customers, varying month periods, and varying spend:
   customer_id month_year   sales
0           12    2012-05    2.58
1           12    2011-07   33.14
2           12    2011-11  182.06
3           12    2012-03  155.32
4           12    2012-01   71.24
As you can see, many months are missing for each customer. I would like to add rows with sales = 0.0 for each customer, covering every month in the range of month_year.
Can anyone advise on the best way to do this?
Recommended answer
Something like this; note that filling in customer_id is left undefined here (you probably have it available from a groupby or something similar).
You may need a reset_index at the end (if desired).
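The session that follows reindexes against pd.period_range, which only lines up if month_year holds monthly Period values. The question does not say how the column is stored, so the sketch below assumes it contains strings such as "2012-05" and converts them first; if the column is already a period dtype, this step is unnecessary.

```python
import pandas as pd

# Sample rows from the question; month_year is ASSUMED to be stored as strings.
df = pd.DataFrame({
    "customer_id": [12, 12, 12, 12, 12],
    "month_year": ["2012-05", "2011-07", "2011-11", "2012-03", "2012-01"],
    "sales": [2.58, 33.14, 182.06, 155.32, 71.24],
})

# Parse the strings as dates, then convert them to monthly Periods so they
# can be matched against the output of pd.period_range(..., freq="M").
df["month_year"] = (
    pd.to_datetime(df["month_year"], format="%Y-%m").dt.to_period("M")
)
```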
In [130]: df2 = df.set_index('month_year')
In [131]: df2 = df2.sort_index()
In [132]: df2
Out[132]:
            customer_id   sales
month_year
2011-07              12   33.14
2011-11              12  182.06
2012-01              12   71.24
2012-03              12  155.32
2012-05              12    2.58
In [133]: df2 = df2.reindex(pd.period_range(df2.index[0],df2.index[-1],freq='M'))
In [134]: df2
Out[134]:
         customer_id   sales
2011-07           12   33.14
2011-08          NaN     NaN
2011-09          NaN     NaN
2011-10          NaN     NaN
2011-11           12  182.06
2011-12          NaN     NaN
2012-01           12   71.24
2012-02          NaN     NaN
2012-03           12  155.32
2012-04          NaN     NaN
2012-05           12    2.58
In [135]: df2['customer_id'] = 12
In [136]: df2.fillna(0.0)
Out[136]:
         customer_id   sales
2011-07           12   33.14
2011-08           12    0.00
2011-09           12    0.00
2011-10           12    0.00
2011-11           12  182.06
2011-12           12    0.00
2012-01           12   71.24
2012-02           12    0.00
2012-03           12  155.32
2012-04           12    0.00
2012-05           12    2.58
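The session above handles a single customer. For several customers at once, one possible variation (not the approach shown in the answer) is to reindex against the full product of customers and months, which fills customer_id and sales in one step. This is only a sketch: customer 34 and its sales figures are made up for illustration, and month_year is assumed to already be a monthly period column.

```python
import pandas as pd

# Hypothetical two-customer sample; customer 34 and its values are invented.
df = pd.DataFrame({
    "customer_id": [12, 12, 12, 34, 34],
    "month_year": pd.PeriodIndex(
        ["2011-07", "2011-11", "2012-01", "2011-09", "2011-10"], freq="M"
    ),
    "sales": [33.14, 182.06, 71.24, 10.0, 20.0],
})

# Every month in the overall range of month_year.
all_months = pd.period_range(
    df["month_year"].min(), df["month_year"].max(), freq="M"
)

# Every (customer, month) combination.
full_index = pd.MultiIndex.from_product(
    [df["customer_id"].unique(), all_months],
    names=["customer_id", "month_year"],
)

# Reindex on both keys; combinations with no data get sales = 0.0.
filled = (
    df.set_index(["customer_id", "month_year"])
      .reindex(full_index, fill_value=0.0)
      .reset_index()
)
```

Here every customer gets a row for each month between the earliest and latest month_year in the whole frame. If each customer should instead span only their own first and last month, a per-customer reindex along the lines of the session above would be needed.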