Pandas: adding missing dates to a DataFrame while keeping column/index values?
Question
I have a pandas DataFrame that contains dates, customers, products, and the dollar amount of each purchase.
date customer product amt
1/1/2017 tim apple 3
1/1/2017 jim melon 2
1/1/2017 tom apple 5
1/1/2017 tom melon 4
1/4/2017 tim melon 3
1/4/2017 jim apple 2
1/4/2017 tom melon 1
1/4/2017 tom orange 4
I'm just trying to look at performance, but I want to fill in every date between my min and max dates, and also fill in a row for each customer for each product, with an amount of 0 where there was no purchase.
Something like:
date customer product amt
1/1/2017 tim apple 3
1/1/2017 tim melon 0
1/1/2017 tim orange 0
1/1/2017 jim melon 2
1/1/2017 jim apple 0
1/1/2017 jim orange 0
1/1/2017 tom apple 5
1/1/2017 tom melon 4
1/1/2017 tom orange 0
1/2/2017 tim apple 0
1/2/2017 tim melon 0
1/2/2017 tim orange 0
1/2/2017 jim melon 0
1/2/2017 jim apple 0
1/2/2017 jim orange 0
1/2/2017 tom apple 0
1/2/2017 tom melon 0
1/2/2017 tom orange 0
1/3/2017 tim apple 0
1/3/2017 tim melon 0
1/3/2017 tim orange 0
1/3/2017 jim melon 0
1/3/2017 jim apple 0
1/3/2017 jim orange 0
1/3/2017 tom apple 0
1/3/2017 tom melon 0
1/3/2017 tom orange 0
1/4/2017 tim melon 3
1/4/2017 tim apple 0
1/4/2017 tim orange 0
1/4/2017 jim apple 2
1/4/2017 jim melon 0
1/4/2017 jim orange 0
1/4/2017 tom melon 1
1/4/2017 tom orange 4
1/4/2017 tom apple 0
I know I can reindex on the range between the min and max dates, but that also sets my customer and product values to 0. Is there another way to go about this? Am I missing a step? I'd appreciate any help.
Recommended answer
Note that this uses stack and unstack several times:
# Assumes df['date'] has already been converted with pd.to_datetime;
# with plain strings the reindexed date columns would not match.
df.set_index(['date','customer','product']).amt.unstack(-3).\
    reindex(columns=pd.date_range(df['date'].min(),
                                  df['date'].max()),fill_value=0).\
    stack(dropna=False).unstack().stack(dropna=False).\
    unstack('customer').stack(dropna=False).reset_index().\
    fillna(0).sort_values(['level_1','customer','product'])
Out[314]:
product level_1 customer 0
0 apple 2017-01-01 jim 0.0
12 melon 2017-01-01 jim 2.0
24 orange 2017-01-01 jim 0.0
1 apple 2017-01-01 tim 3.0
13 melon 2017-01-01 tim 0.0
25 orange 2017-01-01 tim 0.0
2 apple 2017-01-01 tom 5.0
14 melon 2017-01-01 tom 4.0
26 orange 2017-01-01 tom 0.0
3 apple 2017-01-02 jim 0.0
15 melon 2017-01-02 jim 0.0
27 orange 2017-01-02 jim 0.0
4 apple 2017-01-02 tim 0.0
16 melon 2017-01-02 tim 0.0
28 orange 2017-01-02 tim 0.0
5 apple 2017-01-02 tom 0.0
17 melon 2017-01-02 tom 0.0
29 orange 2017-01-02 tom 0.0
6 apple 2017-01-03 jim 0.0
18 melon 2017-01-03 jim 0.0
30 orange 2017-01-03 jim 0.0
7 apple 2017-01-03 tim 0.0
19 melon 2017-01-03 tim 0.0
31 orange 2017-01-03 tim 0.0
8 apple 2017-01-03 tom 0.0
20 melon 2017-01-03 tom 0.0
32 orange 2017-01-03 tom 0.0
9 apple 2017-01-04 jim 2.0
21 melon 2017-01-04 jim 0.0
33 orange 2017-01-04 jim 0.0
10 apple 2017-01-04 tim 0.0
22 melon 2017-01-04 tim 3.0
34 orange 2017-01-04 tim 0.0
11 apple 2017-01-04 tom 0.0
23 melon 2017-01-04 tom 1.0
35 orange 2017-01-04 tom 4.0
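The same result can probably be reached without the repeated stack/unstack round trips. This is a different technique from the answer above, not a restatement of it: build the full date x customer x product grid with `pd.MultiIndex.from_product` and reindex against it once. The sketch below rebuilds the sample data from the question. The names `all_dates`, `full_index` and `filled` are made up for illustration, and it assumes each (date, customer, product) combination appears at most once.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["1/1/2017"] * 4 + ["1/4/2017"] * 4,
    "customer": ["tim", "jim", "tom", "tom", "tim", "jim", "tom", "tom"],
    "product": ["apple", "melon", "apple", "melon",
                "melon", "apple", "melon", "orange"],
    "amt": [3, 2, 5, 4, 3, 2, 1, 4],
})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# Every calendar day between the first and last purchase date.
all_dates = pd.date_range(df["date"].min(), df["date"].max(), freq="D")

# Cartesian product of dates x customers x products.
full_index = pd.MultiIndex.from_product(
    [all_dates, df["customer"].unique(), df["product"].unique()],
    names=["date", "customer", "product"],
)

# Reindexing on all three keys at once leaves customer and product
# intact and fills only the missing amounts with 0.
filled = (
    df.set_index(["date", "customer", "product"])["amt"]
      .reindex(full_index, fill_value=0)
      .reset_index()
)

print(filled)
```

Because the date, customer, and product all live in the index during the reindex, only `amt` receives the fill value, which avoids the problem described in the question. If the real data can contain repeated (date, customer, product) rows, the reindex would fail on the duplicate index, so aggregating first (for example with `groupby(...).sum()`) would likely be needed.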