Pandas: adding missing dates to a DataFrame while keeping column/index values?


Problem description

I have a pandas DataFrame that contains dates, customers, products, and the dollar amount of each purchase.

   date     customer   product   amt  
 1/1/2017   tim        apple       3  
 1/1/2017   jim        melon       2  
 1/1/2017   tom        apple       5  
 1/1/2017   tom        melon       4  
 1/4/2017   tim        melon       3  
 1/4/2017   jim        apple       2  
 1/4/2017   tom        melon       1  
 1/4/2017   tom        orange      4  
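
To experiment with this, the sample table can be rebuilt as a DataFrame. The following is a minimal sketch: the values are copied from the table above, and it assumes the dates are month/day/year strings that should be parsed into timestamps.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "date": ["1/1/2017"] * 4 + ["1/4/2017"] * 4,
    "customer": ["tim", "jim", "tom", "tom", "tim", "jim", "tom", "tom"],
    "product": ["apple", "melon", "apple", "melon",
                "melon", "apple", "melon", "orange"],
    "amt": [3, 2, 5, 4, 3, 2, 1, 4],
})

# Assumption: the strings are month/day/year. Parsing them into real
# timestamps lets date ranges and reindexing compare like with like.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
```

With the column parsed, df["date"].min() and df["date"].max() give the endpoints of the range that needs filling.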

I'm only trying to look at performance, but I want to fill in every date between my minimum and maximum dates, and also add a row for each customer and each product, with an amount of 0 where nothing was purchased.

Something like:

   date     customer   product   amt  
 1/1/2017   tim        apple       3  
 1/1/2017   tim        melon       0  
 1/1/2017   tim        orange      0  
 1/1/2017   jim        melon       2  
 1/1/2017   jim        apple       0  
 1/1/2017   jim        orange      0  
 1/1/2017   tom        apple       5  
 1/1/2017   tom        melon       4  
 1/1/2017   tom        orange      0  
 1/2/2017   tim        apple       0  
 1/2/2017   tim        melon       0  
 1/2/2017   tim        orange      0  
 1/2/2017   jim        melon       0  
 1/2/2017   jim        apple       0  
 1/2/2017   jim        orange      0  
 1/2/2017   tom        apple       0  
 1/2/2017   tom        melon       0  
 1/2/2017   tom        orange      0  
 1/3/2017   tim        apple       0  
 1/3/2017   tim        melon       0  
 1/3/2017   tim        orange      0  
 1/3/2017   jim        melon       0  
 1/3/2017   jim        apple       0  
 1/3/2017   jim        orange      0  
 1/3/2017   tom        apple       0  
 1/3/2017   tom        melon       0  
 1/3/2017   tom        orange      0  
 1/4/2017   tim        melon       3  
 1/4/2017   tim        apple       0  
 1/4/2017   tim        orange      0  
 1/4/2017   jim        apple       2  
 1/4/2017   jim        melon       0  
 1/4/2017   jim        orange      0  
 1/4/2017   tom        melon       1  
 1/4/2017   tom        orange      4  
 1/4/2017   tom        apple       0  

I know that I can reindex over the range between the min and max dates, but this also sets my customer and product values to 0. Is there any other way to go about this? Am I missing a step? I appreciate the help.
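
One way to keep the customer and product values intact is to reindex against every (date, customer, product) combination instead of against dates alone. The sketch below is one possible approach, not the accepted answer: it rebuilds the sample data, and the names all_dates, full_index, and filled are illustrative. It also assumes that repeated purchases of the same product by the same customer on the same day should be summed.

```python
import pandas as pd

# Sample data from the question, with dates parsed as month/day/year.
df = pd.DataFrame({
    "date": ["1/1/2017"] * 4 + ["1/4/2017"] * 4,
    "customer": ["tim", "jim", "tom", "tom", "tim", "jim", "tom", "tom"],
    "product": ["apple", "melon", "apple", "melon",
                "melon", "apple", "melon", "orange"],
    "amt": [3, 2, 5, 4, 3, 2, 1, 4],
})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# Every calendar day between the first and last purchase.
all_dates = pd.date_range(df["date"].min(), df["date"].max(), freq="D")

# Cartesian product of days, customers, and products.
full_index = pd.MultiIndex.from_product(
    [all_dates, df["customer"].unique(), df["product"].unique()],
    names=["date", "customer", "product"],
)

filled = (
    df.groupby(["date", "customer", "product"])["amt"]
      .sum()                              # assumption: sum any duplicate keys
      .reindex(full_index, fill_value=0)  # missing combinations get 0
      .reset_index()
)
```

Because the three keys live in the index while reindexing, only amt is filled with 0. The result has 36 rows (4 days x 3 customers x 3 products).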

Recommended answer

Note that this uses stack and unstack several times.

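import pandas as pd

# Assumes df['date'] already holds datetime64 values (e.g. via pd.to_datetime).
# Pivot the dates into columns, reindex them over the full daily range, then
# stack/unstack with dropna=False so every product/date/customer combination
# is kept; remaining NaNs become 0 at the end.
df.set_index(['date','customer','product']).amt.unstack(-3).\
  reindex(columns=pd.date_range(df['date'].min(), 
    df['date'].max()),fill_value=0).\
      stack(dropna=False).unstack().stack(dropna=False).\
        unstack('customer').stack(dropna=False).reset_index().\
          fillna(0).sort_values(['level_1','customer','product'])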
Out[314]: 
   product    level_1 customer    0
0    apple 2017-01-01      jim  0.0
12   melon 2017-01-01      jim  2.0
24  orange 2017-01-01      jim  0.0
1    apple 2017-01-01      tim  3.0
13   melon 2017-01-01      tim  0.0
25  orange 2017-01-01      tim  0.0
2    apple 2017-01-01      tom  5.0
14   melon 2017-01-01      tom  4.0
26  orange 2017-01-01      tom  0.0
3    apple 2017-01-02      jim  0.0
15   melon 2017-01-02      jim  0.0
27  orange 2017-01-02      jim  0.0
4    apple 2017-01-02      tim  0.0
16   melon 2017-01-02      tim  0.0
28  orange 2017-01-02      tim  0.0
5    apple 2017-01-02      tom  0.0
17   melon 2017-01-02      tom  0.0
29  orange 2017-01-02      tom  0.0
6    apple 2017-01-03      jim  0.0
18   melon 2017-01-03      jim  0.0
30  orange 2017-01-03      jim  0.0
7    apple 2017-01-03      tim  0.0
19   melon 2017-01-03      tim  0.0
31  orange 2017-01-03      tim  0.0
8    apple 2017-01-03      tom  0.0
20   melon 2017-01-03      tom  0.0
32  orange 2017-01-03      tom  0.0
9    apple 2017-01-04      jim  2.0
21   melon 2017-01-04      jim  0.0
33  orange 2017-01-04      jim  0.0
10   apple 2017-01-04      tim  0.0
22   melon 2017-01-04      tim  3.0
34  orange 2017-01-04      tim  0.0
11   apple 2017-01-04      tom  0.0
23   melon 2017-01-04      tom  1.0
35  orange 2017-01-04      tom  4.0
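
The chain above is dense, so it may help to see the same idea split into named steps. This is a sketch under assumptions rather than the answerer's exact code: it substitutes unstack(..., fill_value=0) for the stack(dropna=False) and fillna(0) calls, it rebuilds the sample data, and the names wide, by_customer, and tidy are illustrative. It also names the date level and the value column so the result does not end up with level_1 and 0 as column labels.

```python
import pandas as pd

# Sample data from the question, with dates parsed as month/day/year.
df = pd.DataFrame({
    "date": ["1/1/2017"] * 4 + ["1/4/2017"] * 4,
    "customer": ["tim", "jim", "tom", "tom", "tim", "jim", "tom", "tom"],
    "product": ["apple", "melon", "apple", "melon",
                "melon", "apple", "melon", "orange"],
    "amt": [3, 2, 5, 4, 3, 2, 1, 4],
})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
all_dates = pd.date_range(df["date"].min(), df["date"].max(), freq="D")

s = df.set_index(["date", "customer", "product"])["amt"]

# Step 1: dates become columns; days with no purchases are added as
# all-zero columns. Rows are the observed (customer, product) pairs.
wide = (
    s.unstack("date", fill_value=0)
     .reindex(columns=all_dates, fill_value=0)
     .rename_axis(columns="date")
)

# Step 2: go back to long form, then spread customers across columns.
# Pairs that never occurred, such as (tim, orange), now get a cell with 0.
by_customer = wide.stack().unstack("customer", fill_value=0)

# Step 3: return to one row per product/date/customer.
tidy = (
    by_customer.stack()
               .rename("amt")
               .reset_index()
               .sort_values(["date", "customer", "product"], ignore_index=True)
)
```

Each unstack creates cells for combinations that were missing, which is why one round trip through customer is enough to complete the grid once every observed pair has every date.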
