Pandas:填充缺失日期的数据 [英] Pandas: Filling data for missing dates

查看:103
本文介绍了Pandas:填充缺失日期的数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有下表:

ProdID  Date        Val1 Val2 Val3
Prod1   4/1/2019    1    3    4
Prod1   4/3/2019    2    3    54
Prod1   4/4/2019    3    4    54
Prod2   4/1/2019    1    3    3
Prod2   4/2/2019    1    3    4
Prod2   4/3/2019    2    4    4
Prod2   4/4/2019    2    5    3

Prod2 条目已正确填充,因为我们获得了从 4/1/20194/4/2019 的数据.

Prod2 entries are populated correctly as we've got the data from 4/1/2019 to 4/4/2019.

Prod1 缺少 1 个日期 - 4/2/2019.

Prod1 has 1 missing date - 4/2/2019.

我想找到所有 ProdID 的缺失日期,并用从上一个条目的最后一个复制的数据填充 Val1-3.例如,我想将数据从 4/1/2019 复制到 4/2/2019

I would like to find missing dates for all ProdIDs and fill in Val1-3 with data copied from the last of previous entry. For instance, I would like to copy data from 4/1/2019 to 4/2/2019

ProdID  Date        Val1 Val2 Val3
Prod1   4/1/2019    1    3    4
Prod1   4/2/2019    1    3    4
Prod1   4/3/2019    2    3    54
Prod1   4/4/2019    3    4    54
Prod2   4/1/2019    1    3    3
Prod2   4/2/2019    1    3    4
Prod2   4/3/2019    2    4    4
Prod2   4/4/2019    2    5    3

推荐答案

First convert column to datetimes by to_datetime,然后通过 DataFrame.set_index 并调用 GroupBy.applyDataFrame.asfreq - 也可以指定向前或向后填充缺失值的方法:

First convert column to datetimes by to_datetime, then create DatetimeIndex by DataFrame.set_index and call GroupBy.apply with DataFrame.asfreq - there is also possible specify method for forward or back filling missing values:

df['Date'] = pd.to_datetime(df['Date'])

df1 = (df.set_index('Date')
         .groupby('ProdID')
         .apply(lambda x: x.asfreq('D', method='ffill'))
         .reset_index(level=0, drop=True)
         .reset_index()
         .reindex(df.columns, axis=1))

print (df1)
  ProdID       Date  Val1  Val2  Val3
0  Prod1 2019-04-01     1     3     4
1  Prod1 2019-04-02     1     3     4
2  Prod1 2019-04-03     2     3    54
3  Prod1 2019-04-04     3     4    54
4  Prod2 2019-04-01     1     3     3
5  Prod2 2019-04-02     1     3     4
6  Prod2 2019-04-03     2     4     4
7  Prod2 2019-04-04     2     5     3

另一种解决方案是通过 datetimes 的所有组合"nofollow noreferrer">productDataFrame.merge 使用左连接,最后通过 ffill 向前填充缺失值:

Another solution is create all combinations of product and datetimes by product and DataFrame.merge with left join, last forward filling missing values by ffill:

dates = pd.date_range(start=df['Date'].min(), end=df['Date'].max())
prods = df.ProdID.unique()

from  itertools import product
df1 = pd.DataFrame(list(product(prods, dates)), columns=['ProdID', 'Date'])
print (df1)
  ProdID       Date
0  Prod1 2019-04-01
1  Prod1 2019-04-02
2  Prod1 2019-04-03
3  Prod1 2019-04-04
4  Prod2 2019-04-01
5  Prod2 2019-04-02
6  Prod2 2019-04-03
7  Prod2 2019-04-04

df = df1.merge(df, how='left').ffill()
print (df)
  ProdID       Date  Val1  Val2  Val3
0  Prod1 2019-04-01   1.0   3.0   4.0
1  Prod1 2019-04-02   1.0   3.0   4.0
2  Prod1 2019-04-03   2.0   3.0  54.0
3  Prod1 2019-04-04   3.0   4.0  54.0
4  Prod2 2019-04-01   1.0   3.0   3.0
5  Prod2 2019-04-02   1.0   3.0   4.0
6  Prod2 2019-04-03   2.0   4.0   4.0
7  Prod2 2019-04-04   2.0   5.0   3.0

这篇关于Pandas:填充缺失日期的数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆