Fill missing dates by group in pandas

Question

I need to fill the missing dates down by group. Here is the code to create the data frame. I want to carry each date in the fill column down only until the fill column changes, and only until the group 'name' changes.

    import numpy as np
    import pandas as pd

    data = {'tdate': [20080815, 20080915, 20081226, 20090110, 20090131, 20080807, 20080831,
                      20080918, 20081023, 20081114, 20081207, 20090117, 20090203, 20090219,
                      20090305, 20090318, 20090501],
            'name': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
            'fill': [np.nan, np.nan, 20080915, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 20081023,
                     np.nan, np.nan, np.nan, np.nan, 20090219, np.nan, np.nan]}

    df = pd.DataFrame(data, columns=['tdate', 'name', 'fill'])
    df  # displays the frame in IPython/Jupyter

Current data frame:

        tdate       name  fill
    0   20080815    A     NaN
    1   20080915    A     NaN
    2   20081226    A     20080915
    3   20090110    A     NaN
    4   20090131    A     NaN
    5   20080807    B     NaN
    6   20080831    B     NaN
    7   20080918    B     NaN
    8   20081023    B     NaN
    9   20081114    B     20081023
    10  20081207    B     NaN
    11  20090117    B     NaN
    12  20090203    B     NaN
    13  20090219    B     NaN
    14  20090305    B     20090219
    15  20090318    B     NaN
    16  20090501    B     NaN

Desired output:

        tdate       name  fill
    0   20080815    A     NaN
    1   20080915    A     NaN
    2   20081226    A     20080915
    3   20090110    A     20080915
    4   20090131    A     20080915
    5   20080807    B     NaN
    6   20080831    B     NaN
    7   20080918    B     NaN
    8   20081023    B     NaN
    9   20081114    B     NaN
    10  20081207    B     20081023
    11  20090117    B     20081023
    12  20090203    B     20081023
    13  20090219    B     20081023
    14  20090305    B     20081023
    15  20090318    B     20090219
    16  20090501    B     20090219

Here is my code:

    df.groupby(df["name"])["fill"].fill()

Answer

You were pretty close; you just need to forward-fill with ffill rather than call fill (GroupBy objects have no fill method):

    df.groupby('name')['fill'].ffill()

    0          NaN
    1          NaN
    2     20080915
    3     20080915
    4     20080915
    5          NaN
    6          NaN
    7          NaN
    8          NaN
    9     20081023
    10    20081023
    11    20081023
    12    20081023
    13    20081023
    14    20090219
    15    20090219
    16    20090219
    dtype: float64
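The groupby is what stops the fill from crossing from group A into group B. As a minimal sketch on the question's data, the snippet below compares a plain Series.ffill() with the grouped version and then writes the grouped result back into df; the variable names leaky and grouped are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question (dates stored as yyyymmdd numbers)
df = pd.DataFrame({
    "tdate": [20080815, 20080915, 20081226, 20090110, 20090131, 20080807, 20080831,
              20080918, 20081023, 20081114, 20081207, 20090117, 20090203, 20090219,
              20090305, 20090318, 20090501],
    "name": ["A"] * 5 + ["B"] * 12,
    "fill": [np.nan, np.nan, 20080915, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
             20081023, np.nan, np.nan, np.nan, np.nan, 20090219, np.nan, np.nan],
})

# Ungrouped forward-fill: A's last value (20080915) leaks into the first rows of group B
leaky = df["fill"].ffill()

# Grouped forward-fill: each 'name' group is filled independently,
# so B rows before B's first non-null value stay NaN
grouped = df.groupby("name")["fill"].ffill()

# Show both side by side
print(pd.DataFrame({"name": df["name"], "leaky": leaky, "grouped": grouped}))

# Write the grouped result back into the original column
df["fill"] = grouped
```

In the printed frame, rows 5 to 8 differ: the ungrouped version carries 20080915 into group B, while the grouped version leaves them as NaN.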

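Or, equivalently, in older pandas versions (recent releases deprecate fillna(method=...), so ffill() is the safer choice):

    df.groupby('name')['fill'].fillna(method='ffill')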
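Another version-independent equivalent is groupby(...).transform, which applies a function to each group's Series and returns a result aligned with the original index. This is a sketch on the question's data, not something the answer itself uses; via_transform and via_ffill are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["A"] * 5 + ["B"] * 12,
    "fill": [np.nan, np.nan, 20080915, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
             20081023, np.nan, np.nan, np.nan, np.nan, 20090219, np.nan, np.nan],
})

# transform runs the lambda on each group's Series and stitches the pieces
# back together in the original row order, so the result can be assigned directly
via_transform = df.groupby("name")["fill"].transform(lambda s: s.ffill())

# The built-in grouped ffill, for comparison
via_ffill = df.groupby("name")["fill"].ffill()

# Compare the two approaches element by element (NaNs in the same place count as equal)
print(via_transform.equals(via_ffill))
```

The built-in ffill() is usually simpler and faster, because transform with a lambda calls Python code once per group; transform is mainly worth it when the per-group logic is more than a single built-in method.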
