Add rows in pandas dataframe based on date and value


Problem description

I have a pandas dataframe like the following:

id, date, add_days
1, 2017-01-01, 3
2, 2017-03-05, 5
3, 2017-02-27, 3
.
.
.

I want to repeat each id add_days times, increasing the date by one day per repeated row:

id, date, add_days
1, 2017-01-01, 3
1, 2017-01-02, 3
1, 2017-01-03, 3
2, 2017-03-05, 5
2, 2017-03-06, 5
2, 2017-03-07, 5
2, 2017-03-08, 5
2, 2017-03-09, 5
3, 2017-02-27, 3
3, 2017-02-28, 3
3, 2017-03-01, 3
.
.
.

Is there an idiomatic pandas way of doing this? I'm looking for an efficient solution, since the initial dataframe can have millions of rows.

Recommended answer
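
The snippets in this answer assume that `df` holds the sample data from the question and that `date` is already a datetime64 column. The original post does not show how the frame was built, so the following is only a minimal sketch of one possible setup:

```python
import pandas as pd

# Sample frame from the question; the construction itself is an assumption.
df = pd.DataFrame({
    "id": [1, 2, 3],
    # Parse the strings so that date arithmetic works later on.
    "date": pd.to_datetime(["2017-01-01", "2017-03-05", "2017-02-27"]),
    "add_days": [3, 5, 3],
})
```

If the data comes from a CSV file instead, passing `parse_dates=["date"]` to `pd.read_csv` should give the same dtype.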

You can use melt with groupby and resample:

cols = df.columns

# add the end date via a timedelta; subtract one day because the start date counts as the first day
df['end'] = df.date + pd.to_timedelta(df.add_days.sub(1), unit='d')
print (df)
   id       date  add_days        end
0   1 2017-01-01         3 2017-01-03
1   2 2017-03-05         5 2017-03-09
2   3 2017-02-27         3 2017-03-01
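
The one-day subtraction matters because both the resampled range and `pd.date_range(start, end)` include the end date. A small sketch for a single row (the variable names here are illustrative, not from the original answer):

```python
import pandas as pd

start = pd.Timestamp("2017-01-01")
add_days = 3

# The end date is inclusive, so go forward add_days - 1 days.
end = start + pd.to_timedelta(add_days - 1, unit="D")

# date_range includes both endpoints, giving exactly add_days dates.
days = pd.date_range(start, end)
```

Without the subtraction, each id would get one row too many.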

df1 = pd.melt(
    df, ['id', 'add_days'],
    ['date', 'end'],
    value_name='date'
).drop(columns='variable').set_index('date')
print (df1)
            id  add_days
date                    
2017-01-01   1         3
2017-03-05   2         5
2017-02-27   3         3
2017-01-03   1         3
2017-03-09   2         5
2017-03-01   3         3

df2=df1.groupby('id').resample('D').ffill().reset_index(0, drop=True).reset_index()
# if the order of columns is important
# (reindex_axis was removed in pandas 1.0; reindex(columns=...) replaces it)
df2 = df2.reindex(columns=cols)
print (df2)
    id       date  add_days
0    1 2017-01-01         3
1    1 2017-01-02         3
2    1 2017-01-03         3
3    2 2017-03-05         5
4    2 2017-03-06         5
5    2 2017-03-07         5
6    2 2017-03-08         5
7    2 2017-03-09         5
8    3 2017-02-27         3
9    3 2017-02-28         3
10   3 2017-03-01         3

Another solution uses concat on Series created with date_range, followed by a join (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.join.html) back to the original df. It reuses the end column computed above:

df1 = pd.concat([pd.Series(r.Index, pd.date_range(r.date, r.end)) 
                                       for r in df.itertuples()]).reset_index()
df1.columns = ['date','idx']
print (df1)
         date  idx
0  2017-01-01    0
1  2017-01-02    0
2  2017-01-03    0
3  2017-03-05    1
4  2017-03-06    1
5  2017-03-07    1
6  2017-03-08    1
7  2017-03-09    1
8  2017-02-27    2
9  2017-02-28    2
10 2017-03-01    2

df2 = df1.set_index('idx').join(df[['id','add_days']]).reset_index(drop=True)
print (df2)
         date  id  add_days
0  2017-01-01   1         3
1  2017-01-02   1         3
2  2017-01-03   1         3
3  2017-03-05   2         5
4  2017-03-06   2         5
5  2017-03-07   2         5
6  2017-03-08   2         5
7  2017-03-09   2         5
8  2017-02-27   3         3
9  2017-02-28   3         3
10 2017-03-01   3         3
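
Since the question asks about millions of rows, a fully vectorized variant may be worth trying. This is not part of the original answer: it is a sketch that swaps in a different technique, repeating each row with `Index.repeat` and then adding a per-row day offset from `groupby(...).cumcount()`. It assumes a unique index on `df` and non-negative integer `add_days`; rows with `add_days` equal to 0 drop out of the result.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "date": pd.to_datetime(["2017-01-01", "2017-03-05", "2017-02-27"]),
    "add_days": [3, 5, 3],
})

# Repeat every row add_days times; the original row labels are repeated too.
out = df.loc[df.index.repeat(df["add_days"].to_numpy())].copy()

# Position of each repeated row within its original row: 0, 1, 2, ...
offset = out.groupby(level=0).cumcount()

# Shift the start date by that many days.
out["date"] = out["date"] + pd.to_timedelta(offset, unit="D")

out = out.reset_index(drop=True)
```

This avoids both the Python-level loop over `itertuples` and the per-group resample, so it is likely to scale better, though that is worth measuring on real data.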
