Add rows in pandas dataframe based on date and value
Question
I have a pandas dataframe like the following:
id, date, add_days
1, 2017-01-01, 3
2, 2017-03-05, 5
3, 2017-02-27, 3
.
.
.
I want to repeat the ids and increase date by given add_days:
id, date, add_days
1, 2017-01-01, 3
1, 2017-01-02, 3
1, 2017-01-03, 3
2, 2017-03-05, 5
2, 2017-03-06, 5
2, 2017-03-07, 5
2, 2017-03-08, 5
2, 2017-03-09, 5
3, 2017-02-27, 3
3, 2017-02-28, 3
3, 2017-03-01, 3
.
.
.
Is there a pandas-idiomatic way of doing this? I'm looking for an efficient solution, since the initial dataframe can have millions of rows.
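To reproduce the sample above, the comma-plus-space table can be loaded directly with read_csv. This is a sketch assuming the data is pasted as text (the "." continuation rows are left out); skipinitialspace=True strips the blank after each comma, and parse_dates turns date into a datetime column, which both answers below require.

```python
import io

import pandas as pd

# The question's sample, pasted as text (continuation rows omitted).
data = """id, date, add_days
1, 2017-01-01, 3
2, 2017-03-05, 5
3, 2017-02-27, 3
"""

# skipinitialspace=True drops the space after each comma, so the column is
# named "date" rather than " date"; parse_dates makes it datetime64.
df = pd.read_csv(io.StringIO(data), skipinitialspace=True, parse_dates=['date'])
print(df)
```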
Answer
You can use melt with groupby and resample:
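import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'date': pd.to_datetime(['2017-01-01', '2017-03-05', '2017-02-27']),
                   'add_days': [3, 5, 3]})
cols = df.columns
# add an end date via timedelta; subtract one day because the start date counts as day 1
df['end'] = df['date'] + pd.to_timedelta(df['add_days'].sub(1), unit='D')
print(df)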
id date add_days end
0 1 2017-01-01 3 2017-01-03
1 2 2017-03-05 5 2017-03-09
2 3 2017-02-27 3 2017-03-01
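# stack start and end dates into one column; recent pandas versions raise if
# value_name reuses an existing column name, so melt into 'day' and rename
df1 = (pd.melt(df, id_vars=['id', 'add_days'], value_vars=['date', 'end'],
               value_name='day')
         .drop(columns='variable')
         .rename(columns={'day': 'date'})
         .set_index('date'))
print(df1)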
id add_days
date
2017-01-01 1 3
2017-03-05 2 5
2017-02-27 3 3
2017-01-03 1 3
2017-03-09 2 5
2017-03-01 3 3
# upsample each id to one row per day and forward-fill add_days
df2 = df1.groupby('id')['add_days'].resample('D').ffill().reset_index()
# if order of columns is important (reindex_axis was removed in pandas 1.0)
df2 = df2.reindex(columns=cols)
print(df2)
id date add_days
0 1 2017-01-01 3
1 1 2017-01-02 3
2 1 2017-01-03 3
3 2 2017-03-05 5
4 2 2017-03-06 5
5 2 2017-03-07 5
6 2 2017-03-08 5
7 2 2017-03-09 5
8 3 2017-02-27 3
9 3 2017-02-28 3
10 3 2017-03-01 3
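The end column is start + (add_days - 1) days because both the daily resample and pd.date_range include both endpoints. A small check, using id 3 from the sample as an assumed input, shows the arithmetic (2017 is not a leap year, so the range crosses into March) and that pd.date_range(start, periods=add_days) yields the same dates without computing the end at all:

```python
import pandas as pd

start = pd.Timestamp('2017-02-27')
add_days = 3

# Inclusive end date, as in the answer: start + (add_days - 1) days.
end = start + pd.to_timedelta(add_days - 1, unit='D')

# Range defined by both endpoints vs. range defined by a row count.
by_end = pd.date_range(start, end, freq='D')
by_periods = pd.date_range(start, periods=add_days, freq='D')
print(by_end.equals(by_periods))  # True
```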
Alternatively, use concat on Series objects created by date_range, and finally join the result to the original df:
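# one Series per row: the row's index label repeated over its daily date range
# (uses the 'end' column computed in the first approach)
df1 = pd.concat([pd.Series(r.Index, index=pd.date_range(r.date, r.end))
                 for r in df.itertuples()]).reset_index()
df1.columns = ['date', 'idx']
print(df1)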
date idx
0 2017-01-01 0
1 2017-01-02 0
2 2017-01-03 0
3 2017-03-05 1
4 2017-03-06 1
5 2017-03-07 1
6 2017-03-08 1
7 2017-03-09 1
8 2017-02-27 2
9 2017-02-28 2
10 2017-03-01 2
df2 = df1.set_index('idx').join(df[['id','add_days']]).reset_index(drop=True)
print (df2)
date id add_days
0 2017-01-01 1 3
1 2017-01-02 1 3
2 2017-01-03 1 3
3 2017-03-05 2 5
4 2017-03-06 2 5
5 2017-03-07 2 5
6 2017-03-08 2 5
7 2017-03-09 2 5
8 2017-02-27 3 3
9 2017-02-28 3 3
10 2017-03-01 3 3
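Both approaches above do per-group or per-row work (the concat version loops in Python over itertuples), which may become slow with millions of rows. A different technique, not used in the answer, avoids the Python-level loop: repeat each row add_days times with Index.repeat, then offset each copy's date by groupby().cumcount(). This is a sketch assuming date is already datetime64 and add_days holds positive integers (a row with 0 would be dropped); it has not been benchmarked against the methods above.

```python
import pandas as pd

# Sample data from the question; assumes `date` is datetime64 and
# `add_days` holds positive integers.
df = pd.DataFrame({
    'id': [1, 2, 3],
    'date': pd.to_datetime(['2017-01-01', '2017-03-05', '2017-02-27']),
    'add_days': [3, 5, 3],
})

# Repeat each row add_days times; the original index label is repeated as well.
out = df.loc[df.index.repeat(df['add_days'])]

# cumcount numbers the copies of each original row 0, 1, 2, ...
offset = out.groupby(level=0).cumcount().to_numpy()

# Renumber the rows, then shift each copy's date by its offset in days.
out = out.reset_index(drop=True)
out['date'] = out['date'] + pd.to_timedelta(offset, unit='D')
print(out)
```

The result keeps the original column order (id, date, add_days), so no reindexing step is needed.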