How to create a new row on the fly by copying the previous row


Problem description

I have a dataframe as given below.

Edited dataframe

import pandas as pd

df = pd.DataFrame({
'subject_id':[1,1,1,1,1,1,1,2,2,2,2,2],
'time_1' :['2173-04-03 12:35:00','2173-04-03 12:50:00','2173-04-05 12:59:00','2173-05-04 13:14:00','2173-05-05 13:37:00','2173-07-06 13:39:00','2173-07-08 11:30:00','2173-04-08 16:00:00','2173-04-09 22:00:00','2173-04-11 04:00:00','2173-04-13 04:30:00','2173-04-14 08:00:00'],
 'val' :[5,5,5,5,1,6,5,5,8,3,4,6]})
# Parse the strings into datetimes, then extract the day of the month
df['time_1'] = pd.to_datetime(df['time_1'])
df['day'] = df['time_1'].dt.day

What I want to do is create new records.

As the screenshot in the original post shows, for subject_id = 1 the record for the 4th day is missing. So what I am trying to do is copy the immediately preceding row.

I tried the following, but it didn't help:

df.groupby('subject_id')['day'].eq(df['day'].shift(-1)).add(1)

The new record should have the same content as the previous row, with only the date value changed (d+1).

I expect the output for each subject_id to contain a new record for day 4. Please note that the time component of a new row doesn't really matter. It can be anything (00:00:00).

I only want missing dates added within the range covered in a given month. For example, subject = 1 has records from the 3rd to the 5th in month 4, but the 4th is missing. So we add a record for the 4th day only. We don't need the 6th, 7th, and so on.

Edited output

Recommended answer

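Once the times are removed, some dates are duplicated, so you can create a helper DataFrame that holds every date per subject_id. Note that the printed results in this answer do not correspond to the edited dataframe in the question (for example, they contain 2173-04-04 13:14:00 rather than 2173-05-04 13:14:00).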

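# Assumption: the answer does not show how 'date' was created;
# here it is taken to be time_1 with the time removed
df['date'] = df['time_1'].dt.normalize()

# Resample each subject to daily frequency and keep only the (subject_id, date) index
df1 = (df.set_index('date')
         .groupby('subject_id')
         .resample('D')
         .last()
         .index
         .to_frame(index=False))
print(df1)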
    subject_id       date
0            1 2173-04-03
1            1 2173-04-04
2            1 2173-04-05
3            1 2173-04-06
4            2 2173-04-08
5            2 2173-04-09
6            2 2173-04-10
7            2 2173-04-11
8            2 2173-04-12
9            2 2173-04-13
10           2 2173-04-14

Then use DataFrame.merge with a left join and forward fill the missing values:

df2 = df1.merge(df, how='left').groupby('subject_id', as_index=False).ffill()
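In the result printed further down, subject_id is absent, which would be consistent with `groupby(...).ffill()` not returning the grouping column in the pandas version used. The following is a minimal sketch, not the answer's own method, of one way to keep subject_id and also flag which rows were inserted. The tiny frames and the `is_new` column are hypothetical and exist only for illustration; `indicator=True` is a standard `DataFrame.merge` option.

```python
import pandas as pd

# Hypothetical small frame: subject 1 has no row for 2173-04-04
df = pd.DataFrame({
    'subject_id': [1, 1, 2],
    'date': pd.to_datetime(['2173-04-03', '2173-04-05', '2173-04-08']),
    'val': [5, 1, 8],
})

# Hypothetical helper frame with every (subject_id, date) pair that should exist
df1 = pd.DataFrame({
    'subject_id': [1, 1, 1, 2],
    'date': pd.to_datetime(['2173-04-03', '2173-04-04',
                            '2173-04-05', '2173-04-08']),
})

# indicator=True adds a '_merge' column: 'both' for rows that already
# existed in df, 'left_only' for rows that come only from the helper frame
merged = df1.merge(df, on=['subject_id', 'date'], how='left', indicator=True)
merged['is_new'] = merged['_merge'] == 'left_only'
merged = merged.drop(columns='_merge')

# Forward fill only the value column, per subject, so subject_id stays put
merged['val'] = merged.groupby('subject_id')['val'].ffill().astype(int)

print(merged)
```

Selecting the columns to fill after `groupby` leaves the key columns untouched, and the `is_new` flag makes it easy to adjust only the copied rows in a later step.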

Finally, the newly added datetimes need their days shifted. One possible solution is to add timedeltas computed as the difference between the dates and the normalized time_1 values:

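import numpy as np

# Calendar day of each (forward-filled) timestamp
dates = df2['time_1'].dt.normalize()
# Copied rows still carry the previous day's timestamp, so shift them to their own date
df2['time_1'] += np.where(dates == df2['date'], 0, df2['date'] - dates)
df2['day'] = df2['time_1'].dt.day
# val became float because of the NaNs introduced by the merge; restore integers
df2['val'] = df2['val'].astype(int)
print(df2)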

         date              time_1  val  day
0  2173-04-03 2173-04-03 12:35:00    5    3
1  2173-04-03 2173-04-03 12:50:00    5    3
2  2173-04-03 2173-04-03 12:59:00    5    3
3  2173-04-04 2173-04-04 13:14:00    5    4
4  2173-04-04 2173-04-04 13:37:00    1    4
5  2173-04-05 2173-04-05 13:37:00    1    5
6  2173-04-06 2173-04-06 13:39:00    6    6
7  2173-04-06 2173-04-06 11:30:00    5    6
8  2173-04-08 2173-04-08 16:00:00    5    8
9  2173-04-09 2173-04-09 22:00:00    8    9
10 2173-04-10 2173-04-10 22:00:00    8   10
11 2173-04-11 2173-04-11 04:00:00    3   11
12 2173-04-12 2173-04-12 04:00:00    3   12
13 2173-04-13 2173-04-13 04:30:00    4   13
14 2173-04-14 2173-04-14 08:00:00    6   14
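
The question also asks that gaps be filled only inside the range observed within each month, which a plain daily resample per subject does not do. The following is a rough sketch of one way to handle that on the edited dataframe from the question, not part of the original answer. The names `month`, `bounds`, `calendar`, `first_day`, `last_day` and `is_new` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'time_1': ['2173-04-03 12:35:00', '2173-04-03 12:50:00',
               '2173-04-05 12:59:00', '2173-05-04 13:14:00',
               '2173-05-05 13:37:00', '2173-07-06 13:39:00',
               '2173-07-08 11:30:00', '2173-04-08 16:00:00',
               '2173-04-09 22:00:00', '2173-04-11 04:00:00',
               '2173-04-13 04:30:00', '2173-04-14 08:00:00'],
    'val': [5, 5, 5, 5, 1, 6, 5, 5, 8, 3, 4, 6],
})
df['time_1'] = pd.to_datetime(df['time_1'])
df['date'] = df['time_1'].dt.normalize()       # timestamp with the time removed
df['month'] = df['time_1'].dt.to_period('M')   # calendar month of each record

# First and last observed day for every (subject, month) pair
bounds = (df.groupby(['subject_id', 'month'])['date']
            .agg(first_day='min', last_day='max')
            .reset_index())

# Every day between those bounds, so gaps are filled only inside a month's range
calendar = pd.concat(
    [pd.DataFrame({'subject_id': row.subject_id,
                   'date': pd.date_range(row.first_day, row.last_day, freq='D')})
     for row in bounds.itertuples(index=False)],
    ignore_index=True)

# The left join keeps existing rows and adds empty rows for the missing days
out = calendar.merge(df.drop(columns='month'),
                     on=['subject_id', 'date'], how='left')
out['is_new'] = out['time_1'].isna()

# Copy the preceding row of the same subject into each empty row
filled = out.groupby('subject_id')[['time_1', 'val']].ffill()
out['val'] = filled['val'].astype(int)

# New rows get their own date plus the time of day of the copied row
time_of_day = filled['time_1'] - filled['time_1'].dt.normalize()
out['time_1'] = out['date'] + time_of_day
out['day'] = out['time_1'].dt.day

print(out)
```

With this data the sketch adds four rows: 2173-04-04 and 2173-07-07 for subject 1, and 2173-04-10 and 2173-04-12 for subject 2. No rows are created between the months, because each calendar range stops at the last observed day of its month.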
