How to create a new row on the fly by copying the previous row
Problem description
I have a dataframe as given below.
Edited dataframe
import pandas as pd

df = pd.DataFrame({
    'subject_id': [1,1,1,1,1,1,1,2,2,2,2,2],
    'time_1': ['2173-04-03 12:35:00','2173-04-03 12:50:00','2173-04-05 12:59:00','2173-05-04 13:14:00','2173-05-05 13:37:00','2173-07-06 13:39:00','2173-07-08 11:30:00','2173-04-08 16:00:00','2173-04-09 22:00:00','2173-04-11 04:00:00','2173-04-13 04:30:00','2173-04-14 08:00:00'],
    'val': [5,5,5,5,1,6,5,5,8,3,4,6]})
# Parse the strings as timestamps, then extract the day of the month
df['time_1'] = pd.to_datetime(df['time_1'])
df['day'] = df['time_1'].dt.day
What I would like to do is create new records.
As shown in the screenshot below, for subject_id = 1 the record for the 4th day is missing. So what I am trying to do is copy the immediately preceding row.
I tried the code below, but it didn't help:
df.groupby('subject_id')['day'].eq(df['day'].shift(-1)).add(1)
The new record should have the same content as the previous row, with only the date value modified (d+1), as shown below.
I expect my output to look like the one shown below for each subject_id. You can see how a new record for day 4 is added. Please note that the time component of a new row doesn't really matter. It can be anything (00:00:00).
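I only want the missing dates to be added within the range of a single month. For example, subject = 1 has records from the 3rd to the 5th in the 4th month, but the 4th is missing. So we only add a record for the 4th day. We don't need the 6th, 7th, and so on.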
Edited output
Recommended answer
There are duplicated dates after removing the times, so you can create a helper DataFrame with all dates per subject_id:
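# Assumes df already has a 'date' column holding time_1 with the time removed,
# for example df['date'] = df['time_1'].dt.normalize()
df1 = (df.set_index('date')
         .groupby('subject_id')
         .resample('d')
         .last()
         .index
         .to_frame(index=False))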
print (df1)
    subject_id       date
0            1 2173-04-03
1            1 2173-04-04
2            1 2173-04-05
3            1 2173-04-06
4            2 2173-04-08
5            2 2173-04-09
6            2 2173-04-10
7            2 2173-04-11
8            2 2173-04-12
9            2 2173-04-13
10           2 2173-04-14
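The helper-frame step can be tried in isolation on a small made-up sample. This is only a sketch: the sample values, the `date` column derived with `dt.normalize()`, and the choice to resample the single `val` column (to keep the result independent of how grouping columns are handled) are assumptions, not part of the original answer.

```python
import pandas as pd

# Small illustrative sample (made-up values, same column names as the question)
df = pd.DataFrame({
    'subject_id': [1, 1, 1, 2, 2],
    'time_1': pd.to_datetime(['2173-04-03 12:35:00', '2173-04-03 12:50:00',
                              '2173-04-05 12:59:00', '2173-04-08 16:00:00',
                              '2173-04-10 22:00:00']),
    'val': [5, 5, 1, 8, 3],
})

# Assumed helper column: the timestamp with its time part removed
df['date'] = df['time_1'].dt.normalize()

# Resample each subject to daily frequency; only the resulting
# (subject_id, date) index is kept, the aggregated values are discarded
df1 = (df.set_index('date')
         .groupby('subject_id')['val']
         .resample('D')
         .last()
         .index
         .to_frame(index=False))
print(df1)
```

Each subject gets one row per calendar day between its first and last record, including the days that had no record in the sample.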
Then use DataFrame.merge with a left join and forward fill the missing values:
df2 = df1.merge(df, how='left').groupby('subject_id', as_index=False).ffill()
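To see what the left join and forward fill do, here is a minimal sketch on made-up data. The `helper` frame is written out by hand, and the fill is applied only to the `time_1` and `val` columns so that `subject_id` stays in the result. That is a small variation on the one-liner above, chosen for illustration.

```python
import pandas as pd

# Made-up observations: each subject is missing one day
df = pd.DataFrame({
    'subject_id': [1, 1, 2, 2],
    'time_1': pd.to_datetime(['2173-04-03 12:35:00', '2173-04-05 12:59:00',
                              '2173-04-08 16:00:00', '2173-04-10 22:00:00']),
    'val': [5, 1, 8, 3],
})
df['date'] = df['time_1'].dt.normalize()

# Hand-written helper frame listing every calendar day per subject
helper = pd.DataFrame({
    'subject_id': [1, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2173-04-03', '2173-04-04', '2173-04-05',
                            '2173-04-08', '2173-04-09', '2173-04-10']),
})

# Left join: days without a record get NaT / NaN in time_1 and val
merged = helper.merge(df, how='left', on=['subject_id', 'date'])

# Forward fill within each subject so a gap copies the preceding row
filled = merged.copy()
filled[['time_1', 'val']] = merged.groupby('subject_id')[['time_1', 'val']].ffill()
filled['val'] = filled['val'].astype(int)
print(filled)
```

At this point the copied rows still carry the timestamp of the row they were copied from, which is why a further adjustment of `time_1` is needed.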
Finally, the days must be added to the newly added datetimes. One possible solution is to add timedeltas created from the difference between the date values and the normalized time_1 values:
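import numpy as np

# Strip the time part so time_1 can be compared with the calendar date
dates = df2['time_1'].dt.normalize()
# Rows copied by ffill still carry the old timestamp; shift them to their own date
df2['time_1'] += np.where(dates == df2['date'], 0, df2['date'] - dates)
df2['day'] = df2['time_1'].dt.day
df2['val'] = df2['val'].astype(int)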
print (df2)
          date              time_1  val  day
0   2173-04-03 2173-04-03 12:35:00    5    3
1   2173-04-03 2173-04-03 12:50:00    5    3
2   2173-04-03 2173-04-03 12:59:00    5    3
3   2173-04-04 2173-04-04 13:14:00    5    4
4   2173-04-04 2173-04-04 13:37:00    1    4
5   2173-04-05 2173-04-05 13:37:00    1    5
6   2173-04-06 2173-04-06 13:39:00    6    6
7   2173-04-06 2173-04-06 11:30:00    5    6
8   2173-04-08 2173-04-08 16:00:00    5    8
9   2173-04-09 2173-04-09 22:00:00    8    9
10  2173-04-10 2173-04-10 22:00:00    8   10
11  2173-04-11 2173-04-11 04:00:00    3   11
12  2173-04-12 2173-04-12 04:00:00    3   12
13  2173-04-13 2173-04-13 04:30:00    4   13
14  2173-04-14 2173-04-14 08:00:00    6   14
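The date shift can also be written without np.where, since the difference between `date` and the normalized `time_1` is already zero for rows that were not copied. The sketch below uses a made-up three-row frame in which the middle row stands for a forward-filled copy. It is an equivalent formulation for illustration, not the answer's exact code.

```python
import pandas as pd

# Made-up frame after merge + ffill: row 1 is a copy of row 0,
# so its time_1 still points at 2173-04-03
df2 = pd.DataFrame({
    'date': pd.to_datetime(['2173-04-03', '2173-04-04', '2173-04-05']),
    'time_1': pd.to_datetime(['2173-04-03 12:35:00', '2173-04-03 12:35:00',
                              '2173-04-05 12:59:00']),
    'val': [5, 5, 1],
})

# Whole-day offset between the target date and the copied timestamp;
# zero days for rows that already sit on their own date
offset = df2['date'] - df2['time_1'].dt.normalize()

# Move each timestamp onto its target date, keeping the time of day
df2['time_1'] = df2['time_1'] + offset
df2['day'] = df2['time_1'].dt.day
print(df2)
```

The copied row keeps the time of day of the row it came from, which matches the question's note that the time component of a new row does not matter.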