Pandas Filling Missing Values Down Based on Row Above
Question
I have a dataframe like the following:
import numpy as np
import pandas as pd
data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2, 1], 'col2': [np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, 2, np.nan]}
df = pd.DataFrame(data, columns=['col1', 'col2'])
print(df)
col1 col2
0 1 NaN
1 3 1.0
2 3 NaN
3 1 1.0
4 2 NaN
5 3 NaN
6 2 NaN
7 2 2.0
8 1 NaN
I am trying to make a third column that fills in the NaN values in col2 if the value of col2 is equal to 1.0 or the row above in col2 is 1.0. The final dataframe would look like this:
col1 col2 col3
0 1 NaN NaN
1 3 1.0 1.0
2 3 NaN 1.0
3 1 1.0 1.0
4 2 NaN 1.0
5 3 NaN 1.0
6 2 NaN 1.0
7 2 2.0 2.0
8 1 NaN NaN
The first approach I tried was:
df['col3'] = ((df['col2']== 1) | ((df['col2'].shift()== 1))).astype('int')
This leaves me with this dataframe:
col1 col2 col3
0 1 NaN 0
1 3 1.0 1
2 3 NaN 1
3 1 1.0 1
4 2 NaN 1
5 3 NaN 0
6 2 NaN 0
7 2 2.0 0
8 1 NaN 0
This corrects the first missing value after each 1.0, but it does not keep filling the missing values that follow, because shift() only looks one row back. I also tried using the np.where() function and got the same result.
Is there a way to write this in pandas so that it fills multiple consecutive missing values?
Recommended answer
You can use np.where on the forward-filled column: wherever the forward-fill of 'col2' equals 1, fill in 1, and everywhere else fall back to the original value of 'col2':
df['col2'] = np.where(df['col2'].ffill() == 1, 1, df['col2'])
The resulting output:
col1 col2
0 1 NaN
1 3 1.0
2 3 1.0
3 1 1.0
4 2 1.0
5 3 1.0
6 2 1.0
7 2 2.0
8 1 NaN
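For reference, the same idea can be written as a self-contained sketch that puts the result in a new 'col3', as the question asked, instead of overwriting 'col2'. The names filled and col3_alt are illustrative only and do not appear in the original answer. The second assignment is a pandas-only variant using Series.mask, shown as one possible alternative to np.where.

```python
import numpy as np
import pandas as pd

data = {
    "col1": [1, 3, 3, 1, 2, 3, 2, 2, 1],
    "col2": [np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, 2, np.nan],
}
df = pd.DataFrame(data, columns=["col1", "col2"])

# Forward-fill carries the last non-missing value down across a whole run of NaNs,
# which is what a single shift() cannot do.
filled = df["col2"].ffill()  # illustrative helper name

# Where the carried-down value is 1, write 1; otherwise keep col2 unchanged
# (so the trailing NaN after the 2.0 stays NaN).
df["col3"] = np.where(filled == 1, 1, df["col2"])

# pandas-only variant (an assumption about preference, not part of the answer):
# mask() replaces values where the condition is True and keeps the rest.
df["col3_alt"] = df["col2"].mask(filled == 1, 1)

print(df)
```

Both columns should match the 'col3' column the question asked for, including the NaN values in the first and last rows.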