Pandas - Fill NaN using multiple values

Problem description

I have a column (let's call it Column X) containing around 16,000 NaN values. The column has two possible values, 1 or 0 (so it is effectively binary).

I want to fill the NaN values in Column X, but I don't want to use a single value for all of the NaN entries.

For instance, I want to fill 50% of the NaN values with '1' and the other 50% with '0'.

I have read the 'fillna()' documentation, but I have not found anything that provides this functionality.

I have literally no idea how to move forward on this problem, so I haven't tried anything.

df['Column_x'] = df['Column_x'].fillna(df['Column_x'].mode()[0])

But this would fill all the NaN values in Column X of my dataframe 'df' with the mode of the column. I want to fill 50% with one value and the other 50% with a different value.

Since I haven't tried anything yet, I can't show or describe any actual results.

What I can say is that the expected result would be something along the lines of 8,000 NaN values of Column X replaced with '1' and the other 8,000 with '0'.

A visual result would be something like:

Before Handling NaN

Index     Column_x
0          0.0
1          0.0
2          0.0
3          0.0
4          0.0
5          0.0
6          1.0
7          1.0
8          1.0
9          1.0
10         1.0
11         1.0
12         NaN
13         NaN
14         NaN
15         NaN
16         NaN
17         NaN
18         NaN
19         NaN

After Handling NaN

Index     Column_x
0          0.0
1          0.0
2          0.0
3          0.0
4          0.0
5          0.0
6          1.0
7          1.0
8          1.0
9          1.0
10         1.0
11         1.0
12         0.0
13         0.0
14         0.0
15         0.0
16         1.0
17         1.0
18         1.0
19         1.0

Solution

Using pandas.Series.sample:

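# Boolean mask of the rows where Column_x is NaN
mask = df['Column_x'].isna()
# Randomly pick half of the NaN rows and keep their index labels
ind = df['Column_x'].loc[mask].sample(frac=0.5).index
# Fill the sampled half with 1, then the remaining NaN rows with 0
df.loc[ind, 'Column_x'] = 1
df['Column_x'] = df['Column_x'].fillna(0)
print(df)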

Output (the rows that receive 1 are sampled at random, so the exact pattern varies between runs):

    Index  Column_x
0       0       0.0
1       1       0.0
2       2       0.0
3       3       0.0
4       4       0.0
5       5       0.0
6       6       1.0
7       7       1.0
8       8       1.0
9       9       1.0
10     10       1.0
11     11       1.0
12     12       1.0
13     13       0.0
14     14       1.0
15     15       0.0
16     16       0.0
17     17       1.0
18     18       1.0
19     19       0.0
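
If a repeatable result or a split other than 50/50 is needed, the same idea can be wrapped in a small helper. This is only a sketch: fill_nan_split and its frac_ones parameter are names made up here for illustration, the sample frame is rebuilt from the question's table, and the code assumes the index labels are unique. Passing random_state to sample fixes which rows are drawn.

```python
import numpy as np
import pandas as pd


def fill_nan_split(series, frac_ones=0.5, random_state=None):
    """Fill a random share (frac_ones) of the NaN entries with 1 and the rest with 0.

    Hypothetical helper for illustration; assumes unique index labels.
    """
    out = series.copy()
    # Sample among the NaN entries only and keep their index labels
    ones_index = out[out.isna()].sample(
        frac=frac_ones, random_state=random_state
    ).index
    out.loc[ones_index] = 1
    # Whatever is still NaN becomes 0
    return out.fillna(0)


# Same shape as the question's example: six 0s, six 1s, eight NaNs
df = pd.DataFrame({"Column_x": [0.0] * 6 + [1.0] * 6 + [np.nan] * 8})
was_nan = df["Column_x"].isna()

filled = fill_nan_split(df["Column_x"], frac_ones=0.5, random_state=0)

# Count how the former NaN rows were filled
print(filled[was_nan].value_counts())
```

With eight NaN rows and frac_ones=0.5, four of them end up as 1 and four as 0, and the rows that already had a value are left untouched. Note that sample rounds frac * len to a whole number of rows, so an odd number of NaN entries cannot be split exactly in half.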
