How to impute values in a column when certain conditions are fulfilled in other columns using fillna()

Published: 2018/11/15 12:52:04 Tags: python, pandas, ipython, data-science

Question

I've calculated the value counts for the rows where Credit_History is NaN.

Output when Credit_History is NaN:

Self_Employed
Yes  532
No   32

Married
No   398
Yes  21

For the numerical columns, I calculated the mean of each column.

Output for the numerical values when Credit_History is NaN:

Mean Applicant Income: 54003.1232
LoanAmount: 35435.12
Loan_Amount_Term: 360
ApplicantIncome: 30000

How do I now use fillna() in these cases:

Case 1: When Self_Employed = Y and Married = N; Credit_History should be 0

Case 2: When Self_Employed = N and ApplicantIncome > 20000; Credit_History should be 1

Case 3: When Self_Employed = Y, Married = N and ApplicantIncome > 2000; Credit_History should be 1

Also, when the value to pass to fillna() is not obvious for certain conditions, can we use a pivot table to calculate the median values and then impute them using fillna()?

Thanks in advance.

Recommended answer

Use numpy.select (https://docs.scipy.org/doc/numpy/reference/generated/numpy.select.html); if all conditions are False, the output is set by the default parameter:

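import numpy as np
import pandas as pd
from itertools import product

# Build every combination of the three columns as sample data
c = ['Self_Employed', 'Married', 'ApplicantIncome']
df = pd.DataFrame(list(product(list('NY'), list('NY'), [10000, 30000])),
                  columns=c)

# One boolean mask per case
m1 = (df.Self_Employed == 'Y') & (df.Married == 'N')
m2 = (df.Self_Employed == 'N') & (df.ApplicantIncome > 20000)
m3 = m1 & (df.ApplicantIncome > 20000)

# np.select uses the first condition that is True for each row,
# so rows matching m3 also match m1 and receive 0
df['Credit_History'] = np.select([m1, m2, m3], [0, 1, 1], default=2)
print(df)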
  Self_Employed Married  ApplicantIncome  Credit_History
0             N       N            10000               2
1             N       N            30000               1
2             N       Y            10000               2
3             N       Y            30000               1
4             Y       N            10000               0
5             Y       N            30000               0
6             Y       Y            10000               2
7             Y       Y            30000               2
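Because numpy.select assigns the value of the first condition that matches, the order of the condition list matters whenever rules overlap. The sketch below is an illustration only: it uses a small made-up frame and assumes that the narrower rule (Case 3) is meant to take precedence over the broader one (Case 1), which the question does not state explicitly.

```python
import numpy as np
import pandas as pd

# Made-up sample rows, chosen so that one row satisfies both m1 and m3
df = pd.DataFrame({
    "Self_Employed": ["Y", "Y", "N"],
    "Married": ["N", "N", "N"],
    "ApplicantIncome": [10000, 30000, 30000],
})

m1 = (df["Self_Employed"] == "Y") & (df["Married"] == "N")
m2 = (df["Self_Employed"] == "N") & (df["ApplicantIncome"] > 20000)
m3 = m1 & (df["ApplicantIncome"] > 20000)

# m1 listed first: every row that satisfies m3 also satisfies m1,
# so the value for m3 is never used
shadowed = np.select([m1, m2, m3], [0, 1, 1], default=2)

# m3 listed first (assumed intent): the narrower rule is checked
# before the broader one
specific_first = np.select([m3, m1, m2], [1, 0, 1], default=2)

print(shadowed.tolist())        # [0, 0, 1]
print(specific_first.tolist())  # [0, 1, 1]
```

Only the second row changes, since it is the only one matched by both m1 and m3.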

But if you want to replace only the missing values according to the conditions, add fillna:

# Same sample data, now with some Credit_History values missing
c = ['Self_Employed', 'Married', 'ApplicantIncome']
df = pd.DataFrame(list(product(list('NY'), list('NY'), [10000, 30000])),
                  columns=c).assign(Credit_History=[np.nan, 1, 0, np.nan] * 2)
print(df)
  Self_Employed Married  ApplicantIncome  Credit_History
0             N       N            10000             NaN
1             N       N            30000             1.0
2             N       Y            10000             0.0
3             N       Y            30000             NaN
4             Y       N            10000             NaN
5             Y       N            30000             1.0
6             Y       Y            10000             0.0
7             Y       Y            30000             NaN

m1 = (df.Self_Employed == 'Y') & (df.Married == 'N')
m2 = (df.Self_Employed == 'N') & (df.ApplicantIncome > 20000)
m3 = m1 & (df.ApplicantIncome > 20000)

# Wrap the rule-based values in a Series aligned on df's index,
# then fill only the NaN entries; existing values are kept
s = pd.Series(np.select([m1, m2, m3], [0, 1, 1], default=2), index=df.index)
df['Credit_History'] = df['Credit_History'].fillna(s)
print(df)
  Self_Employed Married  ApplicantIncome  Credit_History
0             N       N            10000             2.0
1             N       N            30000             1.0
2             N       Y            10000             0.0
3             N       Y            30000             1.0
4             Y       N            10000             0.0
5             Y       N            30000             1.0
6             Y       Y            10000             0.0
7             Y       Y            30000             2.0
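The question also asks whether per-group medians can be used with fillna(). One possible approach, sketched here under assumptions, is to compute the median of Credit_History within each (Self_Employed, Married) group and pass the aligned result to fillna(). The sample values and the choice of grouping columns are made up for illustration; pivot_table is used only to inspect the medians, while groupby(...).transform produces a Series with the same index as the frame.

```python
import numpy as np
import pandas as pd

# Made-up sample data with one missing value per group
df = pd.DataFrame({
    "Self_Employed": ["Y", "Y", "Y", "N", "N", "N"],
    "Married": ["N", "N", "N", "Y", "Y", "Y"],
    "Credit_History": [0.0, 0.0, np.nan, 1.0, np.nan, 1.0],
})

# Pivot table of group medians, useful for inspecting the values
table = df.pivot_table(values="Credit_History",
                       index="Self_Employed",
                       columns="Married",
                       aggfunc="median")
print(table)

# transform("median") returns one value per row (the median of that
# row's group, ignoring NaN), so it aligns with df for fillna
group_median = (
    df.groupby(["Self_Employed", "Married"])["Credit_History"]
      .transform("median")
)
df["Credit_History"] = df["Credit_History"].fillna(group_median)
print(df)
```

A group whose values are all NaN has no median, so its rows would stay NaN and would need a separate fallback.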
