How to impute values in a column when certain conditions are fulfilled in other columns using fillna()
Question
I've calculated the value counts for the rows where Credit_History is NaN.
Output when Credit_History is NaN:
Self_Employed
Yes 532
No 32
Married
No 398
Yes 21
For the numerical columns, I calculated the mean of each column.
Output for numerical values when Credit_History is NaN:
Mean Applicant Income: 54003.1232
LoanAmount: 35435.12
Loan_Amount_Term: 360
ApplicantIncome: 30000
How do I now use fillna() in these cases:
Case 1: When Self_Employed = Y and Married = N; Credit_History should be 0
Case 2: When Self_Employed = N and ApplicantIncome > 20000; Credit_History should be 1
Case 3: When Self_Employed = Y, Married = N and ApplicantIncome > 2000; Credit_History should be 1
Also, when it is not obvious how to apply fillna() for certain conditions, can we use a pivot table to calculate the median values and then impute them using fillna()?
Thanks in advance.
Recommended answer
Use numpy.select (https://docs.scipy.org/doc/numpy/reference/generated/numpy.select.html). If all conditions are False for a row, the output is defined by the default parameter:
import numpy as np
import pandas as pd
from itertools import product

# Sample frame with every combination of the three columns
c = ['Self_Employed', 'Married', 'ApplicantIncome']
df = pd.DataFrame(list(product(list('NY'), list('NY'), [10000, 30000])),
                  columns=c)
m1 = (df.Self_Employed == 'Y') & (df.Married == 'N')
m2 = (df.Self_Employed == 'N') & (df.ApplicantIncome > 20000)
m3 = m1 & (df.ApplicantIncome > 20000)
# np.select uses the first condition that is True for each row,
# so the more specific m3 is listed before m1
df['Credit_History'] = np.select([m3, m1, m2], [1, 0, 1], default=2)
print(df)
  Self_Employed Married  ApplicantIncome  Credit_History
0             N       N            10000               2
1             N       N            30000               1
2             N       Y            10000               2
3             N       Y            30000               1
4             Y       N            10000               0
5             Y       N            30000               1
6             Y       Y            10000               2
7             Y       Y            30000               2
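Because numpy.select takes the choice belonging to the first condition that is True in each row, the order of the condition list matters whenever conditions overlap. The following is a minimal sketch on toy arrays; the names income, broad and narrow are illustrative and not part of the original data.

```python
import numpy as np

income = np.array([10000, 30000])
broad = income > 5000     # True for both rows
narrow = income > 20000   # True only for the second row

# broad listed first: it matches every row, so narrow is never reached
broad_first = np.select([broad, narrow], [0, 1], default=2)

# narrow listed first: it wins where it applies, broad covers the rest
narrow_first = np.select([narrow, broad], [1, 0], default=2)

print(broad_first, narrow_first)
```

Listing the most specific condition first is one way to make sure it is not shadowed by a broader one.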
To replace only the missing values according to these conditions, add fillna:
c = ['Self_Employed', 'Married', 'ApplicantIncome']
df = pd.DataFrame(list(product(list('NY'), list('NY'), [10000, 30000])),
                  columns=c).assign(Credit_History=[np.nan, 1, 0, np.nan] * 2)
print(df)
  Self_Employed Married  ApplicantIncome  Credit_History
0             N       N            10000             NaN
1             N       N            30000             1.0
2             N       Y            10000             0.0
3             N       Y            30000             NaN
4             Y       N            10000             NaN
5             Y       N            30000             1.0
6             Y       Y            10000             0.0
7             Y       Y            30000             NaN
m1 = (df.Self_Employed == 'Y') & (df.Married == 'N')
m2 = (df.Self_Employed == 'N') & (df.ApplicantIncome > 20000)
m3 = m1 & (df.ApplicantIncome > 20000)
# Build the replacement values as a Series aligned on df.index;
# m3 comes before m1 because np.select uses the first matching condition
s = pd.Series(np.select([m3, m1, m2], [1, 0, 1], default=2), index=df.index)
# fillna replaces only the NaN entries; existing values are kept
df['Credit_History'] = df['Credit_History'].fillna(s)
print(df)
  Self_Employed Married  ApplicantIncome  Credit_History
0             N       N            10000             2.0
1             N       N            30000             1.0
2             N       Y            10000             0.0
3             N       Y            30000             1.0
4             Y       N            10000             0.0
5             Y       N            30000             1.0
6             Y       Y            10000             0.0
7             Y       Y            30000             2.0
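On the question's last point about pivot tables and medians, one possible approach is sketched below. It is an assumption on my part rather than something from the answer above: the toy data and the choice of LoanAmount as the column to impute are made up for illustration. pivot_table shows the group medians for inspection, while groupby(...).transform('median') returns the same medians aligned to the original rows, which is the shape fillna needs.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the question
df = pd.DataFrame({
    'Self_Employed': ['N', 'N', 'N', 'Y', 'Y', 'Y'],
    'Married':       ['Y', 'Y', 'Y', 'N', 'N', 'N'],
    'LoanAmount':    [100.0, np.nan, 140.0, 200.0, 260.0, np.nan],
})

# Median LoanAmount per (Self_Employed, Married) group, for inspection
medians = df.pivot_table(values='LoanAmount', index='Self_Employed',
                         columns='Married', aggfunc='median')
print(medians)

# Same medians broadcast back to each row, aligned on df.index
group_median = (df.groupby(['Self_Employed', 'Married'])['LoanAmount']
                  .transform('median'))

# Fill only the missing entries with the median of their own group
df['LoanAmount'] = df['LoanAmount'].fillna(group_median)
print(df)
```

A median suits a continuous column such as LoanAmount; for a 0/1 column like Credit_History, a rule-based fill as in the answer, or the group mode, may be more appropriate.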