Conditional data imputation in Python
Problem description
I am trying to impute values in my dataset conditionally.
Say I have three columns. If Column 1 is 1, then Column 2 and Column 3 should be 0; if Column 1 is 2, then Column 2 and Column 3 should each be set to the column mean.
I tried running an if statement with the function any() and defined the conditions separately.
However, the values are not filled in according to the conditions: I get either all mean values or all zeroes.
The code is as follows:
if (df['Retention_Term'] == 6):
    df.cl_tot_calls_term_seq_1.replace(999, np.nan, inplace=True)
    df['cl_tot_calls_term_seq_1'].fillna(df['cl_tot_calls_term_seq_1'].median(), inplace=True)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Recommended answer
Try this.
mask1 = df['Retention_Term']==6
mask2 = df['cl_tot_calls_term_seq_1'] == 999
df.loc[mask1 & mask2, 'cl_tot_calls_term_seq_1'] = np.nan
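To see why the original if statement raised the ValueError while the masks work, here is a minimal sketch on made-up data (the values are illustrative; only the column names come from the question). A comparison on a column yields one boolean per row, so Python cannot reduce it to a single True or False for an if; a mask passed to .loc selects rows instead.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; 999 is the placeholder for "missing".
df = pd.DataFrame({
    "Retention_Term": [6, 6, 12, 6],
    "cl_tot_calls_term_seq_1": [999.0, 10.0, 999.0, 30.0],
})

# Each comparison returns a boolean Series (one value per row),
# which is why `if df['Retention_Term'] == 6:` is ambiguous.
mask1 = df["Retention_Term"] == 6
mask2 = df["cl_tot_calls_term_seq_1"] == 999

# `&` combines the masks element-wise; .loc writes only to the matching rows.
df.loc[mask1 & mask2, "cl_tot_calls_term_seq_1"] = np.nan

result = df["cl_tot_calls_term_seq_1"].tolist()
```

Only the first row is set to NaN here. The third row keeps 999 because its Retention_Term is 12, so it does not satisfy both conditions.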
Then the rest should be OK.
df['cl_tot_calls_term_seq_1'] = df['cl_tot_calls_term_seq_1'].fillna(df['cl_tot_calls_term_seq_1'].median())
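Note that the fillna call above fills every NaN in the column with the overall median, not only the rows where Retention_Term is 6. The same mask pattern can also cover the three-column case described in the question. The sketch below is one possible reading, not a definitive solution: the column names col1, col2, col3 and the data are hypothetical, "Mean()" is assumed to be the mean of the observed values in the whole column, and only missing values are assumed to be filled.

```python
import numpy as np
import pandas as pd

# Hypothetical data: col1 selects the rule, col2 and col3 contain gaps (NaN).
df = pd.DataFrame({
    "col1": [1, 1, 2, 2, 2],
    "col2": [np.nan, 5.0, np.nan, 4.0, 8.0],
    "col3": [3.0, np.nan, 6.0, np.nan, 2.0],
})

for col in ["col2", "col3"]:
    missing = df[col].isna()
    # Mean of the observed values; NaN is skipped by default.
    # Computed before any filling so the zeros do not distort it.
    col_mean = df[col].mean()
    # Rule 1: where col1 == 1, fill gaps with 0.
    df.loc[(df["col1"] == 1) & missing, col] = 0
    # Rule 2: where col1 == 2, fill gaps with the column mean.
    df.loc[(df["col1"] == 2) & missing, col] = col_mean
```

If the mean should instead be taken only over the rows where col1 is 2, compute it from that subset, for example df.loc[df["col1"] == 2, col].mean().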