max_value and min_value for each column in scikit-learn IterativeImputer

Question

I have a data set with 78 columns and 5707 rows. Almost every column has missing values, and I would like to impute them with IterativeImputer. If I understand it correctly, it makes a "smarter" imputation for each column based on the information in the other columns.

However, when imputing, I do not want the imputed values to be less than the observed minimum or greater than the observed maximum. I realize there are max_value and min_value parameters, but I do not want to impose a single "global" limit on the imputations. Instead, I want each column to have its own max_value and min_value (namely, the maximum and minimum already observed in that column). Otherwise, the values in some columns do not make sense (negative headcounts, negative rates, etc.).

Is there a way to achieve this?

Recommended answer

If you want a different max and min for each column, you can loop over the columns and, in each iteration, select the column using sklearn.compose.make_column_selector or sklearn.compose.make_column_transformer, then apply the iterative imputer to that column, passing that column's max and min as parameters.
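As a possible alternative to looping, here is a minimal sketch that passes per-column bounds in a single call. It assumes a scikit-learn version (0.23 or later, to my knowledge) in which min_value and max_value accept an array-like of shape (n_features,) in addition to a scalar. The synthetic data and the names X_missing, col_min and col_max are illustrative only and are not from the question.

```python
import numpy as np
# IterativeImputer is experimental, so it must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Synthetic stand-in for the real data set (assumption: numeric columns only).
rng = np.random.default_rng(0)
X = rng.normal(loc=[10.0, 0.5, 100.0], scale=[3.0, 0.2, 25.0], size=(200, 3))

# Knock out roughly 20% of the entries at random.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan

# Observed per-column minimum and maximum, ignoring missing entries.
col_min = np.nanmin(X_missing, axis=0)
col_max = np.nanmax(X_missing, axis=0)

# One bound per feature: imputed values for column j are clipped
# to the interval [col_min[j], col_max[j]].
imputer = IterativeImputer(min_value=col_min, max_value=col_max, random_state=0)
X_imputed = imputer.fit_transform(X_missing)
```

Because all columns are imputed together, each column is still predicted from the others, while its imputed values stay within its own observed range. If the data lives in a pandas DataFrame, the same bounds could be taken from df.min() and df.max(), which skip missing values by default.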
