Replace missing values with mean (Weka)


Problem description

In Weka there is a filter called "ReplaceMissingValues" that replaces every missing value in a dataset with the mean of the corresponding attribute. I'd like to replace the missing values of a given attribute using the mean of only those values that belong to a certain class. For example, in a binary dataset I think it is more correct to replace a missing attribute value in a record of the positive class with the mean calculated from the positive-class records only. How can this be done? How can we replace values only for the records that belong to a certain class?

Recommended answer

If you replace the missing values of class A with the mean calculated from the training instances of that particular class, you are biasing your dataset. To avoid this bias (which will eventually make your trained model overfit), it is wise to use the default "ReplaceMissingValues" filter, i.e., to use the mean and mode of all training instances rather than of just that particular class.
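To make the difference between the two strategies concrete, here is a minimal sketch in plain Python. It does not use the Weka API: the function names, the toy data, and the use of NaN as a stand-in for a missing value are all assumptions made for illustration only.

```python
import math

MISSING = float("nan")  # hypothetical stand-in for a missing value


def column_mean(values):
    """Mean of the non-missing entries; NaN if every entry is missing."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else MISSING


def impute_overall(values):
    """Fill gaps with the mean over all rows, ignoring class labels."""
    mean = column_mean(values)
    return [mean if math.isnan(v) else v for v in values]


def impute_per_class(values, labels):
    """Fill gaps with the mean of the rows that share the same class label."""
    means = {
        label: column_mean([v for v, l in zip(values, labels) if l == label])
        for label in set(labels)
    }
    return [means[l] if math.isnan(v) else v for v, l in zip(values, labels)]


# Toy attribute column with one gap in each class (illustrative data only).
values = [1.0, 3.0, MISSING, 10.0, 14.0, MISSING]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

print(impute_overall(values))            # both gaps are filled with 7.0
print(impute_per_class(values, labels))  # "pos" gap gets 2.0, "neg" gap gets 12.0
```

In the per-class variant the filled-in value depends on the class label, so the imputed attribute carries information about the very thing the model is supposed to predict. It also cannot be applied to an unlabeled instance, because there is no label from which to pick a class mean. The overall mean has neither problem.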
