Does KMeans normalize features automatically in sklearn?


Question

I was wondering whether KMeans automatically normalizes the features before clustering. There does not seem to be any parameter for requesting normalization.

Recommended answer

Data preprocessing (normalization, binning, weighting, etc.) is a separate step from applying a machine learning algorithm, and KMeans does not scale the features for you. Use sklearn.preprocessing for data preprocessing. Several preprocessors can also be chained, each one feeding the next.
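As a minimal sketch of chained preprocessing, the snippet below combines two transformers from sklearn.preprocessing with make_pipeline. The toy array and the choice of a log transform as the first step are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical toy data: two non-negative features on very different scales.
X = np.array([[1.0, 1000.0],
              [2.0, 1500.0],
              [3.0, 100000.0],
              [4.0, 2000.0]])

# Chain two preprocessors: log-transform first (an illustrative choice),
# then standardize each column to zero mean and unit variance.
preprocess = make_pipeline(
    FunctionTransformer(np.log1p),
    StandardScaler(),
)

X_scaled = preprocess.fit_transform(X)
```

The fitted pipeline can later be applied to new data with preprocess.transform, which reuses the means and variances learned during fit_transform.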

For k-means, centering the features (normalizing only the mean) is often not enough. K-means is sensitive to the variance of the data: features with larger variance have more influence on the result, so the variance should be equalized across features as well. For k-means I would therefore recommend StandardScaler for data preprocessing.
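The effect of unequal variances can be sketched on synthetic data. In the example below, which is an assumption-laden illustration and not a benchmark, one small-scale feature carries the real group structure while a large-scale feature is pure noise. KMeans is fitted once on the raw data and once behind a StandardScaler, and each labelling is compared with the known groups using the adjusted Rand index.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 100 points in two known groups.
group = np.repeat([0, 1], 50)
informative = group + rng.normal(0.0, 0.05, size=100)  # small scale, separates the groups
noise = rng.normal(0.0, 1000.0, size=100)              # large scale, carries no structure
X = np.column_stack([informative, noise])

# KMeans on raw features: distances are dominated by the large-variance column.
raw_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Same model, but with standardization as the first pipeline step.
scaled_model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)
scaled_labels = scaled_model.fit_predict(X)

# Adjusted Rand index: close to 1 means the known groups were recovered.
ari_raw = adjusted_rand_score(group, raw_labels)
ari_scaled = adjusted_rand_score(group, scaled_labels)
print(f"ARI without scaling: {ari_raw:.2f}")
print(f"ARI with StandardScaler: {ari_scaled:.2f}")
```

On data built this way, the scaled pipeline should recover the two groups while the unscaled fit mostly splits along the noise column. Putting the scaler inside the pipeline also means the same scaling is applied automatically when predicting on new points.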

Also keep in mind that k-means results are sensitive to the order of the observations. It is worth running the algorithm several times, shuffling the data between runs, averaging the resulting cluster centers, and then running a final fit with those averaged centers as starting points.
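A rough sketch of the repeated-run idea follows. It deliberately swaps in a simpler rule than the one described: averaging centers across runs would require matching cluster labels between runs, so this version keeps the run with the lowest inertia and uses its centers as the starting point for a final fit. The toy blobs and the number of runs are assumptions. scikit-learn's own n_init parameter performs a similar best-of-several-initializations selection internally, which is shown on the last line.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical toy data: three blobs in 2-D, standardized before clustering.
blob_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(0.0, 0.5, size=(40, 2)) for c in blob_centers])
X = StandardScaler().fit_transform(X)

# Several independent runs, each with a single initialization,
# each on a reshuffled copy of the data.
runs = []
for seed in range(5):
    X_shuffled = X[rng.permutation(len(X))]
    runs.append(KMeans(n_clusters=3, n_init=1, random_state=seed).fit(X_shuffled))

# Keep the run with the lowest within-cluster sum of squares (inertia).
best = min(runs, key=lambda km: km.inertia_)

# Final fit on the original ordering, started from the best centers found.
final = KMeans(n_clusters=3, init=best.cluster_centers_, n_init=1).fit(X)

# Built-in alternative: let KMeans try 10 initializations and keep the best.
builtin = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
```

Because the final fit starts from centers that have already converged, it normally changes little; its main use is to produce labels in the original row order.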
