Return the farthest outlier in k-means clustering?
Question
Is there an easy way to return the farthest outlier after clustering with sklearn's k-means?
Essentially, I want to list the biggest outliers across a large number of clusters. Unfortunately, I have to use sklearn.cluster.KMeans because the assignment requires it.
Recommended answer
K-means is not well suited for outlier detection.
k-means tends to put an outlier into a one-element cluster of its own. The outlier then coincides with its centroid, so its distance is zero, the smallest possible, and it will not be detected.
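For reference, the distance-based score this caveat applies to can be sketched as follows. This is a minimal illustration, not a recommended detector: it ranks the points of each cluster by Euclidean distance to their own centroid. The helper name `farthest_points_per_cluster` and its parameters are made up for this example; only `KMeans`, `transform`, and `labels_` come from scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans


def farthest_points_per_cluster(X, n_clusters=3, top_n=1, random_state=0):
    """Hypothetical helper: for each cluster, return the top_n points
    that lie farthest from the centroid they were assigned to."""
    X = np.asarray(X, dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X)

    # transform() gives the distance from every sample to every centroid;
    # pick, for each sample, the column of its own cluster.
    dist_to_all = km.transform(X)
    dist_to_own = dist_to_all[np.arange(len(X)), km.labels_]

    result = {}
    for k in range(n_clusters):
        members = np.flatnonzero(km.labels_ == k)
        # Sort the members of cluster k by decreasing distance to centroid k.
        order = members[np.argsort(dist_to_own[members])[::-1]]
        result[k] = [(int(i), float(dist_to_own[i])) for i in order[:top_n]]
    return km, result


# Illustrative data (an assumption, not from the question):
# two tight blobs plus one stray point appended as the last row.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(10.0, 0.5, size=(50, 2)),
    [[10.0, 20.0]],
])
_, farthest = farthest_points_per_cluster(X_demo, n_clusters=2, top_n=3)
for cluster_id, points in farthest.items():
    print(cluster_id, points)
```

Note that a point sitting alone in a one-element cluster gets a distance of zero here, which is exactly the failure mode described above, so the ranking is only meaningful when no outlier has captured a centroid of its own.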
K-means is not robust when your data contains outliers. You may actually want to remove the outliers before running k-means.
Use something like kNN distance, LOF (Local Outlier Factor), or LoOP (Local Outlier Probabilities) instead.
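As one possible illustration of the LOF route, here is a short sketch using scikit-learn's `LocalOutlierFactor`. The helper name `top_lof_outliers`, the choice of `n_neighbors=20`, and the synthetic data are assumptions made for the example, not part of the original answer.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor


def top_lof_outliers(X, n_neighbors=20, top_n=5):
    """Hypothetical helper: return (index, LOF score) pairs for the
    top_n samples with the highest Local Outlier Factor."""
    X = np.asarray(X, dtype=float)
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    lof.fit(X)

    # scikit-learn stores the negated LOF; flip the sign so that
    # larger values mean "more outlying" (inliers are close to 1).
    scores = -lof.negative_outlier_factor_
    order = np.argsort(scores)[::-1][:top_n]
    return [(int(i), float(scores[i])) for i in order]


# Illustrative data (an assumption): one blob plus a distant point
# appended as the last row.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)), [[15.0, 15.0]]])
for index, score in top_lof_outliers(X_demo, top_n=3):
    print(index, round(score, 2))
```

Because LOF compares each point's local density with that of its neighbours, an isolated point cannot hide by forming a cluster of its own, which is the weakness of the centroid-distance approach.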