k-means empty cluster


Problem description

I am trying to implement k-means as a homework assignment. My exercise sheet gives the following remark regarding empty centers:

During the iterations, if any of the cluster centers has no data points associated with it, replace it with a random data point.

That confuses me a bit. First, Wikipedia and the other sources I have read do not mention this at all. Second, I have read about the problem of 'choosing a good k for your data': how is my algorithm supposed to converge if I start setting new centers for clusters that were empty?

If I ignore empty clusters, the algorithm converges after 30-40 iterations. Is it wrong to ignore empty clusters?

Recommended answer

Handling empty clusters is not part of the k-means algorithm itself, but it might result in better cluster quality. As for convergence, it is never guaranteed exactly, only heuristically, which is why the convergence criterion is usually extended with a maximum number of iterations.
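To make the stopping criterion concrete, here is a minimal sketch of a plain k-means loop that stops either when the assignments no longer change or when an iteration cap is reached. The function name `kmeans`, the helper names, and the parameters `max_iter` and `seed` are assumptions chosen for illustration, not something the exercise sheet prescribes. This version simply ignores empty clusters by leaving their centers where they are.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means that leaves the center of an empty cluster unchanged.

    Returns (centers, labels, iterations_used).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k data points
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Stopping rule 1: the assignments did not change.
        if new_labels == labels:
            return centers, labels, iteration
        labels = new_labels
        # Update step: move each non-empty cluster's center to its mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean_point(members)
    # Stopping rule 2: the iteration cap was reached.
    return centers, labels, max_iter
```

Calling `kmeans(points, k)` on a list of coordinate tuples returns the final centers, one label per point, and the number of iterations used, so you can see which of the two stopping rules ended the run.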

Regarding a strategy for tackling this problem, I would say that randomly assigning some data point to the empty cluster is not very clever: the point's distance to its currently assigned center may be large or small, so moving it can affect cluster quality. A heuristic for this case would be to choose the point of the biggest cluster that is farthest from that cluster's center and move it to the empty cluster, repeating this until there are no empty clusters.
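The heuristic above could be sketched roughly as follows. This is one possible reading of it, not a definitive implementation: the function name `fix_empty_clusters` is hypothetical, "farthest" is taken to mean farthest from the biggest cluster's current center, and the donor cluster's center is not recomputed here because the next update step of k-means would do that.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fix_empty_clusters(points, centers, labels):
    """Reseed each empty cluster with the farthest point of the biggest cluster.

    Returns new (centers, labels); the input lists are not modified.
    """
    centers = list(centers)
    labels = list(labels)
    k = len(centers)
    while True:
        sizes = [labels.count(j) for j in range(k)]
        empty = [j for j in range(k) if sizes[j] == 0]
        if not empty:
            return centers, labels
        target = empty[0]
        biggest = max(range(k), key=lambda j: sizes[j])
        if sizes[biggest] < 2:
            # Fewer points than clusters: moving a point would only
            # empty another cluster, so stop here.
            return centers, labels
        members = [i for i, lab in enumerate(labels) if lab == biggest]
        # Point of the biggest cluster that lies farthest from its center.
        farthest = max(
            members,
            key=lambda i: squared_distance(points[i], centers[biggest]),
        )
        # Move that point: it becomes the only member and the new
        # center of the previously empty cluster.
        labels[farthest] = target
        centers[target] = points[farthest]
```

In a k-means loop this would be called right after the assignment step and before the update step, so that every cluster has at least one member when the means are recomputed.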

