Understanding the quality of the KMeans algorithm

Question

After reading "Unbalanced factor of KMeans", I am trying to understand how it works. From my examples, I can see that the smaller the factor, the better the quality of the KMeans clustering, i.e. the more balanced its clusters are. But what is the plain mathematical interpretation of this factor? Is it a known quantity?

Here are my examples:

C1 = 10
C2 = 100

pdd = [(C1,10), (C2, 100)]
n = 2        <-- #clusters
total = 110  <-- #points
uf = 10 * 10 + 100 * 100
uf = 10100 * 2 / 12100 = 1.669


C1 = 50
C2 = 60

pdd = [(C1, 50), (C2, 60)]
n = 2        
total = 110  
uf = 2500 + 3600
uf = 6100 * 2 / 12100 = 1.008


C1 = 1
C2 = 1

pdd = [(C1, 1), (C2, 1)]
n = 2       
total = 2
uf = 2
uf = 2 * 2 / (2 * 2) = 1
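
The three calculations above follow one pattern: sum the squared cluster sizes, multiply by the number of clusters, and divide by the squared total number of points. A minimal Python sketch of that pattern is given here; the function name `unbalanced_factor` and the plain-list input are assumptions made for illustration, not code from the post that defines the factor.

```python
def unbalanced_factor(cluster_sizes):
    """Return n * sum(c_i^2) / total^2 for a list of cluster sizes.

    Hypothetical helper: the formula is read off the worked examples,
    not taken from any library.
    """
    n = len(cluster_sizes)        # number of clusters
    total = sum(cluster_sizes)    # number of points
    if n == 0 or total == 0:
        raise ValueError("need at least one cluster and one point")
    sum_of_squares = sum(c * c for c in cluster_sizes)
    return n * sum_of_squares / (total * total)


# The three cases from the question.
print(unbalanced_factor([10, 100]))  # 10*10 + 100*100 = 10100, so about 1.669
print(unbalanced_factor([50, 60]))   # about 1.008
print(unbalanced_factor([1, 1]))     # 1.0
```

A perfectly balanced clustering gives exactly 1, and the value grows as the cluster sizes become more unequal.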

Recommended answer

It appears to be related to the Gini index, an entropy-like measure of impurity, which also uses the sum of squared counts.
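
One way to make this connection explicit, assuming the factor is computed as in the question's examples, is sketched here. The symbols are notation introduced only for this sketch: $c_i$ is the size of cluster $i$, $n$ the number of clusters, $N$ the total number of points, $p_i$ the share of points in cluster $i$, and $G$ the Gini impurity.

```latex
\[
p_i = \frac{c_i}{N}, \qquad N = \sum_{i=1}^{n} c_i, \qquad
G = 1 - \sum_{i=1}^{n} p_i^2
\]
\[
\mathrm{uf} = \frac{n \sum_{i=1}^{n} c_i^2}{N^2}
            = n \sum_{i=1}^{n} p_i^2
            = n\,(1 - G)
\]
\[
\frac{1}{n} \le \sum_{i=1}^{n} p_i^2 \le 1
\quad\Longrightarrow\quad
1 \le \mathrm{uf} \le n
\]
```

Under these assumptions the lower bound is reached only when every $p_i = 1/n$, that is, when all clusters have the same size, and the upper bound only when every point falls into a single cluster. This matches the observation in the question that smaller values mean more balanced clusters.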

See also Cross Validated: Understanding the quality of the KMeans algorithm.
