How can silhouette scores be negative?

Question

If we have some datapoints:

And we use, for example, k-means to segment them: are the resulting segments not such that every point is closest to the center-of-mass of its own cluster? If so, when the silhouette score compares a(i) (average distance to intra-cluster points) with b(i) (average distance to extra-cluster points), how can the score ever be negative, that is, how can b(i) be less than a(i)?

I can see that other clustering algorithms, especially more sophisticated ones, may cluster differently, or that some points may be assigned incorrectly. But how does this happen for k-means?

Recommended answer

A point i's average distance to the points in a cluster is not the same as its distance to that cluster's center-of-mass. The silhouette score uses the former when calculating a(i) and b(i), while k-means uses the latter for cluster assignment, so the two can disagree.
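To make the two criteria concrete, here they are side by side in the usual notation. The symbols C(i) for the cluster containing point i, C_k for the k-th cluster, mu_k for its center-of-mass, and d for the distance are my own labels, not from the question; b(i) is conventionally the smallest mean distance to the points of any other cluster.

```latex
% Silhouette: built from mean point-to-point distances
a(i) = \frac{1}{|C(i)| - 1} \sum_{j \in C(i),\, j \neq i} d(i, j),
\qquad
b(i) = \min_{C_k \neq C(i)} \frac{1}{|C_k|} \sum_{j \in C_k} d(i, j),
\qquad
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}

% k-means assignment: built from the distance to each center-of-mass
C(i) = \arg\min_{k} \, \lVert x_i - \mu_k \rVert
```

Since s(i) has the sign of b(i) - a(i), it is negative exactly when the point is, on average, closer to the members of some other cluster than to the members of its own, which the k-means criterion does not rule out.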

For example, in the image below: suppose the blue points are already assigned to one cluster and the green points to another. To which cluster will the red point be assigned? The center-of-mass of the blue cluster is at (0, 1) and the center-of-mass of the green cluster is at (0, -1.15), so the red point is closer to the blue center-of-mass and will be assigned to the blue cluster. However, its average distance to the green points is 1.15 while its average distance to the blue points is 1.414, so b(i) < a(i) and it will get a negative silhouette score.
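The numbers above can be checked with a short script. This is only a sketch: the coordinates are hypothetical, chosen by me so that the centers-of-mass and average distances match the ones quoted (the figure's actual points are not known), and the red point is assumed to sit at the origin.

```python
import math

# Hypothetical coordinates (assumption): picked only to reproduce the
# numbers quoted in the answer, not taken from the original figure.
blue = [(-1.0, 1.0), (1.0, 1.0)]    # center-of-mass at (0, 1)
green = [(0.0, -1.1), (0.0, -1.2)]  # center-of-mass at (0, -1.15)
red = (0.0, 0.0)                    # assumed position of the red point


def centroid(points):
    """Center-of-mass of a list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def mean_distance(point, points):
    """Average Euclidean distance from `point` to every point in `points`."""
    return sum(math.dist(point, q) for q in points) / len(points)


# k-means criterion: distance to each center-of-mass.
d_blue_centroid = math.dist(red, centroid(blue))
d_green_centroid = math.dist(red, centroid(green))

# Silhouette criterion: mean distance to the points themselves.
# The red point is treated as a member of the blue cluster.
a = mean_distance(red, blue)   # own cluster
b = mean_distance(red, green)  # nearest other cluster
s = (b - a) / max(a, b)

print("distance to blue center-of-mass: ", round(d_blue_centroid, 3))
print("distance to green center-of-mass:", round(d_green_centroid, 3))
print("a(i) =", round(a, 3), " b(i) =", round(b, 3), " s(i) =", round(s, 3))
```

With these assumed points the red point is nearer the blue center-of-mass, so k-means puts it in the blue cluster, yet its silhouette value comes out at about -0.19 because the green points are tightly packed while the blue points are spread out to either side.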
