Kmeans without knowing the number of clusters?
Question
I am attempting to apply k-means on a set of high-dimensional data points (about 50 dimensions) and was wondering if there are any implementations that find the optimal number of clusters.
I remember reading somewhere that the way an algorithm generally does this is such that the inter-cluster distance is maximized and intra-cluster distance is minimized but I don't remember where I saw that. It would be great if someone can point me to any resources that discuss this. I am using SciPy for k-means currently but any related library would be fine as well.
If there are alternate ways of achieving the same or a better algorithm, please let me know.
One approach is cross-validation.
In essence, you pick a subset of your data and cluster it into k clusters, and you ask how well it clusters, compared with the rest of the data: Are you assigning data points to the same cluster memberships, or are they falling into different clusters?
If the memberships are roughly the same, the data fit well into k clusters. Otherwise, you try a different k.
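The stability check described above can be sketched with SciPy's k-means routines. This is a minimal illustration, not a standard API: the `stability` function and the agreement score are my own hypothetical construction, assuming toy data with three well-separated blobs.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

rng = np.random.default_rng(0)
# toy data: three well-separated blobs in 50 dimensions
X = np.concatenate([rng.normal(c, 0.5, size=(100, 50)) for c in (0.0, 5.0, 10.0)])

def stability(X, k, rng):
    """Cluster each half of the data separately, then measure whether the
    two clusterings agree on which held-out points belong together."""
    idx = rng.permutation(len(X))
    a, b = X[idx[: len(X) // 2]], X[idx[len(X) // 2 :]]
    centroids_a, _ = kmeans2(a, k, minit="++", seed=1)
    centroids_b, _ = kmeans2(b, k, minit="++", seed=1)
    # assign the same held-out half to both sets of centroids
    la, _ = vq(b, centroids_a)
    lb, _ = vq(b, centroids_b)
    # co-membership agreement: fraction of point pairs that both labelings
    # either group together or keep apart (label-permutation invariant)
    same_a = la[:, None] == la[None, :]
    same_b = lb[:, None] == lb[None, :]
    return (same_a == same_b).mean()

for k in (2, 3, 4, 5):
    print(k, round(stability(X, k, rng), 3))
```

With three genuine clusters, the agreement score should peak at k = 3; for larger k the surplus clusters split blobs arbitrarily across the two halves, and agreement drops.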
Also, you could do PCA (principal component analysis) to reduce your 50 dimensions to some more tractable number. If a PCA run suggests that most of your variance is coming from, say, 4 out of the 50 dimensions, then you can pick k on that basis, to explore how the four cluster memberships are assigned.
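Inspecting the per-component variance is a one-liner with NumPy's SVD. A small sketch, assuming synthetic 50-dimensional data whose variance is concentrated in 4 latent directions (names and data here are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy 50-dimensional data whose variance lives mostly in 4 latent directions
latent = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 50)) * 3.0
X = latent + rng.normal(size=(500, 50)) * 0.1

Xc = X - X.mean(axis=0)                   # center before PCA
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
var_ratio = s**2 / (s**2).sum()           # variance fraction per component
print(np.cumsum(var_ratio)[:6].round(3))  # first few components dominate
```

If the cumulative variance saturates after a handful of components, clustering the data projected onto those components is both cheaper and less noisy than clustering in all 50 dimensions.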