How to cluster with K-means when the number of clusters and their sizes are known
Question
I'm clustering some data using scikit-learn.
I have the easiest possible task: I know the number of clusters, and I also know the size of each cluster. Is it possible to specify this information and pass it to the K-means function?
Recommended answer
It won't be k-means anymore.
K-means is variance minimization, and it seems your objective is to produce partitions of a predefined size, not of minimum variance.
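To make "variance minimization" concrete: the quantity k-means minimizes is the within-cluster sum of squared distances from each point to its cluster's centroid. The minimal NumPy sketch below computes it; `kmeans_objective` is a hypothetical helper name, not part of scikit-learn. A size constraint only restricts which labellings are allowed; it does not change this quantity, so the best size-constrained partition can have a much larger value than the unconstrained k-means optimum.

```python
import numpy as np


def kmeans_objective(X, labels, centers):
    """Hypothetical helper: within-cluster sum of squared distances.

    This is the quantity k-means tries to minimize: each point contributes
    its squared Euclidean distance to the centroid of the cluster it is in.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # centers[labels] picks, for every point, the centroid of its own cluster
    diffs = X - centers[np.asarray(labels, dtype=int)]
    return float((diffs ** 2).sum())


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
    # Compact grouping: {0, 2} and {10, 12}, centroids at their means
    print(kmeans_objective(X, [0, 0, 1, 1], [[1.0, 0.0], [11.0, 0.0]]))
    # Same cluster sizes (2 and 2), but a poor grouping: {0, 10} and {2, 12}
    print(kmeans_objective(X, [0, 1, 0, 1], [[5.0, 0.0], [7.0, 0.0]]))
```

Both labellings satisfy the same sizes; only the first is what k-means would prefer.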
However, here is a tutorial that shows how to modify k-means to produce clusters of the same size. You can easily extend it to produce clusters of the desired sizes instead of equal sizes. Modifying k-means this way is fairly easy, but on most data sets the results will be even less meaningful than plain k-means results. K-means is often only about as good as a random convex partition.
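One way to build such a variant is sketched below, under stated assumptions. This is not the tutorial's method; it swaps in an exact assignment step: each centroid j is replicated `sizes[j]` times as "slots", and points are matched to slots with SciPy's `linear_sum_assignment` (a minimum-cost bipartite matching). The update step is the usual centroid mean, so the loop alternates two steps that each never increase the within-cluster sum of squares. The function name `size_constrained_kmeans` and its parameters are hypothetical. It builds an n×n cost matrix and the matching is roughly cubic in n, so it only suits small data sets, and like ordinary k-means it finds a local optimum that depends on initialization.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def size_constrained_kmeans(X, sizes, n_iter=100, seed=0):
    """Hypothetical sketch: Lloyd-style k-means where cluster j gets exactly sizes[j] points."""
    X = np.asarray(X, dtype=float)
    sizes = np.asarray(sizes, dtype=int)
    n, k = X.shape[0], len(sizes)
    if sizes.sum() != n or (sizes < 1).any():
        raise ValueError("sizes must be positive and sum to the number of points")

    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct random data points
    centers = X[rng.choice(n, size=k, replace=False)]
    # slot_owner[s] is the cluster that slot s belongs to (cluster j owns sizes[j] slots)
    slot_owner = np.repeat(np.arange(k), sizes)
    labels = None

    for _ in range(n_iter):
        # Squared distance from every point to every centroid, shape (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        # Cost of putting point i into slot s = distance to slot s's centroid, shape (n, n)
        cost = d2[:, slot_owner]
        # Optimal assignment of points to slots, which enforces the sizes exactly
        rows, cols = linear_sum_assignment(cost)
        new_labels = np.empty(n, dtype=int)
        new_labels[rows] = slot_owner[cols]
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignment stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

    return labels, centers


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 2))
    labels, centers = size_constrained_kmeans(X, sizes=[5, 10, 15])
    # The cluster sizes match the requested [5, 10, 15] by construction
    print(np.bincount(labels, minlength=3))
```

The sizes are guaranteed by the matching, not by the data, which is exactly why the resulting clusters can be less meaningful than unconstrained k-means.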