How to cluster with K-means when the number of clusters and their sizes are known

Question

I'm clustering some data using scikit.

I have the easiest possible task: I know the number of clusters, and I also know the size of each cluster. Is it possible to specify this information and pass it to the K-means function?

Recommended answer

It won't be k-means anymore.

K-means is variance minimization, and it seems your objective is to produce partitions of a predefined size, not of minimum variance.
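
To make the distinction concrete, the two objectives can be written side by side. The notation ($C_j$ for cluster $j$, $\mu_j$ for its mean, $n_j$ for its prescribed size, $N$ for the number of points) is introduced here for illustration and does not come from the original answer.

```latex
% Standard k-means: choose the partition that minimizes within-cluster variance.
\min_{C_1,\dots,C_k} \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2,
\qquad \mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x

% Size-constrained variant: same objective, but only partitions with
% the prescribed sizes are feasible.
\text{subject to } |C_j| = n_j \ \text{for } j = 1,\dots,k,
\qquad \sum_{j=1}^{k} n_j = N
```

Because the constraint shrinks the set of feasible partitions, the constrained optimum can never have lower within-cluster variance than the unconstrained one.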

However, here is a tutorial that shows how to modify k-means to produce clusters of the same size. You can easily extend this to produce clusters of the desired sizes instead of the average size. It's fairly easy to modify k-means this way. But the results will be even more meaningless than k-means results on most data sets. K-means is often just as good as random convex partitions.
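
As a rough illustration of what such a modification might look like, here is a minimal sketch in Python with NumPy. It is not the method from the tutorial mentioned above and it is not part of scikit-learn: it swaps the usual nearest-center assignment for a simple greedy assignment that respects a per-cluster capacity. The function names `assign_with_sizes` and `kmeans_fixed_sizes` and the parameters `n_iter` and `seed` are hypothetical, chosen for this example. The greedy step is a heuristic, so neither convergence nor optimality is guaranteed, which is why the loop is capped.

```python
import numpy as np


def assign_with_sizes(X, centers, sizes):
    """Greedy assignment: take the closest (point, center) pairs first,
    skipping centers that have already reached their prescribed size."""
    n, k = X.shape[0], centers.shape[0]
    # Squared Euclidean distance from every point to every center, shape (n, k).
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = np.full(n, -1)
    remaining = np.array(sizes, dtype=int)
    # Visit all (point, center) pairs from closest to farthest.
    for flat in np.argsort(dist, axis=None):
        i, j = divmod(int(flat), k)
        if labels[i] == -1 and remaining[j] > 0:
            labels[i] = j
            remaining[j] -= 1
    return labels


def cluster_means(X, labels, k):
    """Mean of the points currently assigned to each cluster."""
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])


def kmeans_fixed_sizes(X, sizes, n_iter=50, seed=0):
    """Lloyd-style loop with a capacity-constrained assignment step (heuristic)."""
    X = np.asarray(X, dtype=float)
    sizes = [int(s) for s in sizes]
    if sum(sizes) != len(X) or min(sizes) < 1:
        raise ValueError("sizes must be positive and sum to the number of points")
    k = len(sizes)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = assign_with_sizes(X, centers, sizes)
    for _ in range(n_iter):
        centers = cluster_means(X, labels, k)
        new_labels = assign_with_sizes(X, centers, sizes)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels, cluster_means(X, labels, k)


# Usage: 10 random 2-D points split into clusters of sizes 5, 3 and 2.
X = np.random.default_rng(42).normal(size=(10, 2))
labels, centers = kmeans_fixed_sizes(X, sizes=[5, 3, 2])
print(np.bincount(labels, minlength=3))  # per-cluster counts
```

By construction the per-cluster counts match the requested sizes. A greedy pass can make poor choices for the last few points it assigns, so an exact assignment step (for example, solving a linear assignment problem over replicated centers) would be a reasonable alternative at a higher computational cost.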
