Clustering problem

Question

I've been tasked with finding the N clusters that contain the most points in a data set, given that each cluster is bounded by a certain size. Currently, I am attempting to do this by loading my data into a kd-tree, iterating over the points, finding each point's nearest neighbor, and merging the two if the resulting cluster does not exceed the size limit. I'm not sure this approach will give me a globally optimal solution, so I'm looking for ways to tweak it. If you can tell me what type of problem this falls under, that'd be great too.
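
For reference, here is a minimal sketch of the approach described above, not the asker's actual code. It assumes "size" means the side length of a cluster's axis-aligned bounding box and makes a single pass over the points; the names greedy_nn_merge and max_extent are made up for illustration, and it expects at least two points.

```python
import numpy as np
from scipy.spatial import cKDTree


def greedy_nn_merge(points, max_extent):
    """Merge each point with its nearest neighbour when the merged cluster's
    axis-aligned bounding box stays within max_extent on every axis."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    parent = list(range(n))   # union-find forest over point indices
    lo = points.copy()        # bounding-box minimum corner, valid at each root
    hi = points.copy()        # bounding-box maximum corner, valid at each root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    tree = cKDTree(points)
    # k=2 because the closest hit for every point is the point itself.
    _, neighbours = tree.query(points, k=2)

    for i in range(n):
        a, b = find(i), find(int(neighbours[i, 1]))
        if a == b:
            continue
        new_lo = np.minimum(lo[a], lo[b])
        new_hi = np.maximum(hi[a], hi[b])
        if np.all(new_hi - new_lo <= max_extent):   # size limit (assumed meaning)
            parent[b] = a
            lo[a], hi[a] = new_lo, new_hi

    return np.array([find(i) for i in range(n)])


pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [20.0, 20.0]])
labels = greedy_nn_merge(pts, max_extent=1.0)
roots, counts = np.unique(labels, return_counts=True)
top_two = roots[np.argsort(-counts)[:2]]   # the N = 2 most populated clusters
```

Because every merge is decided locally and never undone, the result depends on the order in which the points are visited, which is one reason a pass like this is not guaranteed to find the globally best N clusters.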

Recommended answer

Check out scipy.cluster for a start. Keyword searches on the algorithms it implements will then turn up a lot of information. Clustering is a big field, with a lot of research and practical applications, and a number of simple approaches that have been found to work fairly well, so you may not want to start by rolling your own.
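
As a rough illustration of what the library offers, the sketch below uses scipy.cluster.hierarchy with complete linkage and cuts the tree at a distance threshold. It assumes "size" means the largest pairwise distance inside a cluster; the function name largest_bounded_clusters and its parameters are made up for this example. It is a heuristic and is not claimed to be optimal for the original problem.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def largest_bounded_clusters(points, max_diameter, n_clusters):
    """Cut a complete-linkage tree at max_diameter and report the
    n_clusters most populated flat clusters."""
    points = np.asarray(points, dtype=float)
    # Complete linkage scores a merge by the farthest pair of points, so
    # cutting at max_diameter keeps every pairwise distance inside a flat
    # cluster at or below max_diameter.
    tree = linkage(points, method="complete")
    labels = fcluster(tree, t=max_diameter, criterion="distance")
    ids, counts = np.unique(labels, return_counts=True)
    order = np.argsort(-counts, kind="stable")[:n_clusters]
    return labels, ids[order], counts[order]


pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [20.0, 20.0]])
labels, top_ids, top_counts = largest_bounded_clusters(pts, 1.0, 2)
```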

That said, clustering algorithms are generally fairly easy to program, and if you do want to write your own, k-means and agglomerative clustering are two favorites that are quick to implement.
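
To give a sense of how little code a hand-rolled version needs, here is a minimal k-means sketch (plain Lloyd iterations in NumPy). The function names, the random initialization, and the defaults for n_iter and seed are choices made for this example. Note that nothing in it enforces a bound on cluster size; the extents are only measured afterwards.

```python
import numpy as np


def assign(points, centers):
    """Index of the nearest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points picked at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        # Move each center to the mean of its points; an empty cluster
        # keeps its previous center.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return assign(points, centers), centers


pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, centers = kmeans(pts, k=2)
# Bounding-box extent of each cluster, to compare against a size limit.
extents = [pts[labels == j].max(axis=0) - pts[labels == j].min(axis=0)
           for j in range(2)]
```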

Finally, I'm not sure that your idea of exactly N clusters that are bounded by a certain size is self-consistent, but it depends on exactly what you mean by "size" and "cluster" (are single points a cluster?).

Update:

Following the OP's comments below, I think that the standard clustering methods won't give an optimal solution to this problem, because there's no continuous metric for the "distance" between points that can be optimized, although they may give a good solution or approximation in some cases. For a clustering approach I'd try k-means, since the premise of this method is having a fixed N.

Rather than clustering, though, this seems more like a covering problem (i.e., you have N rectangles of fixed size and you're trying to cover all of the points with them). I don't know much about these, so I'll leave it to someone else.
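
To make the covering view concrete, here is a small greedy sketch under some loud assumptions: the rectangles are axis-aligned squares of side `side`, the points are 2-D, and each square is placed where it captures the most still-uncovered points. The function name and parameters are made up for this example. Greedy placement is a heuristic and may miss the best overall arrangement.

```python
def greedy_square_cover(points, side, n_squares):
    """Place up to n_squares axis-aligned squares, each time choosing the
    position that covers the most points not covered so far."""
    remaining = set(range(len(points)))
    chosen = []
    for _ in range(n_squares):
        best_corner, best_hit = None, set()
        # A square can be slid right and up until its left and bottom edges
        # touch a point without losing any point it covers, so lower-left
        # corners built from point coordinates are enough to try.
        xs = sorted({points[i][0] for i in remaining})
        ys = sorted({points[i][1] for i in remaining})
        for x0 in xs:
            for y0 in ys:
                hit = {i for i in remaining
                       if x0 <= points[i][0] <= x0 + side
                       and y0 <= points[i][1] <= y0 + side}
                if len(hit) > len(best_hit):
                    best_corner, best_hit = (x0, y0), hit
        if not best_hit:
            break                      # every point is already covered
        chosen.append((best_corner, sorted(best_hit)))
        remaining -= best_hit
    return chosen


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (20.0, 20.0)]
squares = greedy_square_cover(pts, side=1.0, n_squares=2)
```

Each entry of `squares` pairs a lower-left corner with the indices of the points that square newly covers. The brute-force search over corners is cubic in the number of points per square, so this is only meant for small inputs.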
