Python Clustering Algorithms


Question

I've been looking through SciPy and scikit-learn for a clustering algorithm suited to a particular problem I have. I need some way of partitioning a population of N particles into k groups, where k is not necessarily known and, in addition, no a priori linking lengths are known (similar to this question).

I've tried k-means, which works well if you know how many clusters you want. I've tried DBSCAN, which does poorly unless you tell it a characteristic length scale on which to stop looking (or start looking) for clusters. The problem is that I have potentially thousands of these clusters of particles, and I cannot spend the time telling the k-means/DBSCAN algorithms what they should go off of.

Here is an example of what DBSCAN finds:

You can see that there really are two separate populations here, but no matter how I adjust the epsilon parameter (the maximum distance between two points for them to count as neighbors), I simply cannot get it to see those two populations of particles.

Are there any other algorithms that would work here? I'm looking for minimal information upfront - in other words, I'd like the algorithm to be able to make "smart" decisions about what could constitute a separate cluster.

Recommended answer

I've found one that requires NO a priori information/guesses and does very well for what I'm asking it to do. It's called Mean Shift and is available in scikit-learn as sklearn.cluster.MeanShift. It's also relatively quick (compared to other algorithms like Affinity Propagation).
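For readers who want to try this, here is a minimal sketch of how Mean Shift might be applied with scikit-learn. It is not the code behind the answer: the helper name cluster_particles and the synthetic two-blob data are assumptions made for illustration. Note that Mean Shift still uses a kernel bandwidth internally. The sketch lets estimate_bandwidth derive it from the data, and the quantile argument remains a knob that can change the result.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs


def cluster_particles(points, quantile=0.3):
    """Group points with Mean Shift; the number of clusters is not supplied.

    Hypothetical helper for illustration, not part of scikit-learn.
    """
    # Derive the kernel bandwidth from nearest-neighbour distances in the data,
    # so no length scale has to be typed in by hand.
    bandwidth = estimate_bandwidth(points, quantile=quantile)
    model = MeanShift(bandwidth=bandwidth)
    labels = model.fit_predict(points)
    return labels, model.cluster_centers_


# Synthetic stand-in for the particle data (an assumption): two populations.
points, _ = make_blobs(
    n_samples=400,
    centers=[[0.0, 0.0], [10.0, 10.0]],
    cluster_std=0.5,
    random_state=0,
)
labels, centers = cluster_particles(points)
print("clusters found:", len(centers))
```

The labels array assigns each particle to a cluster index, and centers holds one row per detected mode of the point density.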

Here is an example of what it gives:

I also want to point out that the documentation states that it may not scale well.
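If scaling becomes a concern, two scikit-learn options may help: estimate_bandwidth accepts n_samples to estimate the bandwidth from a random subsample, and MeanShift accepts bin_seeding=True to start from a coarse grid of seeds instead of from every point. The sketch below shows one possible way to combine them. The helper name cluster_large, the sample size, and the synthetic data are assumptions, and whether the speed-up is sufficient for a given dataset would need to be measured.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs


def cluster_large(points, sample_size=500, quantile=0.3):
    """Mean Shift with cheaper settings for larger inputs.

    Hypothetical helper for illustration, not part of scikit-learn.
    """
    # Estimate the bandwidth on a random subsample; the estimate is at least
    # quadratic in the number of samples it uses.
    bandwidth = estimate_bandwidth(
        points, quantile=quantile, n_samples=sample_size, random_state=0
    )
    # bin_seeding=True seeds the search from a grid whose cell size is the
    # bandwidth, which usually means far fewer seeds than one per point.
    model = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    labels = model.fit_predict(points)
    return labels, model.cluster_centers_


# Synthetic stand-in for a larger particle set (an assumption).
big_points, _ = make_blobs(
    n_samples=5000,
    centers=[[0.0, 0.0], [10.0, 10.0]],
    cluster_std=0.5,
    random_state=0,
)
big_labels, big_centers = cluster_large(big_points)
print("clusters found:", len(big_centers))
```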
