Python Clustering Algorithms
Problem Description
I've been searching scipy and sklearn for a clustering algorithm suited to a particular problem. I need some way of sorting a population of N particles into k groups, where k is not necessarily known, and in addition, no a priori linking lengths are known (similar to this question).
I've tried kmeans, which works well if you know how many clusters you want. I've tried dbscan, which does poorly unless you give it a characteristic length scale at which to stop looking (or start looking) for clusters. The problem is that I potentially have thousands of these particle populations, and I cannot spend the time telling kmeans/dbscan what parameters to start from for each one.
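To illustrate why k-means needs k up front, here is a minimal sketch using scikit-learn's `KMeans` on synthetic two-group data (the data, the `kmeans_labels` helper, and the chosen k values are hypothetical stand-ins for the particle positions). Whatever `n_clusters` you pass, k-means returns exactly that many groups, so a wrong guess silently splits or merges populations.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical stand-in for particle positions: two well-separated groups.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [8, 8]], cluster_std=0.5, random_state=0)


def kmeans_labels(X, k):
    """Partition X into exactly k groups; k must be supplied by the caller."""
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    return model.fit_predict(X)


for k in (2, 3, 4):
    labels = kmeans_labels(X, k)
    # k-means always reports k groups, even though the data contain only two.
    print(k, len(np.unique(labels)))
```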
Here is an example of what dbscan finds:
You can see that there really are two separate populations here, yet no matter how I adjust the epsilon parameter (the maximum distance between two points for them to count as neighbors), I simply cannot get it to separate those two populations of particles.
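The sensitivity to epsilon can be seen by sweeping `eps` in scikit-learn's `DBSCAN` and counting clusters and noise points. This is a sketch under assumptions: the two synthetic groups with different spreads, the `eps` values, `min_samples=5`, and the helper `count_dbscan_clusters` are all hypothetical. Because a single `eps` is a global density threshold, a value small enough to resolve a tight group tends to fragment a diffuse one into noise, while a value large enough for the diffuse group can merge both.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Hypothetical data: two groups with very different spreads.
X, _ = make_blobs(
    n_samples=[200, 200],
    centers=[[0, 0], [4, 0]],
    cluster_std=[0.3, 1.2],
    random_state=0,
)


def count_dbscan_clusters(X, eps, min_samples=5):
    """Return (number of clusters, number of noise points) found by DBSCAN for one eps."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_noise = int(np.sum(labels == -1))  # DBSCAN labels noise as -1
    n_clusters = len(set(labels)) - (1 if n_noise else 0)
    return n_clusters, n_noise


for eps in (0.1, 0.3, 0.5, 1.0, 2.0):
    print(eps, count_dbscan_clusters(X, eps))
```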
Are there any other algorithms that would work here? I'm looking for minimal upfront information; in other words, I'd like the algorithm to make "smart" decisions about what could constitute a separate cluster.
Recommended Answer
I've found one that requires NO a priori information/guesses and does very well for what I'm asking it to do. It's called Mean Shift and is available in scikit-learn (`sklearn.cluster.MeanShift`). It's also relatively fast compared to other algorithms such as Affinity Propagation.
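A minimal sketch of Mean Shift in scikit-learn, assuming synthetic two-group data in place of the real particles (the data and the helper `cluster_with_mean_shift` are hypothetical). Mean Shift does have a kernel `bandwidth` parameter, but when it is not supplied scikit-learn estimates it with `estimate_bandwidth`; the sketch makes that estimation explicit. The number of clusters is then read off `cluster_centers_` rather than passed in.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Hypothetical stand-in for one particle population: two separated groups.
X, y_true = make_blobs(n_samples=400, centers=[[0, 0], [6, 6]], cluster_std=0.6, random_state=0)


def cluster_with_mean_shift(X, quantile=0.3):
    """Cluster X with Mean Shift; neither k nor the kernel width is set by hand."""
    # Estimate the bandwidth from the data (0.3 is estimate_bandwidth's default quantile).
    bandwidth = estimate_bandwidth(X, quantile=quantile)
    model = MeanShift(bandwidth=bandwidth)
    labels = model.fit_predict(X)
    return labels, model.cluster_centers_


labels, centers = cluster_with_mean_shift(X)
print("clusters found:", len(centers))
```

The `quantile` argument still encodes a scale assumption, but it is relative to the data rather than an absolute length, which is what lets the same call be reused across many populations.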
Here is an example of the result it gives:
I also want to point out that the documentation states that it may not scale well to large numbers of samples.
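If scaling becomes a problem, scikit-learn exposes two knobs that reduce the work. This is a hedged sketch, not a benchmark: the synthetic data, `quantile=0.2`, and the subsample size of 1000 are illustrative assumptions. `estimate_bandwidth(..., n_samples=...)` estimates the bandwidth from a random subsample, and `MeanShift(bin_seeding=True)` starts the iterations from the occupied cells of a grid whose cell size is the bandwidth instead of from every point.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Hypothetical larger population with three groups.
X, _ = make_blobs(
    n_samples=10000, centers=[[0, 0], [6, 6], [-6, 6]], cluster_std=0.8, random_state=0
)

# Estimate the bandwidth from a random subsample of 1000 points instead of all of them.
bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=1000, random_state=0)

# bin_seeding=True seeds the mean-shift iterations from grid bins (cell size = bandwidth),
# so far fewer seeds are iterated than with one seed per point.
model = MeanShift(bandwidth=bandwidth, bin_seeding=True)
labels = model.fit_predict(X)
print("clusters found:", len(model.cluster_centers_))
```

Whether these settings are enough for thousands of populations depends on the point counts and dimensionality, so timing on a representative sample is advisable.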