sklearn agglomerative clustering: dynamically updating the number of clusters


Question

The documentation for sklearn.cluster.AgglomerativeClustering mentions that,

when varying the number of clusters and using caching, it may be advantageous to compute the full tree.

This seems to imply that it is possible to first compute the full tree, and then quickly update the number of desired clusters as necessary, without recomputing the tree (with caching).

However, this procedure for changing the number of clusters does not seem to be documented. I would like to do this but am unsure how to proceed.

Update: To clarify, the fit method does not take number of clusters as an input: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering.fit

Recommended answer

Set a caching directory with the parameter memory='mycachedir'. If you also set compute_full_tree=True, then when you rerun fit with different values of n_clusters, it will use the cached tree rather than recomputing it each time. To give you an example of how to do this with sklearn's grid search API:

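from sklearn.cluster import AgglomerativeClustering
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in 0.20

# X (features) and y (reference labels) are assumed to be defined already.
# Cache the full merge tree on disk so it is reused across n_clusters values.
ac = AgglomerativeClustering(memory='mycachedir',
                             compute_full_tree=True)
# Note: AgglomerativeClustering has no predict method, so this predict-based
# scorer may fail on recent versions; a custom scorer may be needed.
classifier = GridSearchCV(ac,
                          {'n_clusters': list(range(2, 6))},
                          scoring='adjusted_rand_score',
                          n_jobs=-1, verbose=2)
classifier.fit(X, y)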
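
The same caching idea can also be tried without the grid search machinery. The following is a minimal sketch, not the answer's own method: it loops over n_clusters by hand, calls fit_predict each time with the same memory directory, and compares the labels to known reference labels. The toy data, the temporary cache directory, and the names scores and best_k are assumptions made for illustration.

```python
import tempfile

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# Hypothetical toy data: three well-separated 2-D blobs of 20 points each.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])
y = np.repeat([0, 1, 2], 20)  # reference labels, one per blob

# Hypothetical cache location; any writable directory path would do.
cache_dir = tempfile.mkdtemp()

scores = {}
for k in range(2, 6):
    # Same memory directory and compute_full_tree=True on every iteration,
    # so the tree-building step can be served from the cache after the first fit.
    ac = AgglomerativeClustering(n_clusters=k,
                                 memory=cache_dir,
                                 compute_full_tree=True)
    labels = ac.fit_predict(X)
    scores[k] = adjusted_rand_score(y, labels)

# Pick the number of clusters with the highest adjusted Rand score.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because only n_clusters changes between iterations, each call after the first should be able to reuse the cached tree and only redo the cut. Scoring against y assumes reference labels are available; without them, an internal measure such as the silhouette score would be a natural substitute.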
