Obtain the Clustered Documents of DBSCAN
Question
I am trying to use DBSCAN (from scikit-learn) to cluster text documents. I use TF-IDF (TfidfVectorizer in sklearn) to create the features for each document.
However, I have not found a way to obtain (print) the documents that DBSCAN assigns to each cluster.
The DBSCAN in sklearn provides an attribute called 'labels_', which gives the cluster label of each sample (e.g. 1, 2, 3, with -1 for noise). But I want to get the documents that DBSCAN clustered, not just the cluster labels.
To emphasize, I want to know which documents belong to each cluster. Could you please suggest ways to do this?
Thank you very much!
Recommended answer
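Use the labels to select the documents. With labels_ taken from the fitted DBSCAN object and X being the TF-IDF matrix: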
X[labels_ == 1,:]
This should give all documents (as rows of X) in cluster 1.
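Because X holds TF-IDF rows rather than the original text, the same labels can also be used to index the list of raw documents. The following is a minimal sketch of that idea, not a definitive implementation: the toy corpus, the helper name documents_by_cluster, and the eps and min_samples values are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus; replace with your own documents.
docs = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "the cat sat on the rug",
    "stock markets fell sharply today",
    "stock markets fell sharply today",
    "an unrelated sentence about gardening",
]

# One TF-IDF row per document, in the same order as `docs`.
X = TfidfVectorizer().fit_transform(docs)

# eps and min_samples are illustrative values, not tuned recommendations.
db = DBSCAN(eps=0.5, min_samples=2).fit(X)
labels = db.labels_  # labels[i] is the cluster of docs[i]; -1 marks noise


def documents_by_cluster(documents, cluster_labels):
    """Group documents by cluster label (hypothetical helper)."""
    clusters = {}
    for doc, label in zip(documents, cluster_labels):
        clusters.setdefault(int(label), []).append(doc)
    return clusters


clusters = documents_by_cluster(docs, labels)
for label, members in sorted(clusters.items()):
    name = "noise" if label == -1 else f"cluster {label}"
    print(name)
    for doc in members:
        print("   ", doc)

# Boolean-mask form, mirroring X[labels_ == k, :] but on the raw text:
same_as_first = np.asarray(docs)[labels == labels[0]]
```

This works because labels_ has one entry per row of X, and the rows of X follow the order of the documents passed to the vectorizer, so position i in labels_ always refers to document i.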