Obtaining the Clustered Documents from DBSCAN


Question

I am trying to use DBSCAN (from scikit-learn) to cluster text documents. I use TF-IDF (TfidfVectorizer in sklearn) to create the features for each document.
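A minimal sketch of this setup, assuming a small hypothetical corpus and illustrative DBSCAN parameters (eps=0.5, min_samples=2, cosine distance). None of these values come from the question, and they will need tuning for real data. The point to notice is that fit_transform returns one row per document, and the fitted model's labels_ array has one entry per row, in the same order as the input documents.

```python
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical example corpus; replace with your own documents.
documents = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "stock prices fell sharply today",
    "stock prices rose sharply today",
    "an unrelated sentence about gardening",
]

# One TF-IDF row per document (returned as a scipy sparse matrix).
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)

# eps and min_samples are illustrative assumptions; cosine distance is a
# common choice for TF-IDF vectors, but tune these for your own data.
db = DBSCAN(eps=0.5, min_samples=2, metric="cosine")
db.fit(X)

print(X.shape)     # (number of documents, vocabulary size)
print(db.labels_)  # one cluster label per document; -1 marks noise
```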

However, I have not found a way to obtain (print) the documents that are clustered by DBSCAN.

The DBSCAN class in sklearn provides an attribute called 'labels_', which gives the cluster label of each sample (e.g. 0, 1, 2, ..., with -1 for noise). However, I want to get the documents that DBSCAN has clustered, not just the cluster labels.

To emphasize: I want to know which documents belong to each cluster. Could you suggest a way to do this?
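One way to answer this directly is to zip the original documents with the labels and group them, since labels_ is aligned with the input rows. This is a sketch: the helper name documents_by_cluster is hypothetical, and the documents and labels below are made-up stand-ins for your corpus and db.labels_.

```python
from collections import defaultdict

import numpy as np


def documents_by_cluster(documents, labels):
    """Group the original documents by their DBSCAN cluster label.

    Assumes labels is the fitted model's labels_ array, in the same
    order as documents (one label per input row).
    """
    clusters = defaultdict(list)
    for doc, label in zip(documents, labels):
        clusters[int(label)].append(doc)
    return dict(clusters)


# Hypothetical documents and labels standing in for db.labels_.
documents = ["doc A", "doc B", "doc C", "doc D"]
labels = np.array([0, 1, 0, -1])

grouped = documents_by_cluster(documents, labels)
for label, docs in sorted(grouped.items()):
    name = "noise" if label == -1 else f"cluster {label}"
    print(name, docs)
```

Because DBSCAN does not assign noise points to any cluster, they end up under the key -1 here, which makes it easy to report or discard them separately.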

Thanks a lot!

Answer

Use the labels to select the documents.

X[labels_ == 1, :]


This should give you all documents (rows of X) in cluster 1, where labels_ is the labels_ attribute of the fitted DBSCAN model.
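The mask labels_ == 1 selects rows of the TF-IDF matrix, i.e. feature vectors rather than the text itself. A small sketch, assuming a hypothetical 4x3 sparse matrix and labels in place of a real X and db.labels_, shows that the same boolean mask can index both X and a NumPy array of the original texts, which is what you need to print the documents.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical TF-IDF-like matrix (4 documents x 3 terms) and labels,
# standing in for X and db.labels_ from a fitted model.
X = csr_matrix(np.array([
    [0.1, 0.0, 0.9],
    [0.0, 0.8, 0.2],
    [0.2, 0.0, 0.7],
    [0.5, 0.5, 0.0],
]))
labels = np.array([1, 0, 1, -1])
documents = np.array(["doc A", "doc B", "doc C", "doc D"])

mask = labels == 1               # boolean array: True for rows in cluster 1
X_cluster1 = X[mask, :]          # feature rows of cluster 1 (still sparse)
docs_cluster1 = documents[mask]  # the original texts of cluster 1

print(X_cluster1.shape)
print(docs_cluster1.tolist())
```

If your documents are in a plain Python list, converting it with np.array(documents) once lets you reuse the mask; alternatively, use np.flatnonzero(mask) to get the row indices and look the documents up by index.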
