Text clustering using Scipy Hierarchy Clustering in Python

Views: 347 Published: 2020/10/3 2:02:37 python scipy cluster-analysis text-mining

Problem description

I have a text corpus that contains 1000+ articles, each on a separate line. I am trying to use hierarchical clustering with Scipy in Python to produce clusters of related articles. This is the code I used to do the clustering:

# Agglomerative clustering
# X is assumed to be a sparse document-term matrix (one row per article)
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as hac

tree = hac.linkage(X.toarray(), method="complete", metric="euclidean")
plt.clf()
hac.dendrogram(tree)
plt.show()

and I got this plot

Then I cut the tree into at most three flat clusters with fcluster() (criterion 'maxclust'):

from scipy.cluster.hierarchy import fcluster
clustering = fcluster(tree,3,'maxclust')
print(clustering)

and I got this output: [2 2 2 ..., 2 2 2]
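
The truncated printout hides how many articles ended up in each cluster. As a rough sketch, assuming a toy 2-D array in place of the real X.toarray() (the data and the names `points`, `labels`, `cluster_ids`, `sizes` are made up for illustration), the following shows that fcluster() with 'maxclust' returns one label per row and how the cluster sizes could be counted:

```python
import numpy as np
import scipy.cluster.hierarchy as hac

# Toy stand-in for X.toarray(): six points forming three well-separated groups.
points = np.array([
    [0.0, 0.0], [0.1, 0.0],
    [5.0, 5.0], [5.1, 5.0],
    [10.0, 0.0], [10.1, 0.0],
])

tree = hac.linkage(points, method="complete", metric="euclidean")

# criterion="maxclust" with t=3 asks for at most 3 flat clusters.
labels = hac.fcluster(tree, t=3, criterion="maxclust")

# Count how many rows fall in each cluster.
cluster_ids, sizes = np.unique(labels, return_counts=True)
for cid, size in zip(cluster_ids, sizes):
    print(f"cluster {cid}: {size} rows")
```

Running the same count on the real `clustering` array would show whether nearly all articles were assigned to a single cluster, which the printed output hints at but does not confirm.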

My question is: how can I find the 10 most frequent words in each cluster, in order to suggest a topic for each cluster?
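
One possible approach, offered as a minimal sketch and not as the accepted answer: group the raw article texts by their cluster label and count words per group with the standard library. The helper name `top_words_per_cluster`, its parameters, the crude regex tokenizer, and the toy data are all assumptions made for illustration; the question does not show how X or the article list were built.

```python
from collections import Counter, defaultdict
import re


def top_words_per_cluster(documents, labels, n=10, stop_words=frozenset()):
    """Return {cluster_label: [(word, count), ...]} with the n most frequent words."""
    counts = defaultdict(Counter)
    for text, label in zip(documents, labels):
        # Lowercase and keep runs of letters/apostrophes; a crude tokenizer.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts[int(label)].update(t for t in tokens if t not in stop_words)
    return {label: counter.most_common(n) for label, counter in counts.items()}


# Hypothetical toy data: one article per entry, labels shaped like fcluster() output.
articles = [
    "the cat sat on the mat",
    "a cat chased the other cat",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]
labels = [1, 1, 2, 2]

top = top_words_per_cluster(
    articles, labels, n=2, stop_words={"the", "a", "on", "as", "and"}
)
for label, words in sorted(top.items()):
    print(label, words)
```

With the variables from the question, this could be called with the list of article lines and the `clustering` array, provided the lines are in the same order as the rows of X. Raw counts tend to be dominated by common words, so a stop-word set (or TF-IDF weights instead of counts) is likely needed before the top words suggest a usable topic.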
