k-means for text clustering
Question
I'm trying to implement k-means for text clustering, specifically for English sentences. So far I'm at the point where I have a term-frequency matrix for each document (sentence). I'm a little confused about the actual implementation of k-means on text data. Here's my guess at how it should work:
- Figure out the number of unique words across all sentences (there are a lot of them; call it n).
- Create k n-dimensional vectors (clusters) and fill in the values of the k vectors with some random numbers (how do I decide what the bounds for these numbers are?).
- Determine the Euclidean distance from each of the q sentences to the k random clusters, reposition the clusters, etc. (If n is very large, as it is for English, wouldn't calculating the Euclidean distance between these vectors be very costly?)
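The steps above can be sketched in NumPy on a hypothetical toy term-frequency matrix. One common answer to the initialization question is to seed the centroids from randomly chosen sentences rather than random numbers, which sidesteps choosing bounds entirely:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Basic k-means on a term-frequency matrix X (sentences x vocabulary)."""
    rng = np.random.default_rng(seed)
    # Seed centroids from random data points, so no bounds need to be chosen.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Euclidean distance from every sentence to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned sentences.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Toy term-frequency matrix: 4 "sentences" over a 3-word vocabulary.
X = np.array([[2, 0, 0], [3, 1, 0], [0, 0, 2], [0, 1, 3]])
labels, _ = kmeans(X, k=2)
```

This is a dense sketch; for a real English vocabulary the matrix would be sparse, and you would use sparse representations (or dimensionality reduction, as the answer below suggests) rather than computing full n-dimensional distances directly.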
Thanks for any insight!
Answer
This is too long for a comment.
If you have a document-term matrix, then find the principal components (of the covariance matrix) and determine the coefficients of the original data in the principal-component space. You can do k-means clustering in that space.
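A minimal sketch of that pipeline with scikit-learn, using hypothetical toy sentences (PCA on the dense matrix stands in for the principal components of the covariance matrix):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Hypothetical toy corpus: two "topics" with repeated words for clear structure.
sentences = [
    "cat cat dog",
    "dog cat cat cat",
    "stock market stock",
    "market stock stock stock",
]

# Document-term matrix, then project onto the leading principal components.
X = CountVectorizer().fit_transform(sentences).toarray()
coords = PCA(n_components=2).fit_transform(X)

# k-means in the reduced principal-component space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)
```

Clustering in the reduced space avoids computing distances over the full vocabulary dimension, which addresses the cost concern in the question.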
With text data, you generally need a fair number of dimensions -- 20, 50, 100, or even more. Also, I would recommend Gaussian mixture models / expectation-maximization clustering instead of k-means, but that is another story.
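A minimal sketch of that alternative with scikit-learn's GaussianMixture, run on hypothetical 2-D coordinates such as the PCA-reduced document vectors described above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical 2-D coordinates (e.g., PCA-reduced document vectors).
coords = np.array([[0.1, 0.2], [0.0, 0.3], [5.0, 5.1], [5.2, 4.9]])

gm = GaussianMixture(n_components=2, random_state=0).fit(coords)
labels = gm.predict(coords)        # hard cluster labels
probs = gm.predict_proba(coords)   # soft membership probabilities
```

Unlike k-means, the mixture model gives soft assignments, so a sentence that sits between topics is not forced entirely into one cluster.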