Document Clustering Basics

Problem description

So, I've been mulling over these concepts for some time, and my understanding is very basic. Information retrieval seems to be a topic seldom covered in the wild...

My questions stem from the process of clustering documents. Let's say I start off with a collection of documents containing only interesting words. What is the first step here? Parse the words from each document and create a giant "bag-of-words" type model? Do I then proceed to create vectors of word counts for each document? How do I compare these documents using something like K-means clustering?
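For concreteness, the pipeline being asked about could look like the following minimal sketch using scikit-learn; the docs list, the cluster count, and the parameter values are made-up stand-ins, not part of the original question.

# A minimal sketch of the pipeline described above, assuming scikit-learn.
# The `docs` list and the cluster count are illustrative stand-ins.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

docs = [
    "clustering text documents by topic",
    "k-means groups similar documents together",
    "recipes for baking bread at home",
]

# Parse the words from each document and build one shared vocabulary
# (the "bag of words"); X holds one sparse word-count vector per document.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# K-means then compares those count vectors by distance and assigns each
# document to the nearest of k cluster centroids.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
print(km.fit_predict(X))  # e.g. [0 0 1]: one cluster label per document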

Recommended answer

Try tf-idf for starters.
If you read Python, look at "Clustering text documents using MiniBatchKMeans" in scikit-learn:
"an example showing how scikit-learn can be used to cluster documents by topics using a bag-of-words approach".
Then feature_extraction/text.py in the source has very nice classes.
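A condensed sketch in the spirit of that scikit-learn example follows; it is not the example verbatim, and the newsgroup categories, parameter values, and cluster count here are arbitrary choices. It weights terms with TfidfVectorizer (one of the classes in feature_extraction/text.py) and clusters with MiniBatchKMeans.

# A condensed sketch of tf-idf + MiniBatchKMeans with scikit-learn.
# fetch_20newsgroups downloads a small text corpus on first use.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import MiniBatchKMeans

# Two well-separated categories keep the run quick and the clusters clear.
data = fetch_20newsgroups(
    subset="train",
    categories=["sci.space", "rec.sport.baseball"],
    remove=("headers", "footers", "quotes"),
)

# TfidfVectorizer turns raw text into tf-idf weighted vectors;
# stop words and very common/rare terms are filtered out.
vectorizer = TfidfVectorizer(stop_words="english", max_df=0.5, min_df=2)
X = vectorizer.fit_transform(data.data)

# MiniBatchKMeans clusters the tf-idf vectors in small batches,
# which scales better than plain KMeans on large corpora.
km = MiniBatchKMeans(n_clusters=2, n_init=3, random_state=42)
km.fit(X)

# Show the highest-weighted terms in each cluster centroid as a sanity check.
terms = vectorizer.get_feature_names_out()
for i, centroid in enumerate(km.cluster_centers_):
    top = centroid.argsort()[-8:][::-1]
    print(f"cluster {i}:", ", ".join(terms[j] for j in top))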
