Arrange the words of the Document Term Matrix by frequency in R

Views: 23 Published: 2021/9/8 20:10:25 Tags: r tm


Question

Sorry for another question, but I am a newbie in text mining and need advice from the pros. After a long struggle with content_transformer, I now have a clean corpus. My next question:

1. How can I select from `dtm` the words with the smallest frequencies, so that their combined frequency is no more than 1%?

For example, I need output in this format:

x 0,5% of all words in the dataset
y 0,2%
z 0,3%

So here the frequencies sum to 1%. How can I do this?

Recommended answer

Take a look at the DocumentTermMatrix function (or its transpose, TermDocumentMatrix) in the tm package. It counts the occurrences of each word per document. Summing these counts over the whole corpus and dividing by the total number of word occurrences gives the relative frequencies you are after.

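library(tm)

# document-term matrix: rows are documents, columns are terms
dtm <- DocumentTermMatrix(corpus)
# word counts for the complete corpus (one entry per term)
counts <- colSums(as.matrix(dtm))

# total number of word occurrences in the corpus
total <- sum(counts)
# relative frequency of each term, as a share of all words
freqs <- counts / total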
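
The question asks for the rarest words whose combined share stays within 1%. The following is a minimal sketch of one way to do that in base R, not necessarily the answerer's method. It assumes a named vector of per-term counts; the small `counts` vector here is made-up data standing in for `colSums(as.matrix(dtm))`, and the names `threshold`, `cum_share`, and `rare` are illustrative.

```r
# Hypothetical word counts standing in for colSums(as.matrix(dtm))
counts <- c(the = 500, data = 300, model = 191, x = 4, y = 2, z = 3)

# Total number of word occurrences
total <- sum(counts)

# Order terms from rarest to most common
counts_sorted <- sort(counts)

# Running share of all words covered by the rarest terms so far
cum_share <- cumsum(counts_sorted) / total

# Keep the rarest terms while their combined share stays within 1%
threshold <- 0.01
rare <- counts_sorted[cum_share <= threshold] / total

# Show each selected term's share as a percentage
print(round(100 * rare, 2))
```

Sorting in ascending order before taking the cumulative sum is what makes this pick the least frequent words first; with a real corpus, replace the made-up `counts` with the vector computed from the document-term matrix.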
