Finding 2 & 3 Word Phrases Using R TM Package
Question
I am trying to find code that actually works to find the most frequently used two- and three-word phrases with the R text mining (tm) package (maybe there is another package for this that I do not know about). I have been trying to use the tokenizer, but seem to have no luck.

If you have worked on a similar situation in the past, could you post code that is tested and actually works? Thank you so much!
Answer
You can pass a custom tokenizing function to tm's DocumentTermMatrix function, so if you have the tau package installed it is fairly straightforward.
library(tm)
library(tau)

# Return the distinct n-grams found in x; tau::textcnt() does the counting
tokenize_ngrams <- function(x, n = 3) {
  rownames(as.data.frame(unclass(textcnt(x, method = "string", n = n))))
}

texts <- c("This is the first document.",
           "This is the second file.",
           "This is the third text.")
corpus <- Corpus(VectorSource(texts))
matrix <- DocumentTermMatrix(corpus, control = list(tokenize = tokenize_ngrams))
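Since the original question asks for the *most frequent* phrases, not just the matrix, here is a minimal sketch of how you might rank them once the DocumentTermMatrix is built. The ranking step (`colSums` over the dense matrix, and the `freqs` variable) is my addition, not part of the answer above, and assumes the corpus is small enough to densify:

```r
library(tm)
library(tau)

# Same tokenizer as above: distinct n-grams counted by tau::textcnt()
tokenize_ngrams <- function(x, n = 3) {
  rownames(as.data.frame(unclass(textcnt(x, method = "string", n = n))))
}

texts <- c("This is the first document.",
           "This is the second file.",
           "This is the third text.")
corpus <- Corpus(VectorSource(texts))
dtm <- DocumentTermMatrix(corpus, control = list(tokenize = tokenize_ngrams))

# Each column of the matrix is a phrase; summing a column gives its
# total count across all documents, so sorting yields the top phrases
freqs <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
head(freqs)
```

For larger corpora you would want to avoid `as.matrix()` and instead use tm's `findFreqTerms(dtm, lowfreq = k)` to list phrases above a count threshold without densifying.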
Here, n in the tokenize_ngrams function is the number of words per phrase. This feature is also implemented in the RTextTools package, which simplifies things further.
library(RTextTools)

texts <- c("This is the first document.",
           "This is the second file.",
           "This is the third text.")
matrix <- create_matrix(texts, ngramLength = 3)
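Because the object returned by create_matrix is a tm DocumentTermMatrix, the usual tm helpers apply to it directly. A small sketch (the `lowfreq` threshold of 2 is just an illustrative value, not from the original answer):

```r
library(RTextTools)
library(tm)  # for findFreqTerms()

texts <- c("This is the first document.",
           "This is the second file.",
           "This is the third text.")
matrix <- create_matrix(texts, ngramLength = 3)

# Standard tm operations work on the result, e.g. listing phrases
# that occur at least twice across the documents
frequent_phrases <- findFreqTerms(matrix, lowfreq = 2)
frequent_phrases
```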
This returns an object of class DocumentTermMatrix for use with package tm.