Using R for Text Mining Reuters-21578

Views: 268 | Published: 2020/7/31 5:29:25 | Tags: r, corpus, tm, reuters


Question

I am trying to work with the well-known Reuters-21578 dataset and am having trouble loading the sgm files into my corpus.

Right now I am using the following commands:

require(tm)
reut21578 <- system.file("reuters21578", package = "tm")
reuters <- Corpus(DirSource(reut21578),
    readerControl = list(reader = readReut21578XML))

This is an attempt to include all the files in my corpus, but it gives me the following error:

Error in DirSource(reut21578) : empty directory

Any idea where I may be going wrong?

Recommended answer

The "tm" package includes only a sample of the Reuters-21578 data. If you want to avoid downloading, loading, and preparing all 22 Reuters-21578 files yourself, you can use the package "tm.corpus.Reuters21578":

install.packages("tm.corpus.Reuters21578", repos = "http://datacube.wu.ac.at")
library(tm.corpus.Reuters21578)
data(Reuters21578)
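
If only the sample bundled with "tm" is needed, the following is a minimal sketch of how the "empty directory" error can be diagnosed and avoided. It assumes a reasonably recent "tm" release in which the sample documents sit under "texts/crude" (the location used in the package vignette) and that the XML-reading dependency "tm" relies on is installed. The variable names bad_path and sample_dir are illustrative only.

```r
library(tm)

# system.file() returns an empty string when the requested path does not
# exist inside the package, and DirSource() then finds no files to read.
bad_path <- system.file("reuters21578", package = "tm")
nzchar(bad_path)  # FALSE means the directory was not found

# Assumption: the bundled Reuters-21578 sample is stored under "texts/crude".
sample_dir <- system.file("texts", "crude", package = "tm")

# Read the sample XML files as plain-text documents.
reuters <- VCorpus(DirSource(sample_dir, mode = "binary"),
                   readerControl = list(reader = readReut21578XMLasPlain))

# Print a short summary of the resulting corpus.
print(reuters)
```

Checking nzchar() on the result of system.file() before handing it to DirSource() makes a missing path visible early, instead of surfacing later as an "empty directory" error.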
