R text file and text mining: how to load data


Question

I am using the R package tm and I want to do some text mining. I have a single document, which is treated as a bag of words.

I don't understand the documentation on how to load a text file and create the objects needed to start using functions such as:

stemDocument(x, language = map_IETF(Language(x)))

So assume that this is my document: "this is a test for R load"

How do I load the data for text processing and create the object x?

Recommended answer

Like @richiemorrisroe, I found this poorly documented. Here's how I get my text into the tm package and build the document-term matrix:

library(tm)  # load the text mining library
setwd('F:/My Documents/My texts')  # set R's working directory to the folder that holds my text files
a <- Corpus(DirSource('F:/My Documents/My texts'), readerControl = list(language = "lat"))  # read every file in that folder into a corpus; "lat" tags the text as Latin, use "en" for English
summary(a)  # check what went in
a <- tm_map(a, removeNumbers)
a <- tm_map(a, removePunctuation)
a <- tm_map(a, stripWhitespace)
a <- tm_map(a, content_transformer(tolower))  # tm 0.6 and later need content_transformer() around base functions such as tolower
a <- tm_map(a, removeWords, stopwords("english"))  # this stopword file is at C:\Users\[username]\Documents\R\win-library\2.13\tm\stopwords
a <- tm_map(a, stemDocument, language = "english")
adtm <- DocumentTermMatrix(a)  # build the document-term matrix
adtm <- removeSparseTerms(adtm, 0.75)  # drop terms that are missing from more than 75% of the documents

In this case you don't need to specify the exact file name. As long as the file is the only one in the directory named in line 3, the tm functions will use it. I do it this way because I have never managed to specify the file name itself in line 3.
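If the text is already in memory, or you would rather not depend on a folder's contents, one alternative is to build the corpus from a character vector. The sketch below is not part of the original answer. It assumes the sample sentence from the question, and the names doc, x and dtm are chosen here for illustration only.

```r
library(tm)

# The sample sentence from the question, held as a character vector.
doc <- "this is a test for R load"

# VectorSource() treats each element of the vector as one document,
# so this corpus contains exactly one document.
x <- Corpus(VectorSource(doc))

# A single named file could be loaded the same way by reading it first.
# "mytext.txt" is a hypothetical file name, so the line stays commented out.
# x <- Corpus(VectorSource(paste(readLines("mytext.txt"), collapse = " ")))

# Build a document-term matrix from the corpus and look at it.
dtm <- DocumentTermMatrix(x)
inspect(dtm)
```

The object x can then go through the same tm_map() steps as the corpus a above. Note that stemDocument relies on the SnowballC package being installed.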

If anyone can suggest how to get text into the lda package, I'd be most grateful. I haven't been able to work that out at all.
