How to use stemDocument in R?


Question

Update:

Thanks for the help; see the comments. Because of the package version, I removed the tolower step and it works. I just need to find another way to convert the text to lower case.

=============

I am doing basic text mining with a list of documents, and everything went fine until I tried to use stemDocument.

These are the tm_map steps I have run so far:

library(tm)

fbVec<-VectorSource(data[,1])
fbCorpus<-Corpus(fbVec)
fbCorpus <- tm_map(fbCorpus, tolower)
fbCorpus <- tm_map(fbCorpus, removePunctuation)
fbCorpus <- tm_map(fbCorpus, removeNumbers)
fbCorpus <- tm_map(fbCorpus, removeWords, stopwords("english"))
fbCorpus <- tm_map(fbCorpus, removeWords, "pr")
fbCorpus <- tm_map(fbCorpus, stripWhitespace)

The result is as follows:

[[1]]
[1]  easy post position search resumes improvement searching resumes

[[2]]
[1]  easy use good candidiates improvement allow multiple emails sent 

[[3]]
[1]  applicants young kids absolutely sales experience waste time looking improvement applicants apply experience looking dont need kids just high school

[[4]]
[1]  abundance resumes

Then I tried stemming:

library(SnowballC)    
fbCorpus <- tm_map(fbCorpus, stemDocument)

But the result is not what I expected. It looks like only the last word in each sentence gets stemmed. The result is as follows:

[[1]]
[1]  easy post position search resumes improvement searching resum

[[2]]
[1]  easy use good candidiates improvement allow multiple emails sent 

[[3]]
[1]  applicants young kids absolutely sales experience waste time looking improvement applicants apply experience looking dont need kids just high school

[[4]]
[1]  abundance resum

Can anyone help?

Recommended answer

This problem appears in tm 0.6 and comes from using functions that are not in the list returned by tm's getTransformations(). The problem is that tolower just returns a character vector, not a "PlainTextDocument" as tm_map expects. The tm package provides the content_transformer function, which wraps such a function so that the result stays a PlainTextDocument:

fbCorpus  <- tm_map(fbCorpus, content_transformer(tolower))
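
To show how the fix fits into the whole pipeline, here is a minimal sketch, assuming the tm and SnowballC packages are installed. The two sample sentences and the names docs, corpus and stemmed are made up for illustration and stand in for the asker's data[,1]. VCorpus is used instead of Corpus, as an assumption, so that every document is a PlainTextDocument regardless of tm version.

```r
library(tm)         # corpus handling and tm_map transformations
library(SnowballC)  # stemmer backend used by stemDocument

# Hypothetical sample data standing in for data[,1] from the question
docs <- c("Easy to post a position and search resumes.",
          "Searching resumes needs improvement!")

corpus <- VCorpus(VectorSource(docs))

# tolower is a base R function, not a tm transformation,
# so it is wrapped in content_transformer()
corpus <- tm_map(corpus, content_transformer(tolower))

# These are tm's own transformations and can be passed directly
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Stemming is applied to each word of each document
corpus <- tm_map(corpus, stemDocument)

# Pull the processed text back out as a plain character vector
stemmed <- vapply(corpus,
                  function(d) paste(content(d), collapse = " "),
                  character(1))
print(stemmed)
```

If the documents are still PlainTextDocument objects when stemDocument runs, every word should be stemmed, not only the last word of each sentence.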
