R, tm: "transformation drops documents" error
Problem description
I want to create a network based on the weights of keywords in a text. When I ran the code that uses tm_map, I got an error:
library(tm)
library(NLP)
library(openNLP)
text = c('.......')
corp <- Corpus(VectorSource(text))
corp <- tm_map(corp, stripWhitespace)
Warning message:
In tm_map.SimpleCorpus(corp, stripWhitespace) :
transformation drops documents
corp <- tm_map(corp, tolower)
Warning message:
In tm_map.SimpleCorpus(corp, tolower) : transformation drops documents
The code was working two months ago. Now I am trying it on new data and it no longer works. Can anyone tell me what I am doing wrong? Thank you. I also tried the command below, but it doesn't work either.
corp <- tm_map(corp, content_transformer(stripWhitespace))
Recommended answer
The code should still work. You are getting a warning, not an error. This warning only appears when a corpus built from a VectorSource is created with Corpus instead of VCorpus.
The reason is that the underlying code checks whether the number of names of the corpus content matches the length of the corpus content. When the text is read in as a vector, there are no document names, so the warning pops up. It is only a warning; no documents have been dropped.
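The check described above can be imitated in base R without tm. This is only a simplified sketch of the idea, assuming the comparison is between the number of names before the transformation and the length of the result afterwards; check_after_transform is a hypothetical helper made up for illustration, not a tm function.

```r
# Simplified stand-in for the kind of check described above.
# check_after_transform() is a hypothetical helper, not part of tm.
check_after_transform <- function(content, FUN) {
  n <- names(content)        # NULL when the vector has no names
  result <- FUN(content)
  if (length(result) != length(n)) {
    warning("transformation drops documents")
  }
  result
}

unnamed <- c("This is MY text")
named <- c(doc1 = "This is MY text")

# Unnamed input: length(NULL) is 0, so 1 != 0 and the warning fires,
# even though the single document is still present in the result.
res_unnamed <- withCallingHandlers(
  check_after_transform(unnamed, tolower),
  warning = function(w) {
    message("caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Named input: one name, one document, so no warning is raised.
res_named <- check_after_transform(named, tolower)
```

In both cases the result still holds one document; only the unnamed vector trips the comparison.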
See the difference between the two examples:
library(tm)
text <- c("this is my text with some other text and some more")
# warning when combining Corpus and VectorSource
text_corpus <- Corpus(VectorSource(text))
# the warning appears when running the following line
tm_map(text_corpus, content_transformer(tolower))
<<SimpleCorpus>>
Metadata: corpus specific: 1, document level (indexed): 0
Content: documents: 1
Warning message:
In tm_map.SimpleCorpus(text_corpus, content_transformer(tolower)) :
transformation drops documents
# Using VCorpus
text_corpus <- VCorpus(VectorSource(text))
# warning doesn't appear
tm_map(text_corpus, content_transformer(tolower))
<<VCorpus>>
Metadata: corpus specific: 0, document level (indexed): 0
Content: documents: 1
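To confirm that nothing was actually dropped, one option is to compare the number of documents before and after the transformation. The snippet below is a minimal sketch with made-up sample text. The names n_before and n_after are just illustrative, and whether the warning shows up at all may depend on the installed tm version.

```r
library(tm)

# Made-up sample text for illustration
text <- c("This is MY text", "And SOME more text")

# Build the corpus the same way as in the question
corp <- Corpus(VectorSource(text))
n_before <- length(corp)

# suppressWarnings() hides the message if the installed tm version emits it
corp <- suppressWarnings(tm_map(corp, content_transformer(tolower)))
n_after <- length(corp)

# Same number of documents before and after: nothing was dropped
stopifnot(n_before == n_after)
print(content(corp))
```

Switching to VCorpus, as in the second example, avoids the warning without needing suppressWarnings.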