R Warning in stemCompletion and error in TermDocumentMatrix


Question

I followed the instructions from here.

Slide no. 9 uses tolower, which has an issue in package tm 0.6 and above, so I used:

myCorpus <- tm_map(myCorpus, content_transformer(tolower))

That part duplicates an existing Stack Overflow question, but I still get an error when I run stemCompletion:

myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)

Following that advice, I converted both myCorpus and myCorpusCopy to PlainTextDocument:

corpus <- tm_map(corpus, PlainTextDocument)

After that I was able to run:

myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)

but I get 50 warnings:

There were 50 or more warnings (use warnings() to see the first 50)

Calling warnings() shows the same warning repeated each time:

1: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
2: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
3: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
4: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
5: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
6: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
7: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
8: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
9: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
10: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used

I tried ignoring the warnings and creating the TermDocumentMatrix anyway:

tdm <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))

and I get the error:

Error: inherits(doc, "TextDocument") is not TRUE

Recommended answer

Here's how you can create a stemmed term-document matrix and re-complete the stemmed tokens afterwards:

library(tm)  # stemming = TRUE below also needs the SnowballC package installed
txt <- " was followed the instruction from here In slide no. 9 tolower has issue in package tm 0.6 and above I have used "
myCorpus <- Corpus(VectorSource(txt))
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
tdm <- TermDocumentMatrix(myCorpus, control = list(stemming = TRUE)) 
cbind(stems = rownames(tdm), completed = stemCompletion(rownames(tdm), myCorpus))  
#          stems      completed    
# 0.6      "0.6"      "0.6"        
# abov     "abov"     "above"      
# and      "and"      "and"        
# follow   "follow"   "followed"   
# from     "from"     "from"       
# has      "has"      "has"        
# have     "have"     "have"       
# here     "here"     "here"       
# instruct "instruct" "instruction"
# issu     "issu"     "issue"      
# no.      "no."      "no."        
# packag   "packag"   "package"    
# slide    "slide"    "slide"      
# the      "the"      "the"        
# tolow    "tolow"    "tolower"    
# use      "use"      "used"       
# was      "was"      "was"    
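As for the warnings in the question: the quoted call, grep(sprintf("^%s", w), dictionary, value = TRUE), looks up completions for a single stem w. The warning text says the pattern had length > 1, which suggests that something longer than one word (presumably a whole document handed over by tm_map) reached stemCompletion as if it were a single stem. The base R sketch below reproduces that situation without tm. The helper lookup and the small dictionary are hypothetical stand-ins for illustration, not tm's actual code.

```r
# Hypothetical stand-in dictionary of complete words.
dictionary <- c("followed", "instruction", "issue", "package")

# Hypothetical helper mirroring the grep() call quoted in the warnings:
# it finds dictionary entries that start with ONE stem `w`.
lookup <- function(w) grep(sprintf("^%s", w), dictionary, value = TRUE)

stems <- c("follow", "instruct", "issu")

# Intended usage: look up one stem at a time.
per_stem <- lapply(stems, lookup)

# Problematic usage: pass the whole vector as if it were one stem.
# sprintf() is vectorised, so grep() receives a pattern of length 3,
# warns, and uses only the first element.
warned <- tryCatch({
  lookup(stems)
  FALSE
}, warning = function(w) TRUE)

print(per_stem)
print(warned)
```

This is why the answer above calls stemCompletion on rownames(tdm), a plain character vector of stems, rather than mapping it over the corpus with tm_map.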
