R warning in stemCompletion and error in TermDocumentMatrix
Question
I followed the instructions from here. In slide no. 9, tolower has an issue in tm 0.6 and above, so I used:
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
That was a duplicate of this Stack Overflow question, but I still get an error when I run stemCompletion:
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)
I then followed this instruction and converted both myCorpus and myCorpusCopy to PlainTextDocument:
corpus <- tm_map(corpus, PlainTextDocument)
After that I was able to run:
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)
but I get 50 warnings:
There were 50 or more warnings (use warnings() to see the first 50)
warnings()
and I get all 50 warnings, for example:
1: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
2: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
3: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
4: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
5: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
6: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
7: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
8: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
9: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
10: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
I tried ignoring the warnings and creating a TermDocumentMatrix():
tdm <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))
and I get this error:
Error: inherits(doc, "TextDocument") is not TRUE
Recommended answer
Here's how you can create a stemmed term-document matrix and re-complete the stemmed tokens afterwards:
library(tm)  # stemming = TRUE below also needs the SnowballC package installed
txt <- " was followed the instruction from here In slide no. 9 tolower has issue in package tm 0.6 and above I have used "
myCorpus <- Corpus(VectorSource(txt))
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
tdm <- TermDocumentMatrix(myCorpus, control = list(stemming = TRUE))
cbind(stems = rownames(tdm), completed = stemCompletion(rownames(tdm), myCorpus))
#          stems      completed
# 0.6      "0.6"      "0.6"
# abov     "abov"     "above"
# and      "and"      "and"
# follow   "follow"   "followed"
# from     "from"     "from"
# has      "has"      "has"
# have     "have"     "have"
# here     "here"     "here"
# instruct "instruct" "instruction"
# issu     "issu"     "issue"
# no.      "no."      "no."
# packag   "packag"   "package"
# slide    "slide"    "slide"
# the      "the"      "the"
# tolow    "tolow"    "tolower"
# use      "use"      "used"
# was      "was"      "was"
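If you need the completed words back inside the corpus itself, rather than only as row names of the matrix, here is a minimal sketch of one way to do it. It assumes that stemCompletion() wants a character vector of single stems, which would explain both symptoms in the question: mapped directly over the corpus it receives whole documents (hence the grep warnings), and it returns plain character vectors instead of text documents (hence the inherits(doc, "TextDocument") error). The helper completeText and the two sample sentences are made up for illustration, and the stemming step assumes the SnowballC package is installed.

```r
library(tm)         # assumes tm 0.6 or later
library(SnowballC)  # stemDocument() relies on this package

# Made-up sample text, two documents
txt <- c("the instruction was followed",
         "tolower has an issue in the package")

myCorpus <- VCorpus(VectorSource(txt))
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
myCorpusCopy <- myCorpus                    # unstemmed copy, used as the dictionary
myCorpus <- tm_map(myCorpus, stemDocument)  # stem every document

# Hypothetical helper: split a document into single stems, complete each
# stem against the dictionary, and glue the words back into one string.
completeText <- function(x, dictionary) {
  stems <- unlist(strsplit(x, "\\s+"))
  stems <- stems[nzchar(stems)]
  completed <- stemCompletion(stems, dictionary = dictionary)
  # Keep the original stem wherever no completion was found
  missing <- is.na(completed) | !nzchar(completed)
  completed[missing] <- stems[missing]
  paste(completed, collapse = " ")
}

# content_transformer() keeps every element a text document, so
# TermDocumentMatrix() can still be built from the result.
myCorpus <- tm_map(myCorpus, content_transformer(completeText),
                   dictionary = myCorpusCopy)
tdm <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))
```

Wrapping the helper in content_transformer() is the same trick used for tolower above: the function only ever sees and returns the text, while the document object around it is preserved.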