wordcloud package: get "Error in strwidth(...) : invalid 'cex' value"
Question
I am using the tm and wordcloud packages in R 2.15.1 and am trying to make a word cloud. Here is the code:
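library(twitteR)    # userTimeline()
library(tm)         # removeWords(), stopwords(), Corpus(), TermDocumentMatrix()
library(wordcloud)  # comparison.cloud()
# fetch tweets
maruti_tweets = userTimeline("Maruti_suzuki", n=1000,cainfo="cacert.pem")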
hyundai_tweets = userTimeline("HyundaiIndia", n=1000,cainfo="cacert.pem")
tata_tweets = userTimeline("TataMotor", n=1000,cainfo="cacert.pem")
toyota_tweets = userTimeline("Toyota_India", n=1000,cainfo="cacert.pem")
# get text
maruti_txt = sapply(maruti_tweets, function(x) x$getText())
hyundai_txt = sapply(hyundai_tweets, function(x) x$getText())
tata_txt = sapply(tata_tweets, function(x) x$getText())
toyota_txt = sapply(toyota_tweets, function(x) x$getText())
clean.text = function(x)
{
  # tolower
  x = tolower(x)
  # remove the retweet marker "rt" (whole word only, so "start" stays intact)
  x = gsub("\\brt\\b", "", x)
  # remove at-mentions
  x = gsub("@\\w+", "", x)
  # remove punctuation
  x = gsub("[[:punct:]]", "", x)
  # remove numbers
  x = gsub("[[:digit:]]", "", x)
  # remove links (punctuation is already gone, so a URL is one "http..." token)
  x = gsub("http\\w+", "", x)
  # collapse runs of spaces and tabs into a single space
  x = gsub("[ \t]{2,}", " ", x)
  # remove blank spaces at the beginning
  x = gsub("^ ", "", x)
  # remove blank spaces at the end
  x = gsub(" $", "", x)
  return(x)
}
# clean texts
maruti_clean = clean.text(maruti_txt)
hyundai_clean = clean.text(hyundai_txt)
tata_clean = clean.text(tata_txt)
toyota_clean = clean.text(toyota_txt)
maruti = paste(maruti_clean, collapse=" ")
hyundai= paste(hyundai_clean, collapse=" ")
tata= paste(tata_clean, collapse=" ")
toyota= paste(toyota_clean, collapse=" ")
# put everything in a single vector
all = c(maruti, hyundai, tata, toyota)
# remove stop-words
all = removeWords(all,
c(stopwords("english"), "maruti", "tata", "hyundai", "toyota"))
# create corpus
corpus = Corpus(VectorSource(all))
# create term-document matrix
tdm = TermDocumentMatrix(corpus)
# convert as matrix
tdm = as.matrix(tdm)
# add column names
colnames(tdm) = c("MARUTI", "HYUNDAI", "TATA", "TOYOTA")
# comparison cloud
comparison.cloud(tdm, random.order=FALSE, colors = c("#00B2FF", "red", "#FF0099", "#6600CC"), max.words=500)
But I get the following error:
Error in strwidth(words[i], cex = size[i], ...) : invalid 'cex' value
Please help.
Recommended answer
I spotted the empty-column issue in a different application throwing the same error. In my case it was caused by the removeSparseTerms command applied to a document-term matrix. Using str() helped me identify the bug.
The input variable (slightly edited) had 289 columns:
> str(corpus.dtm)
List of 6
$ i : int [1:443] 3 4 6 8 10 12 15 18 19 21 ...
$ j : int [1:443] 105 98 210 93 287 249 126 223 129 146 ...
$ v : num [1:443] 1 1 1 1 1 1 1 1 1 1 ...
$ nrow : int 1408
$ ncol : int 289
$ dimnames:List of 2
..$ Docs : chr [1:1408] "character(0)" "character(0)" "character(0)" "character(0)" ...
..$ Terms: chr [1:289] "word1" "word2" "word3" "word4" ...
- attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix"
- attr(*, "weighting")= chr [1:2] "term frequency" "tf"
The command was:
corpus.dtm.frequent <- removeSparseTerms(corpus.dtm, 0.90)
The result had 0 columns:
> str(corpus.dtm.frequent)
List of 6
$ i : int(0)
$ j : int(0)
$ v : num(0)
$ nrow : int 1408
$ ncol : int 0
$ dimnames:List of 2
..$ Docs : chr [1:1408] "character(0)" "character(0)" "character(0)" "character(0)" ...
..$ Terms: NULL
- attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix"
- attr(*, "weighting")= chr [1:2] "term frequency" "tf"
Raising the sparsity coefficient from 0.90 to 0.95 solved the issue. For a wordier document I went up to 0.999 in order to have a non-empty result after removing the sparse terms.
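To see why a higher coefficient leaves more terms, here is a minimal base-R sketch of the filtering rule as I understand it: a term is kept only if the share of documents in which it does not occur is below the coefficient. The helper keep_terms() and the toy matrix are made up for illustration; they are not part of tm, and the sketch works on a plain dense matrix rather than a real DocumentTermMatrix.

```r
# Illustrative only: keep_terms() is a hypothetical helper, not a tm function.
# dtm: dense documents x terms matrix of counts.
keep_terms <- function(dtm, sparse) {
  # Sparsity of a term = share of documents in which it does not occur.
  sparsity <- colMeans(dtm == 0)
  dtm[, sparsity < sparse, drop = FALSE]
}

# Toy data: 10 documents, 3 terms (invented for this example).
toy <- cbind(
  common = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0),  # absent from 4 of 10 documents
  rare   = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0),  # absent from 9 of 10 documents
  medium = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0)   # absent from 8 of 10 documents
)

strict <- keep_terms(toy, 0.30)  # coefficient too low: every term is dropped
loose  <- keep_terms(toy, 0.95)  # higher coefficient: all three terms survive

print(ncol(strict))
print(ncol(loose))
```

In a corpus of short texts such as tweets, most terms are absent from almost every document, so a coefficient that looks generous can still remove everything.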
Empty columns are a good thing to check for when this error occurs.
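A check along these lines can be run on the matrix before plotting. This is only a sketch in base R: drop_empty_columns() is a hypothetical helper, and the small matrix stands in for the real term-document matrix. My assumption, not stated in the answer, is that an all-zero column ends up with undefined word sizes when the plotting function scales each column, which would explain the invalid 'cex' value.

```r
# Hypothetical helper (not part of wordcloud or tm):
# report and drop columns whose counts are all zero.
drop_empty_columns <- function(m) {
  stopifnot(is.matrix(m))
  empty <- colSums(m) == 0
  if (any(empty)) {
    message("Dropping empty columns: ",
            paste(colnames(m)[empty], collapse = ", "))
  }
  m[, !empty, drop = FALSE]
}

# Toy stand-in for as.matrix(tdm): 2 terms x 4 documents, last column empty.
tdm_toy <- matrix(
  c(3, 0,   1, 0,   2, 0,   0, 0),
  nrow = 2,
  dimnames = list(c("car", "price"),
                  c("MARUTI", "HYUNDAI", "TATA", "TOYOTA"))
)

tdm_checked <- drop_empty_columns(tdm_toy)
print(colnames(tdm_checked))

# With wordcloud loaded, the checked matrix would then be plotted, e.g.
# comparison.cloud(tdm_checked, random.order = FALSE, max.words = 500)
```

If a column is dropped, any colors vector passed to the plot needs to be shortened to match the remaining columns.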