Make all words uppercase in Wordcloud in R

Problem description

When creating word clouds, the most common approach is to make all the words lowercase. However, I want the word cloud to display the words in uppercase. After forcing the words to uppercase, the word cloud still displays lowercase words. Any ideas why?

Reproducible code:

    library(tm)
    library(wordcloud)

    data <- data.frame(text = c("Creativity is the art of being ‘productive’ by using
              the available resources in a skillful manner.
              Scientifically speaking, creativity is part of
              our consciousness and we can be creative –
              if we know – ’what goes on in our mind during
              the process of creation’.
              Let us now look at 6 examples of creativity which blows the mind."))

    text <- paste(data$text, collapse = " ")

    # I am using toupper() to force the words to become uppercase.
    text <- toupper(text)

    source <- VectorSource(text)
    corpus <- VCorpus(source, list(language = "en"))

    # This is my function for cleaning the text
    clean_corpus <- function(corpus){
      corpus <- tm_map(corpus, removePunctuation)
      corpus <- tm_map(corpus, removeNumbers)
      corpus <- tm_map(corpus, stripWhitespace)
      corpus <- tm_map(corpus, removeWords, c(stopwords("en")))
      return(corpus)
    }

    clean_corp <- clean_corpus(corpus)
    data_tdm <- TermDocumentMatrix(clean_corp)
    data_m <- as.matrix(data_tdm)

    commonality.cloud(data_m, colors = c("#224768", "#ffc000"), max.words = 50)

This produces a word cloud in which the words are still displayed in lowercase.

Recommended answer

This happens because, behind the scenes, TermDocumentMatrix(clean_corp) is doing TermDocumentMatrix(clean_corp, control = list(tolower = TRUE)). If you set it to TermDocumentMatrix(clean_corp, control = list(tolower = FALSE)), the words stay uppercase. Alternatively, you can adjust the row names of your matrix afterwards: rownames(data_m) <- toupper(rownames(data_m)).
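Here is a minimal sketch of both options side by side. The short sample sentence and the names tdm_upper, m_default, terms_opt1 and terms_opt2 are made up for illustration and are not part of the question's code. One thing not covered in the answer: removeWords matches case-sensitively, so once the text has been uppercased the lowercase stopwords("en") list no longer matches anything. The sketch assumes you still want stop words removed, so it uppercases that list too.

```r
library(tm)

# Short sample text (an assumption for illustration), already uppercased
text <- toupper("Creativity is the art of being productive. Creativity blows the mind.")
corpus <- VCorpus(VectorSource(text))

corpus <- tm_map(corpus, removePunctuation)
# The stop word list is lowercase, so uppercase it to match the uppercased text
corpus <- tm_map(corpus, removeWords, toupper(stopwords("en")))

# Option 1: switch off the default lowercasing when building the matrix
tdm_upper <- TermDocumentMatrix(corpus, control = list(tolower = FALSE))
terms_opt1 <- rownames(as.matrix(tdm_upper))

# Option 2: keep the default (tolower = TRUE) and uppercase the row names afterwards
m_default <- as.matrix(TermDocumentMatrix(corpus))
rownames(m_default) <- toupper(rownames(m_default))
terms_opt2 <- rownames(m_default)
```

Either matrix can then be passed to commonality.cloud() as in the question. Option 2 is probably the simpler one if the rest of the cleaning pipeline relies on lowercase text, since only the labels change at the very end.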
