Plot highly correlated words against a specific word of interest

Published: 2020/11/20 19:07:59. Tags: r, graphviz, tm


Problem description


I am trying to plot the words most highly correlated with a given word. For example, I want to graph the ten words most correlated with "whale". Can someone help me with the command for that? I have Rgraphviz installed, if that helps.

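library(tm)

s.dir1 <- "/PATHTOTEXT/MobyDickTxt"

# Build the corpus and clean it
s.cor1 <- Corpus(DirSource(s.dir1), readerControl = list(reader = readPlain))
s.cor1 <- tm_map(s.cor1, removePunctuation)
s.cor1 <- tm_map(s.cor1, stripWhitespace)
# Current tm versions need base functions such as tolower wrapped in content_transformer()
s.cor1 <- tm_map(s.cor1, content_transformer(tolower))
s.cor1 <- tm_map(s.cor1, removeNumbers)
s.cor1 <- tm_map(s.cor1, removeWords, stopwords("english"))
tdm1 <- TermDocumentMatrix(s.cor1)

# Term frequencies, most frequent first (use the *1 objects consistently)
m1 <- as.matrix(tdm1)
v1 <- sort(rowSums(m1), decreasing = TRUE)
d1 <- data.frame(word = names(v1), freq = v1)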

Recommended answer


Here's a method to compute the top words correlating with a given word in a corpus, and plot those words and correlations.

Get some example data (the crude corpus of Reuters news articles that ships with tm)...

require(tm)
data("crude")
tdm <- TermDocumentMatrix(crude)


Compute correlations and store in data frame...

toi <- "oil"     # term of interest
corlimit <- 0.7  # lower bound on the correlation
# Call findAssocs() once and reuse the result for both columns
assoc <- findAssocs(tdm, toi, corlimit)[[1]]
oil_0.7 <- data.frame(corr = assoc, terms = names(assoc))
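The question asks for the top ten correlates of one word rather than every term above a fixed cut-off. A minimal sketch, assuming the crude example above: set corlimit to 0 and keep the first ten entries, since findAssocs() returns them already sorted by decreasing correlation. The names n_top, top_n and top_df are illustrative, not part of the original answer. findAssocs() correlates term counts across documents, so for "whale" this assumes the Moby Dick directory holds more than one document (for example, one file per chapter).

```r
library(tm)

data("crude")
tdm <- TermDocumentMatrix(crude)

toi <- "oil"  # term of interest; "whale" for the question's Moby Dick corpus
n_top <- 10   # hypothetical: how many associated terms to keep

# findAssocs() returns a list with one named numeric vector per term,
# sorted by decreasing (rounded) correlation; corlimit = 0 keeps every
# term whose correlation with `toi` is at least 0.
assoc <- findAssocs(tdm, toi, corlimit = 0)[[1]]
top_n <- head(assoc, n_top)

top_df <- data.frame(corr = unname(top_n), terms = names(top_n))
print(top_df)
```

The factor and ggplot steps from the answer should work unchanged on top_df in place of oil_0.7.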


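Turn the terms into a factor whose levels follow the data frame's order (highest correlation first), so that ggplot keeps that order instead of sorting the terms alphabetically...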

oil_0.7$terms <- factor(oil_0.7$terms, levels = oil_0.7$terms)

Draw the plot...

require(ggplot2)
ggplot(oil_0.7, aes( y = terms  ) ) +
  geom_point(aes(x = corr), data = oil_0.7) +
  xlab(paste0("Correlation with the term ", "\"", toi, "\""))
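Since the question mentions Rgraphviz: tm also has a plot() method for term-document matrices that draws a correlation graph with Rgraphviz (a Bioconductor package). A minimal sketch, assuming the crude example and an arbitrary corThreshold of 0.5, that graphs the term of interest together with its ten strongest correlates:

```r
library(tm)
library(Rgraphviz)  # Bioconductor package; tm's plot() method for TDMs relies on it

data("crude")
tdm <- TermDocumentMatrix(crude)

toi <- "oil"  # term of interest
# Names of the ten terms most correlated with `toi` (sorted by findAssocs)
top_terms <- names(head(findAssocs(tdm, toi, corlimit = 0)[[1]], 10))

# Nodes are the selected terms; an edge joins two terms whose correlation
# reaches corThreshold (0.5 here is an assumed, arbitrary cut-off)
plot(tdm, terms = c(toi, top_terms), corThreshold = 0.5)
```

Raising corThreshold thins the graph to the strongest links; lowering it shows more of the structure among the top terms.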

