Error using "TermDocumentMatrix" and "Dist" functions in R


Problem description

I have been trying to replicate the example here, but I have run into some problems along the way.

Everything worked fine until here:

docsTDM <- TermDocumentMatrix(docs8)

Error in UseMethod("meta", x) : no applicable method for 'meta' applied to an object of class "character"
In addition: Warning message:
In mclapply(unname(content(x)), termFreq, control) :
all scheduled cores encountered errors in user code

I was able to fix that error by changing this earlier step:

docs8 <- tm_map(docs7, tolower)

To this:

docs8 <- tm_map(docs7, content_transformer(tolower))
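
The reason this change helps can be sketched as follows: `tolower` returns a plain character vector, so the corpus elements lose their document class, while `content_transformer()` applies the function to each document's content and keeps the document object intact. The two-sentence corpus below is a hypothetical stand-in for `docs7`, which the question does not show.

```r
library(tm)

# Hypothetical stand-in for the question's docs7 (the real corpus is not shown).
docs7 <- VCorpus(VectorSource(c("Text Mining in R",
                                "Clustering DOCUMENTS by Topic")))

# content_transformer() wraps tolower so it modifies only the text content
# and returns a proper text document rather than a bare character vector.
docs8 <- tm_map(docs7, content_transformer(tolower))

# Each element is still a text document, so meta() has a method for it.
class(docs8[[1]])
content(docs8[[1]])

# Building the term-document matrix now works on the transformed corpus.
docsTDM <- TermDocumentMatrix(docs8)
```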

But then I got in trouble again with:

docsdissim <- dissimilarity(docsTDM, method = "cosine")

Error: could not find function "dissimilarity"

Then I learned that the "dissimilarity" function was replaced by the dist function, so I did:

docsdissim <- dist(docsTDM, method = "cosine")

Error in crossprod(x, y)/sqrt(crossprod(x) * crossprod(y)) : non-conformable arrays

And that is where I'm stuck.

By the way, my R version is:

R version 3.2.2 (2015-08-14) running on CentOS 7

Solution

Change

docsdissim <- proxy::dist(docsTDM, method = "cosine")

to

docsdissim <- dist(as.matrix(docsTDM), method = "cosine")

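dist requires a numeric matrix, data frame, or "dist" object as input. Even though a TermDocumentMatrix is conceptually a matrix, it is stored in a sparse format, so it has to be converted with as.matrix() first. Note that the "cosine" method is provided by the proxy package, whose dist masks stats::dist once proxy is loaded; base stats::dist has no "cosine" method.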
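
As a minimal end-to-end sketch of the conversion described above, the following uses a small hypothetical corpus (the question's own documents are not shown) and calls `proxy::dist` explicitly so it does not depend on which `dist` is currently masking the other. Whether distances between terms or between documents are wanted depends on the tutorial being followed, so both are shown.

```r
library(tm)
library(proxy)

# Hypothetical three-document corpus, for illustration only.
docs <- VCorpus(VectorSource(c("apple banana apple",
                               "banana cherry",
                               "apple cherry cherry")))
docsTDM <- TermDocumentMatrix(docs)

# Convert the sparse term-document matrix to an ordinary dense matrix.
m <- as.matrix(docsTDM)

# dist() compares rows. In a term-document matrix the rows are terms,
# so this yields cosine distances between terms.
term_dist <- proxy::dist(m, method = "cosine")

# Transposing puts documents in the rows, giving distances between documents.
doc_dist <- proxy::dist(t(m), method = "cosine")

print(doc_dist)
```

Because `as.matrix()` produces a dense matrix, this approach may use a lot of memory on a large corpus; for a small example like the one in the question it should be fine.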
