Error faced while using TM package's VCorpus in R

Views: 144  Posted: 2020/5/18 0:42:24  Tags: r text-mining tm text-analysis


Problem description

I am getting the error below while working with the tm package in R.

library("tm")
Loading required package: NLP
Warning messages:
1: package ‘tm’ was built under R version 3.4.2 
2: package ‘NLP’ was built under R version 3.4.1

corpus <- VCorpus(DataframeSource(data))

Error: all(!is.na(match(c("doc_id", "text"), names(x)))) is not TRUE

I have tried several things, such as reinstalling the package and updating to a newer version of R, but the error persists. With the same data file, the same code runs on another system that has the same version of R.

Recommended answer

I ran into the same problem after updating the tm package to version 0.7-2. I looked up the documentation for DataframeSource(), which says:

The first column must be named "doc_id" and contain a unique string identifier for each document. The second column must be named "text".

Details

A data frame source interprets each row of the data frame x as a document. The first column must be named "doc_id" and contain a unique string identifier for each document. The second column must be named "text" and contain a "UTF-8" encoded string representing the document's content. Optional additional columns are used as document level metadata.
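To see whether a given data frame would trip this check, the condition from the error message can be evaluated directly in base R. The following is only a sketch: the data frame and its column names (id, title) are made up for illustration and are not taken from the question.

```r
# Hypothetical data frame whose column names do not match
# what DataframeSource() asks for (names invented for illustration)
data <- data.frame(id = c("a", "b"),
                   title = c("first document", "second document"),
                   stringsAsFactors = FALSE)

# The same condition that appears in the error message
required <- c("doc_id", "text")
has_required <- all(!is.na(match(required, names(data))))
has_required  # FALSE: neither required column name is present

# List which of the required column names are missing
missing_cols <- setdiff(required, names(data))
missing_cols
```

If has_required is FALSE, the missing_cols vector shows which names need to be added or renamed before calling DataframeSource().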

I solved it with the following code:

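# Read the raw data, keeping strings as character vectors rather than factors
df_cmp <- read.csv("test_file.csv", stringsAsFactors = FALSE)

# Build a data frame with the two columns DataframeSource() expects
df_title <- data.frame(doc_id = row.names(df_cmp),
                       text = df_cmp$English.title,
                       stringsAsFactors = FALSE)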

You can try renaming the columns of your data frame to doc_id and text.
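As a rough sketch of that renaming, assuming an existing data frame whose identifier and text columns have other names (id, title and year here are hypothetical), the columns can be renamed in place and moved to the front before building the corpus. The corpus step is guarded so the snippet still runs where tm is not installed.

```r
# Hypothetical input; the column names id, title and year are assumptions
data <- data.frame(id = c(1, 2),
                   title = c("first document", "second document"),
                   year = c(2019, 2020),
                   stringsAsFactors = FALSE)

# Rename the identifier and content columns to the required names
names(data)[names(data) == "id"] <- "doc_id"
names(data)[names(data) == "title"] <- "text"

# The documentation asks for string identifiers, so convert explicitly
data$doc_id <- as.character(data$doc_id)

# Put doc_id and text first; any remaining columns become document metadata
other_cols <- setdiff(names(data), c("doc_id", "text"))
data <- data[, c("doc_id", "text", other_cols), drop = FALSE]

# Build the corpus only if the tm package is available
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::DataframeSource(data))
  print(corpus)
}
```

Renaming in place keeps any extra columns (year in this sketch), which a data frame source treats as document level metadata according to the documentation quoted above.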
