tm loses the metadata when applying tm_map

Published: 2020/5/9 1:52:44 · Tags: r, metadata, tm


Problem description

I have a (small) problem with the tm R library. Say I have a corpus:

# boilerplate
library(tm)
bcorp <- c("one","two","three","four","five")
myCorpus <- Corpus(VectorSource(bcorp), list(language = "en_US"))
tdm <- TermDocumentMatrix(myCorpus)
Docs(tdm)

Result:

[1] "1" "2" "3" "4" "5"

This works. But when I try to apply a transformation with tm_map():

# this does not work
myCorpus <- Corpus(VectorSource(bcorp), list(language = "en_US"))
myCorpus <- tm_map(myCorpus, tolower)
tdm <- TermDocumentMatrix(myCorpus)

This gives:

Error: inherits(doc, "TextDocument") is not TRUE
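To see where this error comes from, the sketch below inspects the corpus before and after the plain tm_map(myCorpus, tolower) call. It assumes tm 0.6 or later and builds the corpus with VCorpus() explicitly, because in recent tm releases Corpus() on a VectorSource may return a SimpleCorpus, for which this error may not reproduce. The variable names broken and res are illustrative only.

```r
library(tm)

bcorp <- c("one", "two", "three", "four", "five")
myCorpus <- VCorpus(VectorSource(bcorp))

# Before any transformation, each element is a TextDocument
class(myCorpus[[1]])

# tm_map() with a bare function stores whatever that function returns;
# tolower() returns a plain character vector, not a TextDocument
broken <- tm_map(myCorpus, tolower)
class(broken[[1]])
inherits(broken[[1]], "TextDocument")

# TermDocumentMatrix() checks inherits(doc, "TextDocument") for every
# document; capture the resulting error message instead of stopping
res <- tryCatch(TermDocumentMatrix(broken),
                error = function(e) conditionMessage(e))
res
```

In this setup the corpus no longer holds TextDocument objects after the bare tolower() call, which is exactly the condition the error message reports.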

The solution proposed in this case was to convert the documents back to PlainTextDocument:

# this works but erases the metadata
myCorpus <- Corpus(VectorSource(bcorp), list(language = "en_US"))
myCorpus <- tm_map(myCorpus, tolower)
myCorpus <- tm_map(myCorpus, PlainTextDocument)
tdm <- TermDocumentMatrix(myCorpus)
Docs(tdm)

Result:

[1] "character(0)" "character(0)" "character(0)" "character(0)" "character(0)"

Now it works, but it erases all the metadata (in this case, the document names). Is there a way to keep the metadata, or to save it and then restore it?
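For the "save and then restore" route the question mentions, one possible sketch is to copy each document's id metadata before the lossy steps and write it back afterwards. This is not the approach the accepted answer uses. It assumes tm 0.6 or later, uses VCorpus() explicitly (recent tm releases may return a SimpleCorpus from Corpus()), and restores only the id field; other fields such as language would need the same treatment. The name saved_ids is illustrative.

```r
library(tm)

bcorp <- c("one", "two", "three", "four", "five")
myCorpus <- VCorpus(VectorSource(bcorp),
                    readerControl = list(language = "en_US"))

# Save each document's "id" metadata before any lossy transformation
saved_ids <- vapply(seq_along(myCorpus),
                    function(i) meta(myCorpus[[i]], "id"),
                    character(1))

# The lossy route from the question: tolower() returns bare character
# vectors, and PlainTextDocument() rewraps them with empty metadata
myCorpus <- tm_map(myCorpus, tolower)
myCorpus <- tm_map(myCorpus, PlainTextDocument)

# Write the saved ids back onto the rewrapped documents
for (i in seq_along(myCorpus)) {
  meta(myCorpus[[i]], "id") <- saved_ids[i]
}

tdm <- TermDocumentMatrix(myCorpus)
Docs(tdm)
```

Only the fields that are explicitly saved come back this way, so the bookkeeping grows with every metadata field you care about.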

Recommended answer

I found it.

The line:

myCorpus <- tm_map(myCorpus, PlainTextDocument)

solves the problem but erases the metadata.

I found this answer that explains a better way to use tm_map(). I just had to replace:

myCorpus <- tm_map(myCorpus, tolower)

with:

myCorpus <- tm_map(myCorpus, content_transformer(tolower))

Everything works!
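Putting the answer together, a minimal end-to-end sketch. It assumes tm 0.6 or later; VCorpus() is used explicitly since recent tm releases may return a SimpleCorpus from Corpus(), and the capitalized input words are a change made here only so the lowercasing is visible.

```r
library(tm)

# Capitalized words (an illustrative change) make the effect of tolower() visible
bcorp <- c("One", "Two", "Three", "Four", "Five")
myCorpus <- VCorpus(VectorSource(bcorp),
                    readerControl = list(language = "en_US"))

# content_transformer() wraps tolower() so that tm_map() replaces only the
# text inside each document; the document object and its metadata remain
myCorpus <- tm_map(myCorpus, content_transformer(tolower))

content(myCorpus[[1]])   # the text has been lowercased

tdm <- TermDocumentMatrix(myCorpus)
Docs(tdm)                # document names survive the transformation
```

Because the documents are never replaced by bare character vectors, no extra PlainTextDocument step is needed, and the ids that TermDocumentMatrix() uses as document names are kept.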
