Replace words in corpus according to dictionary data frame

Published: 2020/5/18 0:40:30. Tags: r, nlp, tm


Problem description

I want to replace all words in a tm Corpus object according to a dictionary stored as a two-column data frame, where the first column holds the word to be matched and the second column holds its replacement.

I am stuck on the translate function. I saw this answer, but I can't turn it into a function that can be passed to tm_map.

Consider the following MWE:

library(tm)

docs <- c("first text", "second text")
corp <- Corpus(VectorSource(docs))

dictionary <- data.frame(word = c('first', 'second', 'text'),
                      translation = c('primo', 'secondo', 'testo'))

translate <- function(text, dictionary) {
  # Would like to replace each word of text with corresponding word in dictionary
}

corp_translated <- tm_map (corp, translate)

inspect(corp_translated)

# Expected result

# A corpus with 2 text documents
#
# The metadata consists of 2 tag-value pairs and a data frame
# Available tags are:
#   create_date creator 
# Available variables in the data frame are:
#   MetaID 

# [[1]]
# primo testo

# [[2]]
# secondo testo

Recommended answer

I would suggest not using a data.frame for the dictionary, since the basic object in R, a vector, already works as a dictionary once its elements are named.

      dict  <- c('primo', 'secondo', 'testo')
names(dict) <- c('first', 'second', 'text')
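
If the dictionary already exists as a two-column data frame, as in the question, it can be converted instead of retyped. A minimal sketch, assuming the column names `word` and `translation` from the question's MWE; `stringsAsFactors = FALSE` is an addition here so that the columns are plain character vectors on older R versions as well:

```r
# Dictionary as given in the question (two-column data frame)
dictionary <- data.frame(word = c("first", "second", "text"),
                         translation = c("primo", "secondo", "testo"),
                         stringsAsFactors = FALSE)

# Named character vector: names are the words to match, values the replacements
dict <- setNames(dictionary$translation, dictionary$word)
```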

Then to "translate" x, where x might be "second", you simply use:

   dict[[x]]
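
As an illustration of how the lookup behaves, the sketch below uses the same `dict` values as the answer; the word "third" is a made-up example of a word that is missing from the dictionary:

```r
dict <- c(first = "primo", second = "secondo", text = "testo")

x <- "second"
single <- dict[[x]]   # one word, returned without its name

# Single-bracket indexing is vectorised; unname() drops the lookup keys
several <- unname(dict[c("first", "text")])

# A word that is not in the dictionary gives NA with `[`,
# whereas `[[` would raise a "subscript out of bounds" error
missing_word <- unname(dict["third"])
```

Double brackets suit a single known word; single brackets are the safer choice when some words may be absent.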

You don't even need a wrapper function.
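
That holds for single words. For whole documents, as in the question, a small helper is still convenient. The following is only a sketch and not part of the original answer: `translate_text` is a hypothetical name, it splits on single spaces, and it leaves words that are not in the dictionary unchanged.

```r
dict <- c(first = "primo", second = "secondo", text = "testo")

# Hypothetical helper: translate every space-separated word of each string,
# leaving words that are not in the dictionary untouched.
translate_text <- function(text, dict) {
  vapply(strsplit(text, " ", fixed = TRUE), function(words) {
    known <- words %in% names(dict)      # which words have a translation
    words[known] <- dict[words[known]]   # look them up by name
    paste(words, collapse = " ")         # rebuild the string
  }, character(1))
}

translated <- translate_text(c("first text", "second text"), dict)
```

With tm loaded, this could probably be applied to the corpus as `tm_map(corp, content_transformer(translate_text), dict = dict)`; that call is an untested assumption here and should be checked against the installed tm version.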

If you want to translate in the opposite direction, use

   names(dict)[dict %in% x]

Or you can flip the dictionary:

         dict.flip  <- names(dict)
   names(dict.flip) <- dict
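
The flip can also be written in one step with `setNames`. A short sketch, assuming the translations are unique (with duplicated values, a lookup by name would only return the first match); `dict_flip` and `back` are illustrative names:

```r
dict <- c(first = "primo", second = "secondo", text = "testo")

# Old values become the names, old names become the values
dict_flip <- setNames(names(dict), dict)

# Reverse lookup: from the translation back to the original word
back <- dict_flip[["secondo"]]
```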
