R text mining - dealing with plurals

Views: 33 Published: 2021/9/6 19:42:03 Tags: r text-mining

Question

I'm learning text mining in R and have had pretty good success, but I'm stuck on how to deal with plurals. That is, I want "nation" and "nations" to be counted as the same word, and ideally "dictionary" and "dictionaries" as well.

x <- '"nation" and "nations" to be counted as the same word and ideally "dictionary" and "dictionaries" to be counted as the same word.'

Recommended answer

One possible solution. Here I use the pacman package to make the solution self-contained:

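# Install pacman if it is missing, then attach it
if (!require("pacman")) install.packages("pacman"); library(pacman)
# pluralize is loaded from GitHub, quanteda from CRAN
p_load_gh('hrbrmstr/pluralize')
p_load(quanteda)

x <- '"nation" and "nations" to be counted as the same word and ideally "dictionary" and "dictionaries"'
# tokenize() is from the quanteda release used when this answer was written;
# newer quanteda releases provide tokens() instead
singularize(unlist(tokenize(x)))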

##  [1] "\""         "nation"     "\""         "and"        "\""         "nation"     "\""        
##  [8] "to"         "be"         "counted"    "a"          "the"        "same"       "word"      
## [15] "and"        "ideally"    "\""         "dictionary" "\""         "and"        "\""        
## [22] "dictionary" "\""
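
Note that in the output above "as" came back as "a". If installing a GitHub package is not an option, the same idea can be roughly approximated in base R. The following is only a minimal sketch under stated assumptions: `naive_singularize` and its `keep` list are hypothetical names made up for this illustration, the two suffix rules are deliberately crude, and none of this is the algorithm used by the pluralize package. Irregular plurals ("children", "series") are not handled.

```r
# Naive rule-based singularizer (illustrative only; NOT the pluralize package's logic).
# `keep` is an assumed list of short words that end in "s" but are not plurals.
naive_singularize <- function(words,
                              keep = c("as", "is", "was", "this", "its", "us", "has")) {
  out <- words
  change <- !(tolower(words) %in% keep)

  # consonant + "ies" -> "y", e.g. "dictionaries" -> "dictionary"
  ies <- change & grepl("[^aeiou]ies$", words)
  out[ies] <- sub("ies$", "y", words[ies])

  # drop a trailing "s" unless the word ends in "ss", "us" or "is",
  # e.g. "nations" -> "nation", but "class" stays "class"
  s <- change & !ies & grepl("[^sui]s$", words)
  out[s] <- sub("s$", "", words[s])

  out
}

x <- '"nation" and "nations" to be counted as the same word and ideally "dictionary" and "dictionaries"'

# Lower-case the text and extract runs of letters as tokens (this drops the quote marks)
lower <- tolower(x)
tokens <- regmatches(lower, gregexpr("[a-z]+", lower))[[1]]

# Count words after singularizing; "nation" and "dictionary" each occur twice
counts <- table(naive_singularize(tokens))
counts[c("nation", "dictionary")]
```

For real corpora a dedicated package is likely to be a better choice, since English has many plural forms that two regular expressions cannot cover.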
