Converting a stemmed word to the root word in R


Problem description

I have a list of words that have been stemmed using the "tm" package in R. Is there a way to get the root words back after this step? Thanks in advance.

Example: activiti --> activity

Recommended answer

You can use the stemCompletion() function for this, but you may need to trim the stems first. Consider the following:

library(tm)
library(qdap) # provides the stemmer() function

active.text <- "there are plenty of funny activities"
active.corp <- Corpus(VectorSource(active.text))

# Stem the text; this is what the columns of your term-document matrix will look like
(st.text <- tolower(stemmer(active.text, warn = FALSE)))
# [1] "there"  "are"    "plenti" "of"     "funni"  "activ"

# Strip trailing vowels so that each stem is a prefix of a word in the corpus
st.text <- gsub("[aeyuio]+$", "", st.text)
stemCompletion(st.text, active.corp, type = "prevalent") # now it works
#         ther           ar        plent           of         funn        activ
#      "there"        "are"     "plenty"         "of"      "funny" "activities"

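To see why the trimming step matters, here is a minimal base-R sketch. It assumes that stemCompletion() treats each stem as a prefix to look up in the dictionary, which is my reading of its behaviour rather than something stated above. The helper has_completion() and the hand-written dictionary are hypothetical and exist only for illustration.

```r
# Hypothetical helper: is the stem a prefix of at least one dictionary word?
has_completion <- function(stem, dictionary) {
  any(startsWith(dictionary, stem))
}

# Illustrative dictionary: the words of the example sentence
dictionary <- c("there", "are", "plenty", "of", "funny", "activities")

raw_stems     <- c("plenti", "funni", "activ")
trimmed_stems <- sub("[aeiouy]+$", "", raw_stems)  # drop trailing vowels

# "plenti" and "funni" are not prefixes of any word, so they cannot be completed
raw_ok     <- sapply(raw_stems, has_completion, dictionary = dictionary)
# After trimming, "plent" and "funn" are prefixes of "plenty" and "funny"
trimmed_ok <- sapply(trimmed_stems, has_completion, dictionary = dictionary)

print(raw_ok)
print(trimmed_ok)
```

The stemmer rewrites a final "y" as "i", so the raw stem is no longer a literal prefix of the original word; removing the trailing vowels restores the prefix relationship.
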
Keep in mind, though, that stemming conflates certain words. For example, "university" and "universal" both become "univers" after stemming, and the original word cannot be recovered from the stem alone.
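
One way to spot such collisions before relying on stem completion is to stem a known vocabulary and group the original words by stem. The sketch below is only an illustration: it uses wordStem() from the SnowballC package instead of the qdap stemmer() from the answer, and the vocabulary is made up.

```r
library(SnowballC)  # provides wordStem(); swapped in here for illustration

# Made-up vocabulary containing the colliding pair from the text
vocab <- c("university", "universal", "plenty")
stems <- wordStem(vocab, language = "english")

# Group the original words by the stem they produce
candidates <- split(vocab, stems)

# Stems with more than one candidate cannot be restored unambiguously
ambiguous <- candidates[lengths(candidates) > 1]
print(ambiguous)
```

Any stem that shows up in ambiguous has several possible completions, so whatever stemCompletion() returns for it is a guess based on the chosen type (for example "prevalent") rather than a true inverse of stemming.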

Hope this helps.
