Checking if a word exists in an English dictionary in R


Problem description

I'm performing text analysis on multiple resumes to generate a word cloud, using the wordcloud package along with the tm package to preprocess the document corpus in R.

The problems I am facing are:

  1. Checking whether a word in the corpus has some meaning, i.e. whether it belongs to the English dictionary.

  2. How to mine / process multiple resumes together.

  3. Checking for tech terms like r, java, eclipse etc.

Thanks for your help.

Recommended answer

I've faced some of these issues before, so I'm sharing solutions to your problems:

1. There is a package, qdapDictionaries, which is a collection of dictionaries and word lists for use with the 'qdap' package.

library(qdapDictionaries)

#create custom function
is.word  <- function(x) x %in% GradyAugmented # or use any dataset from package

#use this function to filter words, df = dataframe from corpus
df <- df[which(is.word(df$terms)),]
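
The filter above assumes a data frame `df` with a `terms` column, which the answer does not show being built. One possible way to derive it from a tm corpus is sketched below; the two sample sentences and the `freq` column are assumptions made for illustration. Note that tm's default `wordLengths` lower bound of 3 would silently drop a one-letter term such as "r".

```r
library(tm)

# Two made-up resume snippets standing in for real documents (assumption)
docs <- c("Experienced in R and Java", "Java developer using Eclipse")
corpus <- VCorpus(VectorSource(docs))

# wordLengths = c(1, Inf) keeps one-letter terms such as "r";
# the default lower bound of 3 would drop them
tdm <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))

# Sum the counts across documents to get one frequency per term
freq <- rowSums(as.matrix(tdm))
df <- data.frame(terms = names(freq), freq = as.numeric(freq),
                 stringsAsFactors = FALSE)
```

The resulting `df$terms` is lower-cased by default, which matters when comparing against a word list.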

2. Use VCorpus(DirSource(...)) to create your corpus from a directory containing all the resumes.

library(tm)

#each file in the directory becomes one document in the corpus
resumeDir <- "path/all_resumes/"
myCorpus <- VCorpus(DirSource(resumeDir))
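
Once the corpus exists, the usual tm preprocessing steps can be applied to every resume at once with `tm_map`. The following is a rough sketch rather than the answerer's own pipeline: it uses `VectorSource` with a made-up sentence in place of `DirSource` so that it runs without any files, and the choice and order of cleaning steps is an assumption.

```r
library(tm)

# Stand-in for VCorpus(DirSource(resumeDir)); the text is made up
myCorpus <- VCorpus(VectorSource("Experienced in R, and Java!"))

# Apply the same cleaning steps to every document in the corpus
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
myCorpus <- tm_map(myCorpus, removePunctuation)
myCorpus <- tm_map(myCorpus, removeWords, stopwords("english"))
myCorpus <- tm_map(myCorpus, stripWhitespace)

# Inspect the cleaned text of the first document
cleaned <- as.character(myCorpus[[1]])
```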

3. Create a custom dictionary file, such as my_dict.csv, containing the tech terms.

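#read custom dictionary (assumes a header row, with the terms in the first column)
tech_dict <- read.csv("path/to/my_dict.csv", stringsAsFactors = FALSE)
#create tech function; match against the column of terms, not the whole data frame
is.tech <- function(x) x %in% tech_dict[[1]]
#filter
tech_df <- df[which(is.tech(df$terms)),]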
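
The two filters can also be combined so that a term is kept when either list recognises it. The sketch below uses base R only; `english_words` and `tech_terms` are small hypothetical stand-ins for `GradyAugmented` and the contents of my_dict.csv, and the toy `df` is made up.

```r
# Hypothetical stand-ins for GradyAugmented and the custom tech dictionary
english_words <- c("experienced", "developer", "using", "and", "in")
tech_terms <- c("r", "java", "eclipse")

# Toy term-frequency table containing one nonsense token
df <- data.frame(terms = c("experienced", "java", "r", "xqzt", "eclipse"),
                 freq = c(1, 2, 1, 1, 1), stringsAsFactors = FALSE)

# Lower-case the input so that "Java" and "java" are treated alike
is.word <- function(x) tolower(x) %in% english_words
is.tech <- function(x) tolower(x) %in% tech_terms

# Keep a term if either list recognises it
keep <- is.word(df$terms) | is.tech(df$terms)
kept <- df[keep, ]
dropped <- df$terms[!keep]
```

Checking the tech list separately matters because short names such as "r" may be absent from a general English word list, or present for unrelated reasons.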

Hope this helps.
