How to get the frequency of a word in a sentence in R?

Problem description

I have an input file that contains one paragraph. I need to find the frequency of a particular word in that paragraph.

cat file:

Text    Index
train is good   1
let the train come      5
train is best   3
i m great       3
what is best    2

Code:

 input<-read.table("file",sep="\t",header=TRUE)
 paragraph1<-input[1][1]
 word<-"train"

I need to find the frequency of the word "train" in paragraph1. How can I get it using R?

Recommended answer

If you gave a little more information, I could probably be more specific in return. Using qdap you could:

library(qdap)

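# Sample data as a character vector; readLines(n=5) only works when the
# five lines are pasted at an interactive console
dat <- c(
  "train is good   1",
  "let the train come      5",
  "train is best   3",
  "i m great       3",
  "what is best    2"
)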

dat <- do.call(rbind.data.frame, strsplit(dat, "   +"))

colnames(dat) <- c("Text", "Index")

termco(dat$Text, , " train ")

## > termco(dat$Text, , " train ")
##   all word.count     train
## 1 all         16 3(18.75%)
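
For readers without qdap installed, the figures in that output (3 hits out of 16 words, i.e. 18.75%) can be reproduced in base R. This is only a minimal sketch: `count_word` is a hypothetical helper, not part of qdap, and it assumes that words are runs of ASCII letters and that matching should ignore case.

```r
# Hypothetical helper (not a qdap function): count whole-word occurrences
# of `word` across a character vector, ignoring case.
count_word <- function(text, word) {
  # Split every string on runs of non-letter characters
  tokens <- unlist(strsplit(tolower(text), "[^a-z]+"))
  # Drop empty tokens produced by leading separators
  tokens <- tokens[nzchar(tokens)]
  c(word.count = length(tokens), hits = sum(tokens == tolower(word)))
}

text <- c("train is good", "let the train come", "train is best",
          "i m great", "what is best")

res <- count_word(text, "train")
res

# Share of all words that are "train", as a percentage
pct <- 100 * res[["hits"]] / res[["word.count"]]
pct
```

Because the text is split into tokens first, "train" is not counted inside longer words such as "trainer".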

You could probably do all the paragraphs at once with termco. For more on termco, see this link.
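
As a rough base R illustration of counting in every paragraph at once, the sketch below returns one count per element of a character vector. `count_per_paragraph` is a hypothetical name, each element is assumed to be one paragraph, and `word` is assumed to contain no regex metacharacters.

```r
# Hypothetical helper: occurrences of `word` in each element of `text`
count_per_paragraph <- function(text, word) {
  # \\b anchors the match at word boundaries so substrings are not counted
  pattern <- paste0("\\b", word, "\\b")
  hits <- regmatches(text, gregexpr(pattern, text,
                                    ignore.case = TRUE, perl = TRUE))
  # regmatches() yields character(0) where nothing matched, so lengths()
  # gives 0 for those elements
  lengths(hits)
}

text <- c("train is good", "let the train come", "train is best",
          "i m great", "what is best")

per_row <- count_per_paragraph(text, "train")
per_row

# Total across all paragraphs
total <- sum(per_row)
total
```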

A lot of this depends on what separates the paragraphs, how you're reading the file in, how things are indented, and so on.

The poster found the following useful:

length(gregexpr("the", "the dog ate the word the", fixed = TRUE)[[1]])
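
One caveat with that one-liner: when there is no match, gregexpr returns -1, so length() reports 1 instead of 0. With fixed = TRUE it also counts substrings, so "the" would be found inside "other". A minimal sketch that handles the no-match case, using a hypothetical helper name `count_fixed`:

```r
# Hypothetical helper: count literal occurrences of `pattern` in one string
count_fixed <- function(pattern, x) {
  pos <- gregexpr(pattern, x, fixed = TRUE)[[1]]
  # Match positions are positive; a lone -1 means "no match"
  sum(pos > 0)
}

n_the <- count_fixed("the", "the dog ate the word the")
n_cat <- count_fixed("cat", "the dog ate the word the")

n_the  # 3
n_cat  # 0
```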
