How to get the frequency of a word in a sentence in R?
Question
I have an input file that contains a paragraph. I need to find the frequency of a particular word in that paragraph.
Contents of the file (output of cat file; the columns are tab-separated):
Text Index
train is good 1
let the train come 5
train is best 3
i m great 3
what is best 2
Code:
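input <- read.table("file", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
# input$Text is the Text column as a character vector (input[1][1] is still a data frame)
paragraph1 <- input$Text
word <- "train"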
I need to find the frequency of the word "train" in paragraph1. How can I get it using R?
Answer
If you gave a little more information, I could probably provide more in return. Using the qdap package, you could:
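library(qdap)
# Run this line at the console, then paste the five data lines that follow
dat <- readLines(n = 5)
train is good 1
let the train come 5
train is best 3
i m great 3
what is best 2
# Split each line at the space before the trailing Index number
dat <- do.call(rbind.data.frame, strsplit(dat, " (?=[0-9]+$)", perl = TRUE))
colnames(dat) <- c("Text", "Index")
termco(dat$Text, , " train ")
## > termco(dat$Text, , " train ")
##   all word.count train
## 1 all         16 3(18.75%)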
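For comparison, the same totals can be reproduced in base R without qdap. This sketch is not how termco works internally; it assumes words are separated by whitespace and that a match means the whole word "train", compared case-insensitively. It should give 16 words and 3 matches, i.e. 3 / 16 = 18.75%, in line with the termco output above.

```r
# Sample texts from the question (the Text column)
texts <- c("train is good", "let the train come", "train is best",
           "i m great", "what is best")

# Lower-case, split every text on runs of whitespace, and pool all words
words <- unlist(strsplit(tolower(texts), "[[:space:]]+"))

word_count  <- length(words)           # total number of words
train_count <- sum(words == "train")   # exact whole-word matches
train_pct   <- 100 * train_count / word_count

cat(sprintf("word.count = %d, train = %d (%.2f%%)\n",
            word_count, train_count, train_pct))
```

Comparing whole words with == avoids counting "train" inside longer words such as "training", which a plain substring search would include.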
You could probably process all the paragraphs at once with termco. For more on termco, see this link.
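To count every text at once in base R, a small helper can be applied to the whole Text column, and the per-text counts can then be summed by a grouping variable, roughly what termco's grouping.var argument does. count_word is a hypothetical helper written for this sketch, and grouping by the Index column is only an illustration; the question does not say what Index means.

```r
# Texts and Index values from the question's sample file
texts <- c("train is good", "let the train come", "train is best",
           "i m great", "what is best")
index <- c(1, 5, 3, 3, 2)

# Hypothetical helper: whole-word, case-insensitive count of `word`
# in each element of `x` (words are split on runs of whitespace)
count_word <- function(x, word) {
  vapply(strsplit(tolower(x), "[[:space:]]+"),
         function(w) sum(w == tolower(word)),
         integer(1))
}

per_text <- count_word(texts, "train")
print(data.frame(Text = texts, train = per_text))

# Sum the per-text counts within each Index value
by_index <- tapply(per_text, index, sum)
print(by_index)
```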
A lot of this depends on what separates the paragraphs, how you are reading the file in, how things are indented, etc.
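If the file really is tab-separated, as the question's read.table(..., sep = "\t") call assumes, a sketch like the following reads it and joins the Text column into a single paragraph string. The inline raw string is an assumption standing in for the file; with the real file, pass its path instead of text =. A different sep, or reading with readLines(), would change what ends up counted as one paragraph.

```r
# Stand-in for the tab-separated "file" from the question
raw <- "Text\tIndex
train is good\t1
let the train come\t5
train is best\t3
i m great\t3
what is best\t2"

# Read the two tab-separated columns, keeping Text as character
input <- read.table(text = raw, sep = "\t", header = TRUE,
                    stringsAsFactors = FALSE)

# Treat the whole Text column as one paragraph
paragraph <- paste(input$Text, collapse = " ")
cat(paragraph, "\n")
```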
The asker found the following useful:
length(gregexpr("the", "the dog ate the word the", fixed = TRUE)[[1]])
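This one-liner counts substrings rather than words: "the" inside "there" or "another" is counted too, and because gregexpr() returns -1 when nothing matches, length() gives 1 rather than 0 for a string with no match. A sketch of a safer count, assuming the searched word contains no regular-expression metacharacters (count_matches is a hypothetical name):

```r
# Count whole-word, case-insensitive occurrences of `word` in the string `x`.
# \b marks a word boundary; `word` is pasted in unescaped, so it must not
# contain regex metacharacters.
count_matches <- function(x, word) {
  pattern <- paste0("\\b", word, "\\b")
  m <- gregexpr(pattern, x, ignore.case = TRUE, perl = TRUE)[[1]]
  # gregexpr() reports -1 for "no match", so count only positive positions
  sum(m > 0)
}

count_matches("the dog ate the word the", "the")                    # 3
count_matches("there is another one", "the")                        # 0
length(gregexpr("the", "there is another one", fixed = TRUE)[[1]])  # 2 (substrings)
length(gregexpr("cat", "the dog", fixed = TRUE)[[1]])               # 1, despite no match
```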