Frequency of occurrence of two-pair combinations in text data in R

Views: 94   Published: 2020/10/5 22:42:12   Tags: r text combinations frequency

Question

I have a file with several string (text) variables where each respondent has written a sentence or two for each variable. I want to be able to find the frequency of each combination of words (e.g. how often "capability" occurs with "performance"). My code so far goes:

#Setting up the data file 
data.text <- scan("C:/temp/tester.csv", what="char", sep="\n")

#Change everything to lower case
data.text <- tolower(data.text)

#Split the strings into separate words
data.words.list <- strsplit(data.text, "\\W+", perl=TRUE)
data.words.vector <- unlist(data.words.list)

#List each word and frequency
data.freq.list <- table(data.words.vector)

This gives me a list of each word and how often it appears in the string variables. Now I want to see the frequency of every 2 word combination. Is this possible?

Thanks!

Example of the string data:

ID   Reason_for_Dissatisfaction    Reason_for_Likelihood_to_Switch
1    "not happy with the service"  "better value at other place"
2    "poor customer service"       "tired of same old thing"
3    "they are overchanging me"    "bad service"

Recommended answer

I'm not sure if this is what you mean, but rather than splitting on every two word boundaries (which I found a pain to try and regex) you could paste every two words together using the trusty head and tail slip trick...
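To see the trick in isolation before it is applied to the whole data frame, here is a minimal sketch on a single sentence from the sample data. The variable names `words` and `pairs` are only illustrative and do not appear in the original answer.

```r
# One illustrative sentence taken from the sample data
words <- strsplit("not happy with the service", "\\W+", perl = TRUE)[[1]]

# head(words, -1) drops the last word; tail(words, -1) drops the first.
# Pasting the two element-wise lines each word up with its successor.
pairs <- paste(head(words, -1), tail(words, -1), sep = " ")
print(pairs)
# [1] "not happy"   "happy with"  "with the"    "the service"
```

A response with only one word yields an empty result here, so it simply contributes no pairs.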

#  How I read your data
df <- read.table( text = 'ID   Reason_for_Dissatisfaction    Reason_for_Likelihood_to_Switch
1    "not happy with the service"  "better value at other place"
2    "poor customer service"       "tired of same old thing"
3    "they are overchanging me"    "bad service"
' , h = TRUE , stringsAsFactors = FALSE )


#  Split each response into words
wlist <- sapply( df[,-1] , strsplit , split = "\\W+", perl=TRUE)

#  Paste each word to the word that follows it (head drops the last word, tail drops the first)
outl <- sapply( wlist , function(x) paste( head(x,-1) , tail(x,-1) , sep = " ") )

#  Tabulate the word pairs as usual
table(unlist( outl ) )
are overchanging         at other      bad service     better value customer service 
               1                1                1                1                1 
      happy with        not happy          of same        old thing      other place 
               1                1                1                1                1 
 overchanging me    poor customer         same old      the service         they are 
               1                1                1                1                1 
        tired of         value at         with the 
               1                1                1
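Note that the table above counts only adjacent word pairs. If the question is instead about two words appearing anywhere in the same response (the "capability" with "performance" case), one possible variation is to count unordered pairs per response with `combn`. This is a sketch under that assumption, not part of the original answer; `cooccur_pairs` and `pair_freq` are hypothetical names, and the responses are typed in directly rather than read from the file.

```r
# Sample responses from the question, lower-cased as in the asker's code
responses <- tolower(c(
  "not happy with the service", "poor customer service",
  "they are overchanging me", "better value at other place",
  "tired of same old thing", "bad service"
))

# Hypothetical helper: all unordered pairs of distinct words in one response.
# Sorting the words first means a pair is always written in the same order.
cooccur_pairs <- function(text) {
  words <- sort(unique(strsplit(text, "\\W+", perl = TRUE)[[1]]))
  words <- words[nzchar(words)]            # drop empty strings from leading punctuation
  if (length(words) < 2) return(character(0))
  combn(words, 2, FUN = paste, collapse = " + ")
}

# Count how many responses contain each pair
pair_freq <- table(unlist(lapply(responses, cooccur_pairs)))
print(pair_freq[c("bad + service", "customer + service", "happy + service")])
```

With this tiny sample every pair occurs exactly once; on real survey data, `sort(pair_freq, decreasing = TRUE)` would bring the most common co-occurrences to the top.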
