Count frequency of dictionary words within a column and generate new "dictfreq" column

Published: 2021/4/30 20:05:49 · Tags: r, dictionary, word-frequency


Problem description

This seems like a simple task, but I cannot find a good way to do it in R. I want to count how often each word in a dictionary, dict, occurs within a column, wordsgov, of another data frame:

dict <- c("apple", "pineapple", "pear")
df <- data.frame(wordsgov = c("i hate apple", "i hate apple", "i love pear",
                              "i don't like pear", "pear is okay", "i eat pineapple sometimes"))

Desired output: a new frequency ranking that shows every word in dict, ordered by its frequency within df$wordsgov:

dict          freq_gov
"pear":       3
"apple":      2
"pineapple":  1

I tried the following code, but it gives me the number of dict words that appear in each row of df$wordsgov, which is not what I want:

dictongov <- within(
  df,
  counts <- sapply(
    gregexpr(paste0(dict, collapse = "|"), wordsgov),
    function(x) sum(x > 0)
  )
)

I cannot figure out how to change the function so that it gives me the frequency of each dict word across df$wordsgov instead. I also tried str_detect, but that did not work either. Any help at all would be really appreciated!
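One way to move from per-row counts to per-word counts is to loop over the dictionary instead of over the rows. The following is a minimal base R sketch, not a confirmed answer from the original thread. The helper name count_word is hypothetical, and the sketch assumes that the dictionary entries contain no regex metacharacters and that whole-word matching is wanted (so "apple" is not counted inside "pineapple", as the desired output implies).

```r
dict <- c("apple", "pineapple", "pear")
wordsgov <- c("i hate apple", "i hate apple", "i love pear",
              "i don't like pear", "pear is okay", "i eat pineapple sometimes")

# Hypothetical helper: total number of whole-word occurrences of `word`
# across all strings in `text`.
count_word <- function(word, text) {
  # \\b anchors the match at word boundaries
  m <- gregexpr(paste0("\\b", word, "\\b"), text, perl = TRUE)
  # regmatches() returns one character vector of matches per string
  sum(lengths(regmatches(text, m)))
}

# sapply() over a character vector names the result by the dictionary words
freq_gov <- sort(sapply(dict, count_word, text = wordsgov), decreasing = TRUE)
print(freq_gov)
```

The result is a named integer vector, so it could be turned into a two-column data frame with data.frame(dict = names(freq_gov), freq_gov = freq_gov) if a table is preferred.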

Edit: I used the following, which worked well.

library(dplyr); library(stringr)  # for %>%, mutate, count, arrange, str_c, str_extract
dictfreq <- df %>% mutate(dict = str_c(str_extract(wordsgov, str_c(dict, collapse = '|')), ':')) %>% 
                   count(dict, name = 'freq_gov') %>% arrange(desc(freq_gov))

However, it drops all the words that have a frequency of 0. Is there any way to keep the words with a frequency of 0? I tried ".drop = FALSE", but it does not seem to have any effect in this code. Any help would be really appreciated. Thanks!
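A possible reason the zero-frequency words vanish is that the extracted matches are plain character strings, so a word that never occurs leaves nothing to count. A common workaround is to convert the matches to a factor whose levels are the full dictionary before counting. The sketch below uses base R only and is an illustration under assumptions, not the thread's accepted answer: "banana" is a hypothetical extra dictionary entry added purely to demonstrate a zero count, and the pattern assumes whole-word matching.

```r
wordsgov <- c("i hate apple", "i hate apple", "i love pear",
              "i don't like pear", "pear is okay", "i eat pineapple sometimes")
# "banana" is a hypothetical entry that never occurs in wordsgov
dict <- c("apple", "pineapple", "pear", "banana")

# One pattern matching any dictionary word as a whole word
pattern <- paste0("\\b(", paste(dict, collapse = "|"), ")\\b")

# Collect every match from every row into one character vector
hits <- unlist(regmatches(wordsgov, gregexpr(pattern, wordsgov, perl = TRUE)))

# A factor with explicit levels makes table() report unseen words as 0
counts <- table(factor(hits, levels = dict))

dictfreq <- data.frame(dict = names(counts), freq_gov = as.vector(counts))
dictfreq <- dictfreq[order(-dictfreq$freq_gov), ]
print(dictfreq)
```

The same idea should carry over to the dplyr pipeline: if the extracted column is a factor with levels = dict, count() has a .drop argument that controls whether empty factor levels are kept.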

Data

v1 <- c("i hate apple", "i hate apple", "i love pear", "i don't like pear", 
       "pear is okay", "i eat pineapple sometimes")

v2 <- c("apple", "pineapple", "pear")
