Count the most frequent word in each row in R


Problem description

I have the table shown below:

   Name     Mon    Tue     Wed    Thu     Fri    Sat    Sun

1 John     Apple  Orange  Apple  Banana  Apple  Apple  Orange
2 Ricky    Banana Apple   Banana Banana  Banana Banana Apple
3 Alex     Apple  Orange  Orange Apple   Apple  Orange Orange
4 Robbin   Apple  Apple   Apple  Apple   Apple  Banana Banana
5 Sunny    Banana Banana  Apple  Apple   Apple  Banana Banana
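For anyone who wants to reproduce the problem, the table above can be typed in as a data frame. This is only a sketch: the object name `df1` is borrowed from the answer's code (the question itself refers to `df`), and `stringsAsFactors = FALSE` is an assumption made so the fruit names stay plain character strings on any R version.

```r
# Rebuild the example table by hand; values are copied from the table above.
df1 <- data.frame(
  Name = c("John", "Ricky", "Alex", "Robbin", "Sunny"),
  Mon  = c("Apple", "Banana", "Apple", "Apple", "Banana"),
  Tue  = c("Orange", "Apple", "Orange", "Apple", "Banana"),
  Wed  = c("Apple", "Banana", "Orange", "Apple", "Apple"),
  Thu  = c("Banana", "Banana", "Apple", "Apple", "Apple"),
  Fri  = c("Apple", "Banana", "Apple", "Apple", "Apple"),
  Sat  = c("Apple", "Banana", "Orange", "Banana", "Banana"),
  Sun  = c("Orange", "Apple", "Orange", "Banana", "Banana"),
  stringsAsFactors = FALSE  # assumption: keep fruit names as character, not factor
)
```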

So, I want to find the most frequent fruit for each person, count how often it occurs, and add those values as new columns.

For example:

   Name     Mon    Tue     Wed    Thu     Fri    Sat    Sun      Max_Acc  Count

1 John     Apple  Orange  Apple  Banana  Apple  Apple  Orange     Apple       4
2 Ricky    Banana Apple   Banana Banana  Banana Banana Apple      Banana      5
3 Alex     Apple  Orange  Orange Apple   Apple  Orange Orange     Orange      4
4 Robbin   Apple  Apple   Apple  Apple   Apple  Banana Banana     Apple       5
5 Sunny    Banana Banana  Apple  Apple   Apple  Banana Banana     Banana      4

I am having trouble doing this across rows. I can find the frequencies within a column by using the table() function.

>table(df$Mon)

 Apple  Banana
  3      2

But here I want the name of the most frequent fruit in a new column.

Recommended answer

If we need the "Count" and "Names" corresponding to the maximum count, we can loop through the rows of the dataset (using apply with MARGIN = 1), use table to get the frequencies, extract the maximum value and the name that corresponds to it, rbind the per-row results, and cbind them with the original dataset.
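Before looping over every row, it may help to see the per-row step on its own. The sketch below applies it to John's row only; the vector `john` and the names `freq`, `top_fruit` and `top_count` are made up for illustration and are not part of the original answer.

```r
# John's row from the example table, typed in so the snippet runs on its own.
john <- c(Mon = "Apple", Tue = "Orange", Wed = "Apple", Thu = "Banana",
          Fri = "Apple", Sat = "Apple", Sun = "Orange")

freq <- table(john)                          # counts per fruit within this one row
top_fruit <- names(freq)[which.max(freq)]    # label of the highest count
top_count <- max(freq)                       # the highest count itself
```

Here `top_fruit` is "Apple" and `top_count` is 4, which matches the expected output for John. The answer's code repeats this step for each row.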

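# For each row (MARGIN = 1), ignoring the Name column, tabulate the fruits
cbind(df1, do.call(rbind, apply(df1[-1], 1, function(x) {
  x1 <- table(x)
  # highest count and the fruit it belongs to (which.max keeps the first one on ties)
  data.frame(Count = max(x1), Names = names(x1)[which.max(x1)])
})))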

#    Name    Mon    Tue    Wed    Thu    Fri    Sat    Sun Count  Names
#1   John  Apple Orange  Apple Banana  Apple  Apple Orange     4  Apple
#2  Ricky Banana  Apple Banana Banana Banana Banana  Apple     5 Banana
#3   Alex  Apple Orange Orange  Apple  Apple Orange Orange     4 Orange
#4 Robbin  Apple  Apple  Apple  Apple  Apple Banana Banana     5  Apple
#5  Sunny Banana Banana  Apple  Apple  Apple Banana Banana     4 Banana
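One thing to be aware of: which.max returns only the first maximum, and table orders its labels alphabetically, so a tie is silently resolved in favour of the alphabetically first fruit. None of the five rows in the example has a tie, so the output above is unaffected. If ties matter, a variation along these lines could report all of them; `top_with_ties` is a hypothetical helper, and the "/" separator is an arbitrary choice.

```r
# Hypothetical helper (not from the original answer): list every fruit tied for the top count.
top_with_ties <- function(x) {
  freq <- table(x)
  winners <- names(freq)[freq == max(freq)]   # all labels that reach the maximum
  data.frame(Names = paste(winners, collapse = "/"), Count = max(freq))
}

# Made-up input with a deliberate tie between Apple and Banana.
tied <- top_with_ties(c("Apple", "Banana", "Apple", "Banana", "Orange"))
```

With this input, `tied$Names` is "Apple/Banana" and `tied$Count` is 2. The helper returns the same two columns as the function in the answer, so it could be swapped into the apply call.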


Or we can use data.table:

library(data.table)
# setDT() converts df1 to a data.table in place; .SD holds each person's fruit columns
setDT(df1)[, c("Names", "Count") := {
  tbl <- table(unlist(.SD))
  .(names(tbl)[which.max(tbl)], max(tbl))
}, by = Name]
