Count the most frequent word in each row in R
Problem description
A table is shown below.
Name Mon Tue Wed Thu Fri Sat Sun
1 John Apple Orange Apple Banana Apple Apple Orange
2 Ricky Banana Apple Banana Banana Banana Banana Apple
3 Alex Apple Orange Orange Apple Apple Orange Orange
4 Robbin Apple Apple Apple Apple Apple Banana Banana
5 Sunny Banana Banana Apple Apple Apple Banana Banana
I want to find the most frequent fruit for each person and add those values in new columns.
For example:
Name Mon Tue Wed Thu Fri Sat Sun Max_Acc Count
1 John Apple Orange Apple Banana Apple Apple Orange Apple 4
2 Ricky Banana Apple Banana Banana Banana Banana Apple Banana 5
3 Alex Apple Orange Orange Apple Apple Orange Orange Orange 4
4 Robbin Apple Apple Apple Apple Apple Banana Banana Apple 5
5 Sunny Banana Banana Apple Apple Apple Banana Banana Banana 4
I am having trouble doing this across rows. I can find the frequency within a column by using the table() function:
>table(df$Mon)
Apple Banana
3 2
But here I want the name of the most frequent fruit in a new column.
Recommended answer
If we need the "Count" and "Names" corresponding to the maximum count, we can loop through the rows of the dataset (using apply with MARGIN = 1), use table to get the frequencies, extract the maximum value and the name corresponding to it, rbind the per-row results, and cbind them with the original dataset.
cbind(df1, do.call(rbind, apply(df1[-1], 1, function(x) {
    x1 <- table(x)   # frequency of each fruit in this row
    data.frame(Count = max(x1), Names = names(x1)[which.max(x1)])
})))
# Name Mon Tue Wed Thu Fri Sat Sun Count Names
#1 John Apple Orange Apple Banana Apple Apple Orange 4 Apple
#2 Ricky Banana Apple Banana Banana Banana Banana Apple 5 Banana
#3 Alex Apple Orange Orange Apple Apple Orange Orange 4 Orange
#4 Robbin Apple Apple Apple Apple Apple Banana Banana 5 Apple
#5 Sunny Banana Banana Apple Apple Apple Banana Banana 4 Banana
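For a reproducible version, the sketch below rebuilds the example data by hand and adds the two columns directly. How df1 was originally created is not shown in the question, so the data.frame() call here is an assumption, as is the intermediate name fruit. It uses the same table / which.max idea, but returns plain vectors from apply instead of building one data frame per row.

```r
# Rebuild the example table (assumed construction; the question only shows the printed data).
df1 <- data.frame(
  Name = c("John", "Ricky", "Alex", "Robbin", "Sunny"),
  Mon  = c("Apple", "Banana", "Apple", "Apple", "Banana"),
  Tue  = c("Orange", "Apple", "Orange", "Apple", "Banana"),
  Wed  = c("Apple", "Banana", "Orange", "Apple", "Apple"),
  Thu  = c("Banana", "Banana", "Apple", "Apple", "Apple"),
  Fri  = c("Apple", "Banana", "Apple", "Apple", "Apple"),
  Sat  = c("Apple", "Banana", "Orange", "Banana", "Banana"),
  Sun  = c("Orange", "Apple", "Orange", "Banana", "Banana"),
  stringsAsFactors = FALSE
)

# Character matrix of the weekday columns, one row per person.
fruit <- as.matrix(df1[-1])

# For each row: the name of the most frequent fruit, and how often it occurs.
df1$Max_Acc <- apply(fruit, 1, function(x) names(which.max(table(x))))
df1$Count   <- apply(fruit, 1, function(x) max(table(x)))

print(df1)
```

Computing fruit once before adding the new columns keeps Max_Acc out of the input when Count is calculated.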
Alternatively, we can use data.table:
library(data.table)
setDT(df1)[, c("Names", "Count") := {
    tbl <- table(unlist(.SD))
    .(names(tbl)[which.max(tbl)], max(tbl))
}, by = Name]
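One caveat applies to both approaches: which.max returns only the first maximum, and table orders its names alphabetically for character input, so a tie resolves to the alphabetically first fruit. If ties matter, a minimal sketch like the following keeps every tied name. The vector x is made-up data for illustration, not part of the original example.

```r
# Made-up row in which Apple and Banana both occur twice.
x <- c("Apple", "Banana", "Banana", "Apple", "Orange")
tbl <- table(x)

# First maximum only (what the answers above report).
first_only <- names(which.max(tbl))

# Every fruit that shares the maximum count, collapsed into one string.
all_tied <- paste(names(tbl)[tbl == max(tbl)], collapse = "/")

print(first_only)
print(all_tied)
```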