Find the most frequent value by row
Question
My problem is as follows:
I have a data set containing several factor variables that share the same categories. I need to find the category that occurs most frequently in each row. In case of ties, an arbitrary value can be chosen, although it would be great to have more control over that.
My data set contains over a hundred factors. However, the structure is something like this:
df = data.frame(id = 1:3,
                var1 = c("red","yellow","green"),
                var2 = c("red","yellow","green"),
                var3 = c("yellow","orange","green"),
                var4 = c("orange","green","yellow"))
df
#   id   var1   var2   var3   var4
# 1  1    red    red yellow orange
# 2  2 yellow yellow orange  green
# 3  3  green  green  green yellow
The solution should be a variable within the data frame, for example var5, which contains the most frequent category for each row. It can be a factor or a numeric vector (in case the data need to be converted first to numeric vectors)
In this case, I would like to have this solution:
df$var5
# [1] "red" "yellow" "green"
Any advice will be much appreciated! Thanks in advance!
Recommended answer
Something like this:
apply(df,1,function(x) names(which.max(table(x))))
[1] "red" "yellow" "green"
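Note that the call above passes every column of df to table(), including id. The following is a minimal sketch, not part of the original answer, that counts only the category columns and stores the result as var5, as the question asks. It assumes the category columns are exactly those whose names start with "var"; the helper name cat_cols is hypothetical.

```r
# Example data from the question
df <- data.frame(id = 1:3,
                 var1 = c("red", "yellow", "green"),
                 var2 = c("red", "yellow", "green"),
                 var3 = c("yellow", "orange", "green"),
                 var4 = c("orange", "green", "yellow"))

# Assumption: the category columns are those whose names start with "var"
cat_cols <- grep("^var", names(df), value = TRUE)

# Most frequent category per row, counted over the category columns only;
# unname() drops any row names that apply() may attach to the result
df$var5 <- unname(apply(df[cat_cols], 1,
                        function(x) names(which.max(table(x)))))
df$var5
```

With over a hundred factors, selecting the columns by name pattern (or by position, if that suits the real data better) avoids listing them by hand.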
In case of a tie, which.max takes the first maximum. Because table() sorts the category names, the first maximum here is the tied category that comes first alphabetically. From the which.max help page:
Determines the location, i.e., index of the (first) minimum or maximum of a numeric vector.
Example:
df$var4 <- c("yellow","green","yellow")
df
#   id   var1   var2   var3   var4
# 1  1    red    red yellow yellow
# 2  2 yellow yellow orange  green
# 3  3  green  green  green yellow
apply(df,1,function(x) names(which.max(table(x))))
[1] "red" "yellow" "green"
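For more control over ties, one option is to tabulate each row against a factor whose levels are listed in order of preference. table() keeps the level order, so which.max returns the most preferred of the tied categories. The sketch below is not part of the original answer; the function name most_frequent and the priority vector are hypothetical, and it assumes every category that can occur is listed in priority (unlisted values become NA and are not counted).

```r
# Return the most frequent value of x; ties go to whichever tied
# category appears earliest in `priority`
most_frequent <- function(x, priority) {
  # Levels follow `priority`, so table() reports counts in that order
  counts <- table(factor(x, levels = priority))
  # which.max returns the first maximum, i.e. the preferred tied level
  names(which.max(counts))
}

# Tie example from above: row 1 has two "red" and two "yellow"
df <- data.frame(id = 1:3,
                 var1 = c("red", "yellow", "green"),
                 var2 = c("red", "yellow", "green"),
                 var3 = c("yellow", "orange", "green"),
                 var4 = c("yellow", "green", "yellow"))

# Hypothetical preference order: "yellow" wins any tie it is part of
priority <- c("yellow", "red", "green", "orange")

# df[-1] drops the id column; extra arguments are passed on to most_frequent
res <- unname(apply(df[-1], 1, most_frequent, priority = priority))
res
```

With this preference order the first row resolves to "yellow" instead of "red"; putting "red" first in priority gives the original result.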