Find the most frequent value by row

Question

My problem is as follows:

I have a data set containing several factor variables that share the same categories. For each row, I need to find the category that occurs most frequently. In case of ties an arbitrary value can be chosen, although it would be great to have more control over it.

My data set contains over a hundred factors. However, the structure is something like this:

df = data.frame(id = 1:3,
                var1 = c("red","yellow","green"),
                var2 = c("red","yellow","green"),
                var3 = c("yellow","orange","green"),
                var4 = c("orange","green","yellow"))

df
#   id   var1   var2   var3   var4
# 1  1    red    red yellow orange
# 2  2 yellow yellow orange  green
# 3  3  green  green  green yellow

The solution should be a variable within the data frame, for example var5, which contains the most frequent category for each row. It can be a factor or a numeric vector (in case the data need to be converted to numeric vectors first).

In this case, I would like to get this result:

df$var5
# [1] "red"    "yellow" "green" 

Any advice will be much appreciated! Thanks in advance!

Recommended answer

Something like this:

apply(df,1,function(x) names(which.max(table(x))))
[1] "red"    "yellow" "green" 
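As a small variation on the call above (a sketch, not part of the original answer): apply(df, ...) also passes the id column to table(), so the row identifier is counted as if it were a category. Dropping id first keeps it out of the counts, and assigning the result gives the var5 column the question asks for. The helper name most_frequent and the variable value_cols are hypothetical.

```r
# Sample data from the question
df <- data.frame(id = 1:3,
                 var1 = c("red", "yellow", "green"),
                 var2 = c("red", "yellow", "green"),
                 var3 = c("yellow", "orange", "green"),
                 var4 = c("orange", "green", "yellow"))

# Hypothetical helper: most frequent value in one row
# (which.max returns the first maximum when counts are tied)
most_frequent <- function(x) names(which.max(table(x)))

# Exclude id by name so only var1..var4 are counted
value_cols <- setdiff(names(df), "id")
df$var5 <- apply(df[value_cols], 1, most_frequent)

df$var5
# [1] "red"    "yellow" "green"
```

Selecting the columns by name rather than by position is a design choice that should carry over to the real data set with over a hundred factors, assuming the identifier column is known.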

In case of a tie, which.max takes the first maximum. From the which.max help page:


Determines the location, i.e., index of the (first) minimum or maximum of a numeric vector.

Ex:

# Replace var4 so that row 1 has a tie between "red" and "yellow"
df$var4 <- c("yellow","green","yellow")

> df
  id   var1   var2   var3   var4
1  1    red    red yellow yellow
2  2 yellow yellow orange  green
3  3  green  green  green yellow

apply(df,1,function(x) names(which.max(table(x))))
[1] "red"    "yellow" "green" 
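In the tie above, "red" wins because table() orders its counts by sorted level, and "red" sorts before "yellow". The question also asks for more control over ties. One possible sketch, not part of the original answer, is to supply an explicit priority order as factor levels, so that which.max returns the highest-priority value among the tied ones. The names priority and most_frequent_by_priority are hypothetical, and the order chosen here is only an assumption for illustration.

```r
# Data from the example above: row 1 has a tie between "red" and "yellow"
df <- data.frame(id = 1:3,
                 var1 = c("red", "yellow", "green"),
                 var2 = c("red", "yellow", "green"),
                 var3 = c("yellow", "orange", "green"),
                 var4 = c("yellow", "green", "yellow"))

# Assumed tie-break order: earlier entries win ties
priority <- c("yellow", "red", "green", "orange")

most_frequent_by_priority <- function(x, priority) {
  # table() orders its counts by factor level, so which.max()
  # returns the earliest level in `priority` among tied maxima
  counts <- table(factor(x, levels = priority))
  names(which.max(counts))
}

# df[-1] drops the id column before counting
apply(df[-1], 1, most_frequent_by_priority, priority = priority)
# [1] "yellow" "yellow" "green"
```

Values that do not appear in priority become NA in the factor and are not counted, so the vector should list every category present in the data.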
