Get the most frequent value per row and account for ties


Problem description

Sample data:

df <- data.frame("ID" = 1:6, 
                 "Group1" = c("A", NA, "C", NA, "E", "C"), 
                 "Group2" = c("E", "C", "C", NA, "E", "E"),
                 "Group3" = c("A", "A", NA, NA, "C", NA),
                 "Group4" = c(NA, "C", NA, "D", "C", NA),
                 "Group5" = c("A", "D", NA, NA, NA, NA))

In each row, I want to count how often each value occurs and store the most frequent value in a new variable, New.Group. In case of a tie, the value that appears first in the row should be selected. The logic applied to the example:

Row 1 of New.Group takes the value A because it is the most frequent value in the row, ignoring NAs.

Row 2 takes the value C because it is the most frequent value in that row.

Row 3 follows the same logic as row 2.

Row 4 takes the value D because it is the only value in the row.

In row 5, both E and C have a count of 2, but E is selected because it is encountered before C in the row.

Row 6 is similar to row 5: both C and E have a count of 1, but C is selected because it is encountered before E in the row.

Desired output:

  ID Group1 Group2 Group3 Group4 Group5 New.Group
1  1      A      E      A   <NA>      A         A
2  2   <NA>      C      A      C      D         C
3  3      C      C   <NA>   <NA>   <NA>         C
4  4   <NA>   <NA>   <NA>      D   <NA>         D
5  5      E      E      C      C   <NA>         E
6  6      C      E   <NA>   <NA>   <NA>         C

Recommended answer

I think this achieves what you're looking for. For each row, it builds a frequency table of the values, with the table entries kept in the order in which the values first appear in the row, and then picks the largest count. Because which.max returns the first maximum, a tie goes to the value that appears earliest in the row. The name of that table entry is returned as the result.
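To make the tie-breaking concrete, here is a small trace of those steps on row 5 of the sample data (E, E, C, C, NA). It is only an illustrative sketch; the variable names x, lev, f and counts are not part of the original answer.

```r
# Row 5 of the sample data, without the ID column
x <- c("E", "E", "C", "C", NA)

# unique() keeps values in order of first appearance (NA included)
lev <- unique(x)

# factor() excludes NA from the levels by default, so the levels are E, C
f <- factor(x, levels = lev)

# table() counts per level, in level order: E = 2, C = 2
counts <- table(f)

# which.max() returns the first maximum, so the tie goes to E
result <- names(which.max(counts))

# For contrast: table(x) alone sorts the values alphabetically (C, E),
# so the same tie would be resolved in favour of C instead
naive <- names(which.max(table(x)))
```

Passing unique(x) as the factor levels is what ties the table order to the row order; without it, the tie-break would depend on alphabetical order rather than position.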

Thanks to Henrik for suggesting the improvement.

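# For each row (ID column dropped), count values in order of first appearance;
# which.max() returns the first maximum, so ties go to the earliest value.
df$New.Group <- apply(df[-1], 1, function(x) {
  names(which.max(table(factor(x, levels = unique(x)))))
})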

df
#>   ID Group1 Group2 Group3 Group4 Group5 New.Group
#> 1  1      A      E      A   <NA>      A         A
#> 2  2   <NA>      C      A      C      D         C
#> 3  3      C      C   <NA>   <NA>   <NA>         C
#> 4  4   <NA>   <NA>   <NA>      D   <NA>         D
#> 5  5      E      E      C      C   <NA>         E
#> 6  6      C      E   <NA>   <NA>   <NA>         C
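The sample data has no row that consists only of NA. In that case the frequency table would be empty and the one-liner above would not appear to produce a single value for that row. A possible variant that returns NA for such rows is sketched below; the helper name first_mode and the data frame df2 are hypothetical and used only for illustration.

```r
# Hypothetical helper: most frequent non-NA value, ties go to the first one seen
first_mode <- function(x) {
  x <- x[!is.na(x)]
  # No non-NA values in the row: return a character NA instead of nothing
  if (length(x) == 0) return(NA_character_)
  names(which.max(table(factor(x, levels = unique(x)))))
}

# Small made-up example with one all-NA row
df2 <- data.frame(ID = 1:3,
                  Group1 = c("A", NA, "C"),
                  Group2 = c("B", NA, "E"),
                  Group3 = c("B", NA, NA),
                  stringsAsFactors = FALSE)

df2$New.Group <- apply(df2[-1], 1, first_mode)
```

Here row 1 should give B (count 2), row 2 should give NA, and row 3 should give C (tie between C and E, C comes first).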
