Using dplyr "group_by" and "summarise" with a Custom Function to Calculate the Mode of Several Groups

Problem Description

Apparently dplyr's summarise function doesn't include an option for "mode". Based on the simple data frame example below, I would like to determine the mode, or most frequently repeating number, for each group of "Category." So for group "A", the mode is 22, for "B", it's 12 and 14, and there is no repeating number for "C".

I found some example functions online, but none handled the case where a group has no repeating numbers. Is a custom function needed, or is there a mode option somewhere? I don't want to rely on other specialized packages just for their mode function. It would be nice to find an elegant and simple solution using a combination of base R, dplyr, tidy, etc.

If a custom function is used, it will have to work when there are no repeating numbers, as well as when there are more than one equally repeating number.

Any help would be greatly appreciated! This seems like it should have an easy solution in R, so I was surprised to learn that there is no simple summarise_each(funs(mode)... option.
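
One likely reason funs(mode) does not help here: base R's mode() function is unrelated to the statistical mode. It reports an object's storage mode, so applying it to a group of numbers only returns the string "numeric". The snippet below is a small illustration of that behavior; the statistical mode still has to be computed some other way.

```r
# base::mode() describes the type of an object, not its most frequent value
mode(c(22, 22, 22, 18))   # "numeric"
mode(c("A", "B", "B"))    # "character"
```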

If a custom function is used, please break it down with explanations. I'm still relatively new to R functions.

Category <- c("A", "B", "B", "C", "A", "A", "A", "B", "C", "B", "C", "C")
Number <- c(22, 12, 12, 8, 22, 22, 18, 14, 10, 14, 1, 3)
DF <- data.frame(Category, Number)
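
As a sanity check on the expected results stated in the question, a cross-tabulation with base R's table() shows the count of every number within each category: 22 appears three times in group A, 12 and 14 tie at two each in group B, and every value in group C appears once. This is a sketch for inspecting the example data only, not part of any answer; the variable name tab is just an illustrative choice.

```r
# Rebuild the example data so this snippet runs on its own
Category <- c("A", "B", "B", "C", "A", "A", "A", "B", "C", "B", "C", "C")
Number <- c(22, 12, 12, 8, 22, 22, 18, 14, 10, 14, 1, 3)
DF <- data.frame(Category, Number)

# Rows are categories, columns are the distinct numbers, cells are counts
tab <- table(DF$Category, DF$Number)
print(tab)

# Highest count per category: 3 for A, 2 for B, 1 for C (nothing repeats in C)
apply(tab, 1, max)
```

A maximum count of 1, as in group C, is exactly the "no mode" case the custom function has to handle.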


Recommended Answer

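We can write a small custom function, Mode, that returns NA when a group has no repeated values and otherwise returns every value tied for the highest count, joined into one comma-separated string. That covers both the no-repeat case (group C) and the tie case (group B):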

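library(dplyr)

# Return the mode(s) of x as one string, or NA if no value repeats
Mode <- function(x) {
  ux <- unique(x)                    # distinct values, in order of first appearance
  if (!anyDuplicated(x)) {
    NA_character_                    # no repeated value, so no mode
  } else {
    tbl <- tabulate(match(x, ux))    # count how often each distinct value occurs
    toString(ux[tbl == max(tbl)])    # keep all values tied for the top count, e.g. "12, 14"
  }
}

DF %>%
  group_by(Category) %>%
  summarise(NumberMode = Mode(Number))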
#  Category NumberMode
#    <fctr>      <chr>
#1        A         22
#2        B     12, 14
#3        C       <NA>
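
Since the question asked for the function to be broken down, here is a step-by-step trace, in base R only, of what Mode does with group B's values (12, 12, 14, 14) and group C's values (8, 10, 1, 3). The names ux and tbl mirror those inside Mode; idx is a hypothetical intermediate added here only to make the match() step visible.

```r
# Group B: two values tied for the top count
x <- c(12, 12, 14, 14)
anyDuplicated(x)                # 2: index of the first repeated element, so a mode exists
ux <- unique(x)                 # 12 14    distinct values in order of first appearance
idx <- match(x, ux)             # 1 1 2 2  position of each element of x within ux
tbl <- tabulate(idx)            # 2 2      how many times each distinct value occurs
ux[tbl == max(tbl)]             # 12 14    every value whose count equals the maximum
toString(ux[tbl == max(tbl)])   # "12, 14" collapsed into one string for summarise()

# Group C: nothing repeats
y <- c(8, 10, 1, 3)
anyDuplicated(y)                # 0, so !anyDuplicated(y) is TRUE and Mode() returns NA_character_
```

One consequence of collapsing with toString() is that NumberMode is always a character column (the <chr> in the output above), so group A's mode is the string "22" rather than the number 22.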
