How to add totals as well as group_by statistics in R

Question

When computing a statistic with summarise and group_by, we only get the summary statistic per category, not the value for the whole population (Total). How can I get both?

I am looking for something clean and short. So far I can only think of:

library(dplyr)

bind_rows(
  # Per-species summary
  iris %>%
    group_by(Species) %>%
    summarise(
      Mean = mean(Sepal.Width),
      Median = median(Sepal.Width),
      sd = sd(Sepal.Width),
      p10 = quantile(Sepal.Width, probs = 0.1)
    ),
  # The same summary over the whole data set, labelled "Total"
  iris %>%
    summarise(
      Mean = mean(Sepal.Width),
      Median = median(Sepal.Width),
      sd = sd(Sepal.Width),
      p10 = quantile(Sepal.Width, probs = 0.1)
    ) %>%
    mutate(Species = "Total")
)

But I would like something more compact. In particular, I don't want to type the summarise code twice, once for the groups and once for the total.

Recommended answer

You can simplify this if you untangle what you're trying to do: you have iris data covering several species, and you want it summarized per species along with a summary across all species. You don't need to calculate the summary statistics before binding. Instead, bind iris with a copy of iris in which Species has been set to "Total", then group and summarize once.

library(tidyverse)

bind_rows(
  iris,
  iris %>% mutate(Species = "Total")
) %>%
  group_by(Species) %>%
  summarise(Mean = mean(Sepal.Width),
            Median = median(Sepal.Width),
            sd = sd(Sepal.Width),
            p10 = quantile(Sepal.Width, probs = 0.1))
#> # A tibble: 4 x 5
#>   Species     Mean Median    sd   p10
#>   <chr>      <dbl>  <dbl> <dbl> <dbl>
#> 1 setosa      3.43    3.4 0.379  3   
#> 2 Total       3.06    3   0.436  2.5 
#> 3 versicolor  2.77    2.8 0.314  2.3 
#> 4 virginica   2.97    3   0.322  2.59
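
Note that in the output above the "Total" row lands between setosa and versicolor, because Species becomes a character column and groups are sorted alphabetically. If the total should come last, one option is to turn Species back into a factor with "Total" as the final level before grouping. The following is a minimal sketch of that idea, not part of the original answer; the names species_levels and iris_summary are introduced here purely for illustration.

```r
library(dplyr)

# Assumed ordering: the original species levels, then "Total" at the end.
species_levels <- c(levels(iris$Species), "Total")

iris_summary <- bind_rows(
  # Convert to character first so both pieces have the same column type
  iris %>% mutate(Species = as.character(Species)),
  iris %>% mutate(Species = "Total")
) %>%
  # Restore a factor so group_by() orders rows by level, not alphabetically
  mutate(Species = factor(Species, levels = species_levels)) %>%
  group_by(Species) %>%
  summarise(Mean = mean(Sepal.Width),
            Median = median(Sepal.Width),
            sd = sd(Sepal.Width),
            p10 = quantile(Sepal.Width, probs = 0.1))

iris_summary
```

The statistics are the same as before; only the row order changes.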

I like the caution in the comments above, though I have to do this sort of calculation for work often enough that I keep a similar shorthand function in a personal package. It perhaps makes less sense for things like standard deviations, but it's something I need to do a lot when adding up totals of demographic groups, etc. (If it's useful, that function is here).
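
The function mentioned above is not reproduced on this page, so the following is only a guess at what such a shorthand might look like, not the author's actual code. The name with_total and its arguments are hypothetical. It wraps the bind-then-group pattern so that the summarise call is written once and can be reused on any data frame.

```r
library(dplyr)

# Hypothetical helper (an assumption, not the answer author's function):
# duplicates the data with the grouping column set to a total label,
# then returns the combined data grouped by that column.
with_total <- function(data, group_col, total_label = "Total") {
  bind_rows(
    data %>% mutate({{ group_col }} := as.character({{ group_col }})),
    data %>% mutate({{ group_col }} := total_label)
  ) %>%
    group_by({{ group_col }})
}

# Usage: counts and means per species, plus one "Total" row
iris_counts <- iris %>%
  with_total(Species) %>%
  summarise(n = n(), Mean = mean(Sepal.Width))

iris_counts
```

Because the helper doubles the number of rows before summarising, it suits small and medium data sets; for very large data it may be cheaper to summarise the groups and the total separately.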
