Grouped table of percentiles

Question

I need to calculate, for each group, the values at the 5%, 34%, 50%, 67% and 95% percentiles, each in its own column. The expected output would look like this:

    5%   34%  50%  67%  95%
A   4    6    8    12   30
B   1    2    3    4    10

with integer values for each group.

The code below shows what I have so far, using generated data:

library(dplyr)
library(tidyr)
data.frame(group=sample(LETTERS[1:5],100,TRUE),values=rnorm(100)) %>%
      group_by(group) %>%
      mutate(perc_int=findInterval(values, 
                    quantile(values, probs=c(0.05,0.34,0.5,0.67,0.95)))) %>%
      pivot_wider(names_from = perc_int,values_from = values)

This example gives me six columns, and I am not sure why.

Also, each column is filled with a vector rather than a single value. How do I get just the single value that represents each percentile of the values vector?

Recommended answer

You could compute the quantiles for each group as a list column and then use unnest_wider to spread them into separate columns.

library(dplyr)
set.seed(123)

data.frame(group=sample(LETTERS[1:5],100,TRUE),values=rnorm(100)) %>%
   group_by(group) %>%
   summarise(perc_int= list(quantile(values, probs=c(0.05,0.34,0.5,0.67,0.95)))) %>%
   tidyr::unnest_wider(perc_int)

# A tibble: 5 x 6
#  group   `5%`  `34%`   `50%` `67%` `95%`
#  <fct>  <dbl>  <dbl>   <dbl> <dbl> <dbl>
#1  A     -2.40  -0.580 -0.0887 0.371  1.38
#2  B     -1.83  -0.200  0.0848 0.546  1.78
#3  C     -0.947 -0.148  0.184  0.789  1.81
#4  D     -0.992 -0.275 -0.0193 0.274  1.82
#5  E     -1.65  -0.457 -0.0422 0.540  1.66
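
As for the six columns in the original attempt, a likely explanation is that findInterval() does not return percentiles at all: it returns, for each value, the index of the interval it falls into. Five cut points split the number line into six intervals (coded 0 to 5), and pivot_wider then creates one column per code, each holding every value that landed in that interval. The sketch below illustrates this in base R on a single ungrouped vector; the data and the seed are illustrative assumptions, not taken from the question.

```r
# Five cut points split the number line into six intervals,
# so findInterval() can return any of the codes 0 through 5.
probs <- c(0.05, 0.34, 0.5, 0.67, 0.95)

set.seed(123)          # arbitrary seed, only for reproducibility
values <- rnorm(100)   # illustrative data, not from the question

# Named vector of the five percentile values (the cut points)
breaks <- quantile(values, probs = probs)

# Interval code for each value: 0 = below the 5% cut point,
# 5 = at or above the 95% cut point
codes <- findInterval(values, breaks)

sort(unique(codes))    # the distinct interval codes
table(codes)           # number of values in each interval

# The percentiles themselves are simply the cut points
breaks
```

The single value per percentile that the question asks for is therefore the output of quantile() itself, which is what the summarise() call in the answer stores per group.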
