Using dplyr window functions to calculate percentiles

Views: 180  Posted: 2017/7/13 20:57:26  Tags: r dplyr tidyr

Question

I have a working solution but am looking for a cleaner, more readable solution that perhaps takes advantage of some of the newer dplyr window functions.

Using the mtcars dataset, if I want to look at the 25th, 50th, 75th percentiles and the mean and count of miles per gallon ("mpg") by the number of cylinders ("cyl"), I use the following code:

library(dplyr)
library(tidyr)

# load data
data("mtcars")

# Percentiles used in calculation
p <- c(.25,.5,.75)

# old dplyr solution 
mtcars %>% group_by(cyl) %>% 
  do(data.frame(p=p, stats=quantile(.$mpg, probs=p), 
                n = length(.$mpg), avg = mean(.$mpg))) %>%
  spread(p, stats) %>%
  select(1, 4:6, 3, 2)

# note: the select and spread statements are just to get the data into
#       the format in which I'd like to see it, but are not critical

Is there a way I can do this more cleanly with dplyr using some of the summary functions (ntile, percent_rank, etc.)? By cleanly, I mean without the "do" statement.
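
For context, here is a minimal sketch of what the two ranking functions mentioned above return. It assumes only that dplyr is installed; the name `ranked` is made up for illustration. Both functions are window functions, so they produce one value per row inside each group rather than one summary value per group, which is probably why they do not replace quantile() directly.

```r
library(dplyr)

# ntile() and percent_rank() rank rows within each cyl group;
# they do not collapse the group to a single row.
ranked <- mtcars %>%
  group_by(cyl) %>%
  mutate(quartile = ntile(mpg, 4),          # bucket number 1..4 within the group
         pct_rank = percent_rank(mpg)) %>%  # rank rescaled to [0, 1] within the group
  ungroup() %>%
  select(cyl, mpg, quartile, pct_rank)

# The result has as many rows as the input
nrow(ranked) == nrow(mtcars)
```

These per-row ranks are handy for filtering (for example, keeping the top quartile of each group), while the percentile values themselves still come from quantile().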

Thank you

Answer

Here's a dplyr approach that avoids do but requires a separate call to quantile for each quantile value.

mtcars %>% group_by(cyl) %>%
  summarise(`25%`=quantile(mpg, probs=0.25),
            `50%`=quantile(mpg, probs=0.5),
            `75%`=quantile(mpg, probs=0.75),
            avg=mean(mpg),
            n=n())

  cyl   25%  50%   75%      avg  n
1   4 22.80 26.0 30.40 26.66364 11
2   6 18.65 19.7 21.00 19.74286  7
3   8 14.40 15.2 16.25 15.10000 14

It would be better if summarise could return multiple values from a single call to quantile, but at the time of writing this appears to be an open issue in dplyr development (hadley/dplyr, issues/154).
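
As a hedged follow-up, later dplyr releases (1.0.0 and newer, as far as I know) unpack an unnamed data frame returned inside summarise() into several columns, which allows a single call to quantile. The sketch below relies on that behaviour; `quantile_df` and `res` are hypothetical names introduced here for illustration and are not part of dplyr.

```r
library(dplyr)

# Percentiles used in calculation
p <- c(0.25, 0.5, 0.75)

# Hypothetical helper (not part of dplyr): returns the requested quantiles
# as a one-row data frame whose columns are named "25%", "50%", "75%".
quantile_df <- function(x, probs) {
  q <- quantile(x, probs = probs)
  data.frame(as.list(q), check.names = FALSE)  # keep the "25%"-style names
}

# Assumption: dplyr >= 1.0.0, where an unnamed data frame result in
# summarise() is spread across multiple output columns.
res <- mtcars %>%
  group_by(cyl) %>%
  summarise(quantile_df(mpg, p),
            avg = mean(mpg),
            n = n())

res
```

The column order (cyl, the three percentiles, avg, n) matches the output shown above, so no spread or select step is needed. On older dplyr versions this call is expected to fail, and the separate-call approach remains the safer choice.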
