dplyr - summary table for multiple variables
Question
How can I create simple summary statistics for multiple variables using dplyr? The summarise_each function seems to be the way to go. However, when multiple functions are applied to multiple columns, the result is a wide, hard-to-read data frame.
Recommended answer
Use dplyr in combination with tidyr to reshape the end result.
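library(dplyr)
library(tidyr)

# note: tbl_df(), summarise_each() and funs() are deprecated in recent dplyr releases
df <- tbl_df(mtcars)

# each column gets one output column per function, named <variable>_<statistic>
df.sum <- df %>%
  select(mpg, cyl, vs, am, gear, carb) %>% # select variables to summarise
  summarise_each(funs(min = min,
                      q25 = quantile(., 0.25),
                      median = median,
                      q75 = quantile(., 0.75),
                      max = max,
                      mean = mean,
                      sd = sd))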
# the result is a wide data frame: 1 row, 42 columns (6 variables x 7 statistics)
> dim(df.sum)
[1] 1 42
# reshape it using tidyr functions
df.stats.tidy <- df.sum %>%
  gather(stat, val) %>%
  separate(stat, into = c("var", "stat"), sep = "_") %>%
  spread(stat, val) %>%
  select(var, min, q25, median, q75, max, mean, sd) # reorder columns
> print(df.stats.tidy)
   var  min    q25 median  q75  max     mean        sd
1   am  0.0  0.000    0.0  1.0  1.0  0.40625 0.4989909
2 carb  1.0  2.000    2.0  4.0  8.0  2.81250 1.6152000
3  cyl  4.0  4.000    6.0  8.0  8.0  6.18750 1.7859216
4 gear  3.0  3.000    4.0  4.0  5.0  3.68750 0.7378041
5  mpg 10.4 15.425   19.2 22.8 33.9 20.09062 6.0269481
6   vs  0.0  0.000    0.0  1.0  1.0  0.43750 0.5040161
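The answer relies on summarise_each() and funs(), which newer dplyr releases mark as deprecated. The same summarise-then-reshape idea can be sketched with across() and pivot_longer(). This is not part of the original answer, and it assumes dplyr 1.0.0 or later and tidyr 1.0.0 or later. The names vars and stats_tidy and the "__" separator are illustrative choices, not anything from the source.

```r
library(dplyr)
library(tidyr)

# Assumption: dplyr >= 1.0.0 (across) and tidyr >= 1.0.0 (pivot_longer).
# `vars` and `stats_tidy` are hypothetical names chosen for this sketch.
vars <- c("mpg", "cyl", "vs", "am", "gear", "carb")

stats_tidy <- mtcars %>%
  summarise(across(
    all_of(vars),
    list(
      min    = min,
      q25    = ~ quantile(.x, 0.25, names = FALSE),
      median = median,
      q75    = ~ quantile(.x, 0.75, names = FALSE),
      max    = max,
      mean   = mean,
      sd     = sd
    ),
    # a double underscore separates variable and statistic, so a column
    # name that itself contains "_" would still split unambiguously
    .names = "{.col}__{.fn}"
  )) %>%
  # ".value" turns the statistic part of each name into its own column,
  # leaving one row per variable
  pivot_longer(
    everything(),
    names_to  = c("var", ".value"),
    names_sep = "__"
  )

print(stats_tidy)
```

Under those assumptions this should give one row per variable with the same statistics as the table above. The rows should appear in the order listed in vars rather than alphabetically, so no separate column-reordering step is needed.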