Summarizing data in a cross-table with the group_by variable in columns


Question

I am trying to summarize data across two variables, and the output of summarize is very chunky (at least in R Notebook output, where the table breaks over multiple pages). I'd like one variable as the rows of the summary output and the other as the columns, with each cell of the table holding the mean for that combination of row and column.

Some example data:

dat1 <- data.frame(
    category = rep(c("catA", "catB", "catC"), each = 4),
    # only 4 ages are sampled; data.frame() recycles them across the 12 rows
    age = sample(1:2, size = 4, replace = TRUE),
    value = rnorm(12)
)

I would then usually get my summary data frame like this:

library(dplyr)
dat1 %>% group_by(category, age) %>% summarize(mean(value))

This gives a long table with one row per combination of category and age. In my actual data each variable has 10+ levels, so the table is very long and hard to read. I would prefer a wide table with one column per age level, like the one I created using:

dat1 %>%
  group_by(category) %>%
  summarize(mean.age1 = mean(value[age == 1]),
            mean.age2 = mean(value[age == 2]))

Recommended answer

You just need tidyr in addition to dplyr, and can then do something like this:

library(dplyr)
library(tidyr)
dat1 %>%
  group_by(category, age) %>%
  summarise(mean = mean(value)) %>%
  spread(age, mean, sep = '')

The output looks like this:

Source: local data frame [3 x 3]
Groups: category [3]

  category      age1      age2
*   <fctr>     <dbl>     <dbl>
1     catA 0.2930104 0.3861381
2     catB 0.5752186 0.1454201
3     catC 1.0845645 0.3117227
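
In current tidyr releases, spread() is superseded by pivot_wider(), so the same reshaping can also be written as in the sketch below. It assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the .groups argument). The set.seed() call and the fixed age vector are assumptions added only so that the example is reproducible and every category has both age levels; they are not part of the original question.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: seed added only to make the sketch reproducible

dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age      = rep(1:2, times = 6),  # assumption: fixed ages so each category has both levels
  value    = rnorm(12)
)

wide <- dat1 %>%
  group_by(category, age) %>%
  # one mean per category/age pair; drop the grouping so the result is a plain tibble
  summarise(mean = mean(value), .groups = "drop") %>%
  # move the age levels into columns named age1, age2, ...
  pivot_wider(names_from = age, values_from = mean, names_prefix = "age")

print(wide)
```

The names_prefix argument plays the role that sep = '' plays in the spread() call: it prepends "age" to each level so the new columns are named age1 and age2 rather than 1 and 2.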
