Summarizing data in a cross-table with the grouped_by variable in columns
Question
I am trying to summarize data across two variables, and the output of summarize is very long and awkward to read (at least in R Notebook output, where the table is split across multiple pages). I'd like one variable to form the rows of the summary output and the other to form the columns, with each cell of the table holding the mean for that row and column combination.
Some example data:
dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = sample(1:2, size = 4, replace = TRUE),
  value = rnorm(12)
)
I would usually get my summary data frame like this:
dat1 %>% group_by(category, age) %>% summarize(mean(value))
The result has one row per combination of category and age. In my actual data, though, each variable has 10+ levels, so the table is very long and hard to read. I would prefer something like the wide table produced by:
dat1 %>%
  group_by(category) %>%
  summarize(mean.age1 = mean(value[age == 1]),
            mean.age2 = mean(value[age == 2]))
Recommended answer
You just need tidyr in addition to dplyr. Something like this:
library(dplyr)
library(tidyr)
dat1 %>%
  group_by(category, age) %>%
  summarise(mean = mean(value)) %>%
  spread(age, mean, sep = '')
The output looks like this:
Source: local data frame [3 x 3]
Groups: category [3]
  category      age1      age2
*   <fctr>     <dbl>     <dbl>
1     catA 0.2930104 0.3861381
2     catB 0.5752186 0.1454201
3     catC 1.0845645 0.3117227
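In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), so the same reshaping can probably be written as in the sketch below. This is an illustration rather than part of the original answer. It assumes dplyr >= 1.0.0 (for the .groups argument) and tidyr >= 1.0.0. The deterministic age column and the set.seed() call are my own assumptions, added only so that both age levels are guaranteed to appear in every category.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = rep(1:2, times = 6),  # assumption: deterministic ages instead of sample()
  value = rnorm(12)
)

wide <- dat1 %>%
  group_by(category, age) %>%
  # one mean per category/age pair; drop grouping before reshaping
  summarise(mean = mean(value), .groups = "drop") %>%
  # one column per age level, named age1, age2, ...
  pivot_wider(names_from = age, values_from = mean, names_prefix = "age")

wide
```

Here names_prefix = "age" plays the role that sep = '' plays in the spread() call: it turns the age levels 1 and 2 into the column names age1 and age2.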