Barplots with multiple factor groupings and mean of variable across those factors


Problem description

I am trying to create a barplot that shows the average hourly wages of union and nonunion workers, grouped by single or married and by college grad or not college grad. While I've managed to construct a passable barplot with two factor groupings, I cannot figure out how to do so with three factor groupings. The examples I have seen that have three factors look just at frequency counts, so I'm not sure how to incorporate the mean of another variable across all the factors into the plot. What I am looking to create is something that looks like this (created in Stata):

[Figure: Average Hourly Wage by Union Status, Marital Status, and College Graduation]

My code looks like this:

levelbar = tapply(wage, list(as.factor(union), as.factor(married), 
as.factor(collgrad)), mean)
par(mfrow = c(1, 2))
barplot(levelbar, beside = TRUE)
barplot(t(levelbar), beside = TRUE)

When I run this, however, I receive the error:

Error in barplot.default(levelbar, beside = TRUE) : 
'height' must be a vector or a matrix

Any help on this would be appreciated. I'm sure ggplot might be useful here, but I do not have a great deal of experience using that package.

Solution

Here's a reproducible example using ggplot2 and the built-in dataset Titanic.

Note that we calculate the group means first and pass stat = "identity" to geom_bar() so that the bar heights are those precomputed means rather than row counts.

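# Load dplyr (pipe, as_tibble, group_by, summarise) and ggplot2
library(dplyr)
library(ggplot2)

# Convert the built-in Titanic table to a tibble: one row per cell, with the count in column n
Titanic_df <- Titanic %>% as_tibble()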

# Make Class, Sex, Age, and Survived factors
for (col in c("Class", "Sex", "Age", "Survived")) {
  Titanic_df[[col]] <- factor(Titanic_df[[col]])
}

# Get by-group means (n is averaged over Age, which is left out of the grouping)
means <- Titanic_df %>% 
  group_by(Class, Sex, Survived) %>% 
  summarise(
    mean_n = mean(n)
  )

# Plot: facets are the Classes, bar colors are the two Sexes, and the groupings in each facet are Survived vs. Not Survived
ggplot(data = means) +
  geom_bar(aes(x = Survived, y = mean_n, fill = Sex), stat = "identity", position = "dodge") +
  facet_wrap(~ Class)
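
The same pattern can be carried over to the wage question. The sketch below is only an illustration: the data frame workers, its factor labels, and the wage values are simulated stand-ins, since the asker's data is not shown. It first confirms that tapply() over three factors returns a three-dimensional array, which is why barplot() complains that 'height' must be a vector or a matrix, and then builds the three-way plot from a table of cell means.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data standing in for the asker's wage data (all values simulated)
set.seed(1)
n_obs <- 400
workers <- data.frame(
  union    = factor(sample(c("nonunion", "union"), n_obs, replace = TRUE)),
  married  = factor(sample(c("single", "married"), n_obs, replace = TRUE)),
  collgrad = factor(sample(c("not college grad", "college grad"), n_obs, replace = TRUE)),
  wage     = round(runif(n_obs, min = 5, max = 30), 2)
)

# tapply() over three factors yields a 3-d array, not a matrix,
# so barplot() cannot use it directly as 'height'
wage_array <- with(workers, tapply(wage, list(union, married, collgrad), mean))
dim(wage_array)

# Mean wage for each union x married x collgrad cell, one row per cell
wage_means <- workers %>%
  group_by(union, married, collgrad) %>%
  summarise(mean_wage = mean(wage))

# Bar colors: union status; x-axis groups: marital status; facets: college graduation
wage_plot <- ggplot(data = wage_means) +
  geom_bar(aes(x = married, y = mean_wage, fill = union),
           stat = "identity", position = "dodge") +
  facet_wrap(~ collgrad) +
  labs(x = "Marital status", y = "Mean hourly wage", fill = "Union status")

print(wage_plot)
```

If staying in base R is preferred, one option is to pass barplot() a two-dimensional slice of the array at a time, for example wage_array[, , "college grad"] with beside = TRUE, and draw one panel per level of the third factor.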
