Grouped bar plot in ggplot

Problem description

I have a survey file in which the rows are observations and the columns are questions.

Here is some fake data showing what it looks like:

People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good

My aim is to create this kind of plot with ggplot2.

  • I don't care at all about the colors, design, etc.
  • The plot doesn't correspond to the fake data

Here is how I load my fake data:

raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS",sep=",")
raw[,2]<-factor(raw[,2],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,3]<-factor(raw[,3],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,4]<-factor(raw[,4],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)

But if I choose Y as the count, then I run into the problem of choosing the X and the Group values... I don't know whether I can succeed without using reshape2... I've also tried to use reshape2 with the melt function, but I don't understand how to use it...

Solution

EDIT: Eight years later...

This needs a tidyverse solution, so here is one, with all non-base packages explicitly stated so that you know where each function comes from (except for read.csv, which is from utils, a package that ships with base R):

library(magrittr) # needed for %>% if dplyr is not attached

"http://pastebin.com/raw.php?i=L8cEKcxS" %>%
  read.csv(sep = ",") %>%
  tidyr::pivot_longer(cols = c(Food, Music, People.1),
                      names_to = "variable",
                      values_to = "value") %>%
  dplyr::group_by(variable, value) %>%
  dplyr::summarise(n = dplyr::n()) %>%
  dplyr::mutate(value = factor(
    value,
    levels = c("Very Bad", "Bad", "Good", "Very Good"))
  ) %>%
  ggplot2::ggplot(ggplot2::aes(variable, n)) +
  ggplot2::geom_bar(ggplot2::aes(fill = value),
                    position = "dodge",
                    stat = "identity")
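
To inspect the long-format counts that the pipeline above hands to ggplot2 without downloading anything, the six sample rows from the question can be inlined. This is only a sketch: csv_text and counts are names made up here, and dplyr::count() is swapped in as a shorthand for the group_by() plus summarise() step.

```r
library(magrittr)  # for %>%

# The six sample rows from the question, inlined so no download is needed
# (csv_text is a hypothetical name used only in this sketch)
csv_text <- "People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good"

# read.csv renames the duplicated "People" header to People.1
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

counts <- raw %>%
  tidyr::pivot_longer(cols = c(Food, Music, People.1),
                      names_to = "variable",
                      values_to = "value") %>%
  dplyr::count(variable, value)  # one row per (question, answer) pair, count in n

print(counts)
```

Each row of counts is one bar in the dodged plot: variable picks the group on the x axis, value picks the fill, and n is the bar height.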


The original answer:

First you need to get the counts for each category, i.e. how many Bads and Goods and so on are there for each group (Food, Music, People). This would be done like so:

raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS",sep=",")
raw[,2]<-factor(raw[,2],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,3]<-factor(raw[,3],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,4]<-factor(raw[,4],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)

raw=raw[,c(2,3,4)] # getting rid of the "people" variable as I see no use for it

freq=table(col(raw), as.matrix(raw)) # get the counts of each factor level
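
A small sketch of why that table() call yields one row of counts per question. It uses a made-up three-row data frame (not the survey data), and the names col_index and answers are introduced here only for illustration.

```r
# Hypothetical stand-in for the three survey columns
raw <- data.frame(
  Food   = c("Bad", "Good", "Good"),
  Music  = c("Bad", "Bad", "Good"),
  People = c("Good", "Good", "Good"),
  stringsAsFactors = FALSE
)

# col(raw) holds, for every cell, the index of the column that cell sits in
col_index <- col(raw)

# as.matrix(raw) holds the answer stored in each cell
answers <- as.matrix(raw)

# Cross-tabulating the two counts how often each answer occurs per column
freq <- table(col_index, answers)
print(freq)
```

The rows of freq are labelled by column index (1, 2, 3) rather than by question name, and the answer columns come out sorted alphabetically because as.matrix() turns the data into plain character strings. That is why the answer attaches the names and reorders the columns in the next step.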

Then you need to create a data frame out of it, melt it and plot it:

Names=c("Food","Music","People")     # create vector of names
data=data.frame(cbind(freq),Names)   # combine them into a data frame
data=data[,c(5,3,1,2,4)]             # sort columns: Names, Very.Bad, Bad, Good, Very.Good

library(reshape2) # provides melt()
library(ggplot2)

# melt the data frame for plotting
data.m <- melt(data, id.vars='Names')

# plot everything
ggplot(data.m, aes(Names, value)) +
  geom_bar(aes(fill = variable), position = "dodge", stat="identity")

Is this what you're after?

To clarify a little: in the related question "ggplot multiple grouping bar", you had a data frame that looked like this:

> head(df)
  ID Type Annee X1PCE X2PCE X3PCE X4PCE X5PCE X6PCE
1  1    A  1980   450   338   154    36    13     9
2  2    A  2000   288   407   212    54    16    23
3  3    A  2020   196   434   246    68    19    36
4  4    B  1980   111   326   441    90    21    11
5  5    B  2000    63   298   443   133    42    21
6  6    B  2020    36   257   462   162    55    30

Since you have numerical values in columns 4-9, which would later be plotted on the y axis, this can be easily transformed with reshape and plotted.

For our current data set, we needed something similar, so we used freq=table(col(raw), as.matrix(raw)) to get this:

> data
   Names Very.Bad Bad Good Very.Good
1   Food        7   6    5         2
2  Music        5   5    7         3
3 People        6   3    7         4

Just imagine you have Very.Bad, Bad, Good and so on instead of X1PCE, X2PCE, X3PCE. See the similarity? But we needed to create such structure first. Hence the freq=table(col(raw), as.matrix(raw)).
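
The wide-to-long step can be tried on its own by typing in the data frame printed above. A minimal sketch, assuming reshape2 is installed; the counts are copied by hand from that printout rather than recomputed from the survey file.

```r
library(reshape2)  # provides melt()

# The wide table of counts, one row per question and one column per answer
data <- data.frame(
  Names     = c("Food", "Music", "People"),
  Very.Bad  = c(7, 5, 6),
  Bad       = c(6, 5, 3),
  Good      = c(5, 7, 7),
  Very.Good = c(2, 3, 4)
)

# melt() keeps Names as the identifier and stacks the other columns into
# two new columns: variable (the old column name) and value (the count)
data.m <- melt(data, id.vars = "Names")
print(data.m)
```

The result has one row per question and answer pair, which is the shape that aes(Names, value) with fill = variable expects.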
