ggplot 中的分组条形图 [英] Grouped bar plot in ggplot
问题描述
我有一个调查文件,其中行是观察和列问题.
以下是一些 中,您有一个数据框看起来像这样:
>头(df)ID 类型 Annee X1PCE X2PCE X3PCE X4PCE X5PCE X6PCE1 1 A 1980 450 338 154 36 13 92 2 A 2000 288 407 212 54 16 233 3 A 2020 196 434 246 68 19 364 4 B 1980 111 326 441 90 21 115 5 B 2000 63 298 443 133 42 216 6 B 2020 36 257 462 162 55 30
由于您在第 4-9 列中有数值,这些数值稍后会绘制在 y 轴上,因此可以使用 reshape
轻松转换并绘制.
对于我们当前的数据集,我们需要类似的东西,所以我们使用 freq=table(col(raw), as.matrix(raw))
来得到这个:
>数据名字非常.坏坏好非常.好1 食物 7 6 5 22 音乐 5 5 7 33 人 6 3 7 4
想象一下你有 Very.Bad
、Bad
、Good
等等,而不是 X1PCE
、<代码>X2PCE,X3PCE
.看到相似之处了吗?但是我们需要先创建这样的结构.因此 freq=table(col(raw), as.matrix(raw))
.
I have a survey file in which row are observation and column question.
Here are some fake data they look like:
People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good
My aim is to create this kind of plot with ggplot2
.
- I absolutely don't care of the colors, design, etc.
- The plot doesn't correspond to the fake data
Here are my fake data:
raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS",sep=",")
raw[,2]<-factor(raw[,2],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,3]<-factor(raw[,3],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,4]<-factor(raw[,4],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
But if I choose Y as count then I'm facing an issue about choosing the X and the Group values... I don't know if I can succeed without using reshape2
... I've also tired to use reshape with melt function. But I don't understand how to use it...
EDIT: Eight years later...
This needs a tidyverse solution, so here is one, with all non-base packages explicitly stated so that you know where each function comes from (except for read.csv
which is from utils
which comes with base R):
library(magrittr) # needed for %>% if dplyr is not attached
"http://pastebin.com/raw.php?i=L8cEKcxS" %>%
read.csv(sep = ",") %>%
tidyr::pivot_longer(cols = c(Food, Music, People.1),
names_to = "variable",
values_to = "value") %>%
dplyr::group_by(variable, value) %>%
dplyr::summarise(n = dplyr::n()) %>%
dplyr::mutate(value = factor(
value,
levels = c("Very Bad", "Bad", "Good", "Very Good"))
) %>%
ggplot2::ggplot(ggplot2::aes(variable, n)) +
ggplot2::geom_bar(ggplot2::aes(fill = value),
position = "dodge",
stat = "identity")
The original answer:
First you need to get the counts for each category, i.e. how many Bads and Goods and so on are there for each group (Food, Music, People). This would be done like so:
raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS",sep=",")
raw[,2]<-factor(raw[,2],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,3]<-factor(raw[,3],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,4]<-factor(raw[,4],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw=raw[,c(2,3,4)] # getting rid of the "people" variable as I see no use for it
freq=table(col(raw), as.matrix(raw)) # get the counts of each factor level
Then you need to create a data frame out of it, melt it and plot it:
Names=c("Food","Music","People") # create list of names
data=data.frame(cbind(freq),Names) # combine them into a data frame
data=data[,c(5,3,1,2,4)] # sort columns
# melt the data frame for plotting
data.m <- melt(data, id.vars='Names')
# plot everything
ggplot(data.m, aes(Names, value)) +
geom_bar(aes(fill = variable), position = "dodge", stat="identity")
Is this what you're after?
To clarify a little bit, in ggplot multiple grouping bar you had a data frame that looked like this:
> head(df)
ID Type Annee X1PCE X2PCE X3PCE X4PCE X5PCE X6PCE
1 1 A 1980 450 338 154 36 13 9
2 2 A 2000 288 407 212 54 16 23
3 3 A 2020 196 434 246 68 19 36
4 4 B 1980 111 326 441 90 21 11
5 5 B 2000 63 298 443 133 42 21
6 6 B 2020 36 257 462 162 55 30
Since you have numerical values in columns 4-9, which would later be plotted on the y axis, this can be easily transformed with reshape
and plotted.
For our current data set, we needed something similar, so we used freq=table(col(raw), as.matrix(raw))
to get this:
> data
Names Very.Bad Bad Good Very.Good
1 Food 7 6 5 2
2 Music 5 5 7 3
3 People 6 3 7 4
Just imagine you have Very.Bad
, Bad
, Good
and so on instead of X1PCE
, X2PCE
, X3PCE
. See the similarity? But we needed to create such structure first. Hence the freq=table(col(raw), as.matrix(raw))
.
这篇关于ggplot 中的分组条形图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!