ggplot2 stats="identity" and stacking colors in bar plot gives "striped" bar chart
Question
Following the answer to my previous question, another question has come up:
How can I, without reshaping the data, plot a stacked bar plot whose colours depend on another category, while using stat="identity" so that the values in each stacked area are summed?
stat="identity" works nicely to sum up the values, but only for non-stacked columns. In a stacked column, the stacking is somehow "multiplied" or "striped" (see the picture below).
Some sample data:
element <- rep("apples", 15)
qty <- c(2, 1, 4, 3, 6, 2, 1, 4, 3, 6, 2, 1, 4, 3, 6)
category1 <- c("Red", "Green", "Red", "Green", "Yellow")
category2 <- c("small","big","big","small","small")
d <- data.frame(element=element, qty=qty, category1=category1, category2=category2)
Which gives this table:
id element qty category1 category2
1 apples 2 Red small
2 apples 1 Green big
3 apples 4 Red big
4 apples 3 Green small
5 apples 6 Yellow small
6 apples 2 Red small
7 apples 1 Green big
8 apples 4 Red big
9 apples 3 Green small
10 apples 6 Yellow small
11 apples 2 Red small
12 apples 1 Green big
13 apples 4 Red big
14 apples 3 Green small
15 apples 6 Yellow small
Then:
ggplot(d, aes(x=category1, y=qty, fill=category2)) + geom_bar(stat="identity")
But the graph is a bit messy: the colors aren't grouped together!
Why does this happen?
Is there a way to group the colors correctly without reshaping my data?
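One possible explanation, sketched in base R rather than ggplot2: with stat="identity" every data row becomes its own rectangle, so if the rectangles are stacked in row order, the fill changes wherever category2 alternates within a bar. The sketch below only inspects the sample data. The names fill_sequence, runs_per_bar and colours_per_bar are illustrative, and stacking in row order is an assumption that may not hold for every ggplot2 version.

```r
# Sample data from the question (the category vectors are recycled to 15 rows).
d <- data.frame(
  element   = rep("apples", 15),
  qty       = c(2, 1, 4, 3, 6, 2, 1, 4, 3, 6, 2, 1, 4, 3, 6),
  category1 = c("Red", "Green", "Red", "Green", "Yellow"),
  category2 = c("small", "big", "big", "small", "small"),
  stringsAsFactors = FALSE
)

# Fill values in row order, separately for each bar (each category1 value).
fill_sequence <- split(d$category2, d$category1)

# Count runs of identical consecutive fill values within each bar.
# A bar whose segments are grouped by colour has exactly one run per colour.
runs_per_bar    <- sapply(fill_sequence, function(x) length(rle(x)$lengths))
colours_per_bar <- sapply(fill_sequence, function(x) length(unique(x)))

print(fill_sequence$Red)
print(rbind(runs = runs_per_bar, colours = colours_per_bar))
```

If a bar has more runs than colours, as the Red and Green bars do here, its rows alternate between fill values, which would be consistent with the striped appearance.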
I used this solution for a while, but on my large datasets (60,000 entries) ggplot2 drew white gaps between the ordered stacked bars at some zoom levels. I'm not sure where this issue comes from, but a wild guess is that I'm stacking too many bars :p
Aggregating the data with plyr solved the problem:
element <- rep("apples", 15)
qty <- c(2, 1, 4, 3, 6, 2, 1, 4, 3, 6, 2, 1, 4, 3, 6)
category1 <- c("Red", "Green", "Red", "Green", "Yellow")
category2 <- c("small","big","big","small","small")
d <- data.frame(element=element, qty=qty, category1=category1, category2=category2)
plyr:
library(plyr)
d <- ddply(d, .(category1, category2), summarize, qty=sum(qty, na.rm = TRUE))
To briefly explain the parts of this call:
ddply(1, .(2, 3), summarize, 4=function(6, na.rm = TRUE))
1: the data frame name
2, 3: the columns to keep, i.e. the grouping factors to calculate by
summarize: creates a new data frame (unlike transform)
4: the name of the calculated column
function: the function to apply, here sum()
6: the column to apply the function to
4, function and 6 can be repeated (comma-separated) for more calculated fields...
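For readers without plyr, the same group-and-sum idea can be sketched in base R with aggregate(). This is a swapped-in technique, not the method used in this answer, and the name agg is illustrative. In the formula, the left-hand side plays the role of 6 (the column to summarise), the right-hand side plays the role of 2 and 3 (the grouping columns), and FUN is the function to apply.

```r
# Sample data from the question (the category vectors are recycled to 15 rows).
d <- data.frame(
  element   = rep("apples", 15),
  qty       = c(2, 1, 4, 3, 6, 2, 1, 4, 3, 6, 2, 1, 4, 3, 6),
  category1 = c("Red", "Green", "Red", "Green", "Yellow"),
  category2 = c("small", "big", "big", "small", "small"),
  stringsAsFactors = FALSE
)

# Sum qty within each (category1, category2) combination.
# na.rm = TRUE is passed through to sum().
agg <- aggregate(qty ~ category1 + category2, data = d, FUN = sum, na.rm = TRUE)

# aggregate() orders its result differently from ddply, so sort by the
# grouping columns to get one row per bar segment in a predictable order.
agg <- agg[order(agg$category1, agg$category2), ]
rownames(agg) <- NULL

print(agg)
```

The result has one row per colour segment, which is the shape geom_bar(stat="identity") needs to draw each segment as a single rectangle.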
ggplot2:
ggplot(d, aes(x=category1, y=qty, fill=category2)) + geom_bar(stat="identity")
So now, as suggested by Roman Luštrik, the data is aggregated to match the graph being drawn.
After applying ddply, indeed, the data is cleaner:
category1 category2 qty
1 Green big 3
2 Green small 9
3 Red big 12
4 Red small 6
5 Yellow small 18
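As a quick sanity check, the sketch below compares the raw sample data with the aggregated table above, typed in by hand. It uses base R only, and the names raw, agg, raw_heights and agg_heights are illustrative. It checks two things: the aggregated table has no repeated (category1, category2) pairs, and the total height of each bar is unchanged.

```r
# Raw sample data from the question (category vectors recycled to 15 rows).
raw <- data.frame(
  qty       = c(2, 1, 4, 3, 6, 2, 1, 4, 3, 6, 2, 1, 4, 3, 6),
  category1 = c("Red", "Green", "Red", "Green", "Yellow"),
  category2 = c("small", "big", "big", "small", "small"),
  stringsAsFactors = FALSE
)

# Aggregated table, copied from the ddply output shown above.
agg <- data.frame(
  category1 = c("Green", "Green", "Red", "Red", "Yellow"),
  category2 = c("big", "small", "big", "small", "small"),
  qty       = c(3, 9, 12, 6, 18),
  stringsAsFactors = FALSE
)

# Does any (category1, category2) pair occur more than once?
raw_has_repeats <- any(duplicated(raw[, c("category1", "category2")]))
agg_has_repeats <- any(duplicated(agg[, c("category1", "category2")]))

# Total bar height per category1, before and after aggregating.
raw_heights <- tapply(raw$qty, raw$category1, sum)
agg_heights <- tapply(agg$qty, agg$category1, sum)

print(c(raw = raw_has_repeats, aggregated = agg_has_repeats))
print(rbind(raw = raw_heights, aggregated = agg_heights))
```

The bar heights match, so aggregating changes only how many rectangles make up each bar, not the totals being plotted.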
I finally understood how to manage my dataset thanks to this really great source of information: http://jaredknowles.com/r-bootcamp https://dl.dropbox.com/u/1811289/RBootcamp/slides/Tutorial3_DataSort.html
And this one too: http://streaming.stat.iastate.edu/workshops/r-intro/lectures/6-advancedmanipulation.pdf
... mainly because ?ddply is a bit... strange (the examples differ from the explanation of the options), and it seems to say nothing about the shorthand notation. But I may have missed something...