Grouping/stacking factor levels in ggplot bar chart
Question
I'm relatively new to R and a complete beginner with ggplot, but I haven't managed to find an answer to the seemingly simple problem I have. Using ggplot, I would like to make a bar chart in which two of three or more graphed factor levels are stacked.
Essentially, this is the type of data I am looking at:
df <- data.frame(Answer=c("good","good","kinda good","kinda good",
"kinda good","good","bad","good","bad"))
This provides me with a factor with three levels, two of which are very similar:
      Answer
1       good
2       good
3 kinda good
4 kinda good
5 kinda good
6       good
7        bad
8       good
9        bad
If I let ggplot go over these data for me now,
p <- ggplot(df, aes(x = Answer))
p + geom_bar()
I will get a bar chart with three columns. However, I would like to end up with two columns, one of which should be a stack of the two factor levels "good" and "kinda good", still visibly separated.
I am working with 100 columns of input (study on orthography), which I will need to go through manually, so I would like to make the code as easily adjustable as possible. Some of them have more than ten levels, and I would need to sort them into three columns. Therefore, in most cases my data would more likely look like this:
df <- data.frame(Answer=c("good","goood","goo0d","good",
"I don't know","Bad","bad","baaad","really bad"))
I would consequently group this into three categories. In approximately half of the cases, I could probably still filter using pattern matching because I will be looking at the use of spaces. The other half, however, is looking at capitalization, which would get a little messy, or at least very tedious.
I have thought of two different approaches to solve this issue more efficiently:
Simply rewriting the factor levels, but this would result in a loss of information (and I would like to keep the two levels separate). I would like to keep the original level names because I think I need them to graph the ratio within that stacked column and to label the column properly.
I could split the respective column/factor into two separate columns/factors and graph them next to each other, and thus create a "fake" third dimension. This is looking to be the most promising approach, but before I work through 100 columns of data with this - is there a more elegant approach, maybe within the ggplot2 package, where I could just point/group the level names instead of changing/reordering the data frame behind it?
Thank you!
Recommended answer
You can try the following for a more automated approach to grouping the answers.
We select some keywords based on your data and loop over them to see which answers contain each keyword:
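# Keywords that define the groups (one bar per keyword on the x-axis)
groups <- c('good','bad','ugly','know')
df <- data.frame(Answer=c("good","medium good","kinda good","still good",
                          "I don't know","good","bad","good","really bad"))
# Logical matrix: one row per answer, one column per keyword
idx <- sapply(groups, function(x) grepl(x, df$Answer, ignore.case = TRUE))
# Pick the matching keyword for each row; this assumes every answer
# matches exactly one keyword
df$group <- rep(colnames(idx), nrow(idx))[t(idx)]
df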
#         Answer group
# 1         good  good
# 2  medium good  good
# 3   kinda good  good
# 4   still good  good
# 5 I don't know  know
# 6         good  good
# 7          bad   bad
# 8         good  good
# 9   really bad   bad
library('ggplot2')
ggplot(df, aes(group, fill = Answer)) + geom_bar()
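For the columns where pattern matching gets messy (the capitalization cases mentioned in the question), an explicit lookup table may be easier to adjust than keywords. The following is only a sketch using the second example data set from the question; the name `lookup` and the group label "unsure" are assumptions made for illustration and are not part of the original answer.

```r
library(ggplot2)

# Example data from the question (raw answers with inconsistent spelling/case).
df <- data.frame(Answer = c("good", "goood", "goo0d", "good",
                            "I don't know", "Bad", "bad", "baaad", "really bad"))

# Hypothetical lookup table: names are the raw answers, values are the
# group (bar) each answer should be stacked into. Edit this per column.
lookup <- c("good" = "good", "goood" = "good", "goo0d" = "good",
            "I don't know" = "unsure",
            "Bad" = "bad", "bad" = "bad", "baaad" = "bad", "really bad" = "bad")

# Fail early if some raw answer has no group assigned.
missing_levels <- setdiff(unique(as.character(df$Answer)), names(lookup))
stopifnot(length(missing_levels) == 0)

# Map each answer to its group; the original Answer column is left untouched.
df$group <- unname(lookup[as.character(df$Answer)])

# One bar per group, stacked and coloured by the original answer.
p <- ggplot(df, aes(x = group, fill = Answer)) + geom_bar()
p
```

Because the original `Answer` values are kept and only a `group` column is added, the stacked segments remain labelled with the raw answers. The `stopifnot` check is there so that a new, unmapped spelling stops the script instead of silently becoming `NA`.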