Grouping/stacking factor levels in ggplot bar chart

Question

I'm relatively new to R and a complete beginner with ggplot, but I haven't managed to find an answer to the seemingly simple problem I have. Using ggplot, I would like to make a bar chart in which two of three or more graphed factor levels are stacked.

Essentially, this is the type of data I am looking at:

df <- data.frame(Answer=c("good","good","kinda good","kinda good",
  "kinda good","good","bad","good","bad"))

This provides me with a factor with three levels, two of which are very similar:

       Answer
1       good
2       good
3 kinda good
4 kinda good
5 kinda good
6       good
7        bad
8       good
9        bad

If I let ggplot go over these data for me now,

library(ggplot2)
c <- ggplot(df, aes(Answer))  # refer to the column by name inside aes()
c + geom_bar()

I will get a bar chart with three columns. However, I would like to end up with two columns, one of which should be a stack of the two factor levels "good" and "kinda good", still visibly separated.

I am working with 100 columns of input (study on orthography), which I will need to go through manually, so I would like to make the code as easily adjustable as possible. Some of them have more than ten levels, and I would need to sort them into three columns. Therefore, in most cases my data would more likely look like this:

df <- data.frame(Answer=c("good","goood","goo0d","good",
  "I don't know","Bad","bad","baaad","really bad"))

I would consequently group this into three categories. In approximately half of the cases, I could probably still filter using pattern matching because I will be looking at the use of spaces. The other half, however, is looking at capitalization, which would get a little messy, or at least very tedious.

I have thought of two different approaches to solve this issue more efficiently:

1. Simply rewriting the factor levels. This would result in a loss of information, though, and I would like to keep the two levels separate. I want to keep the original level names because I think I need them to graph the ratio within that stacked column and to label the column properly.

2. Splitting the respective column/factor into two separate columns/factors and graphing them next to each other, thus creating a "fake" third dimension. This looks like the most promising approach, but before I work through 100 columns of data with it: is there a more elegant approach, maybe within the ggplot2 package, where I could just point to or group the level names instead of changing or reordering the data frame behind it?

Thanks!

Recommended answer

You can try the following for a more automated approach in grouping the answers.

We select some keywords based on your data and loop over them to see which answers contain each keyword:

groups <- c('good','bad','ugly','know')

df <- data.frame(Answer=c("good","medium good","kinda good","still good",
                          "I don't know","good","bad","good","really bad"))

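# One column per keyword: TRUE where the answer contains it (case-insensitive)
idx <- sapply(groups, function(x) grepl(x, df$Answer, ignore.case = TRUE))
# Pick the matching keyword for each row; this assumes every answer matches
# exactly one keyword, otherwise the result will not line up with the rows of df
df$group <- rep(colnames(idx), nrow(idx))[t(idx)]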
df

#         Answer group
# 1         good  good
# 2  medium good  good
# 3   kinda good  good
# 4   still good  good
# 5 I don't know  know
# 6         good  good
# 7          bad   bad
# 8         good  good
# 9   really bad   bad
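
The indexing above relies on each answer matching exactly one keyword. With 100 columns of free-text answers that may not always hold, so here is a minimal sketch of a more forgiving variant. The helper name `assign_group` is hypothetical (it is not part of the original answer or of any package), and "first keyword wins" is an assumed tie-breaking rule that you may want to change.

```r
# Hypothetical helper: for each answer, return the first keyword it contains
# (case-insensitive), or NA when no keyword matches.
assign_group <- function(answers, keywords) {
  answers <- as.character(answers)          # also works when given a factor
  group <- rep(NA_character_, length(answers))
  for (k in keywords) {
    # Only fill rows that have not been assigned by an earlier keyword
    hit <- is.na(group) & grepl(k, answers, ignore.case = TRUE)
    group[hit] <- k
  }
  group
}

demo <- data.frame(Answer = c("good", "medium good", "I don't know",
                              "Bad", "no idea"))
demo$group <- assign_group(demo$Answer, c("good", "bad", "know"))
demo$group
# "good" "good" "know" "bad"  NA
```

Rows that come back as NA are the ones that still need a keyword or a manual decision, which can help when working through many columns by hand.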


library('ggplot2')
ggplot(df, aes(group, fill = Answer)) + geom_bar()
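
For the columns where pattern matching gets messy (the capitalization cases mentioned in the question), one alternative is an explicit lookup written by hand. The sketch below uses only base R; the `lookup` vector and the group label "unsure" are made up for illustration, and the data are the second example from the question.

```r
df <- data.frame(Answer = c("good", "goood", "goo0d", "good",
                            "I don't know", "Bad", "bad", "baaad", "really bad"))

# Hypothetical hand-written lookup: names are the original answers,
# values are the bar each answer should be stacked into.
lookup <- c("good" = "good", "goood" = "good", "goo0d" = "good",
            "I don't know" = "unsure",
            "Bad" = "bad", "bad" = "bad", "baaad" = "bad",
            "really bad" = "bad")

# The original Answer column is left untouched; the grouping goes in a new column.
# Answers missing from the lookup come back as NA, so gaps are easy to spot.
df$group <- unname(lookup[as.character(df$Answer)])
stopifnot(!anyNA(df$group))
```

Because `Answer` keeps its original values, the same `ggplot(df, aes(group, fill = Answer)) + geom_bar()` call can be reused: `group` decides the bar and `Answer` decides the stacked segments within it.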
