ggplot2 alternatives to fill in barplots, occurrence of factor in multiple rows

Question

I'm pretty new to R, and I'm having trouble plotting a barplot from my data, which looks like this:

condition answer
2    H
1    H
8    H
5    W
4    M
7    H
9    H
10   H
6    H
3    W

The data consists of 100 rows with the conditions 1 to 10, each randomly generated 10 times (10 times condition 1, 10 times condition 8, ...). Each condition also has an answer, which can be H for Hit, M for Miss or W for Wrong.
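Data with this shape can be simulated for experimenting. This is a sketch only, assuming the answers are drawn uniformly at random from H, M and W (the real data presumably isn't uniform): rep(1:10, each = 10) guarantees every condition exactly 10 times, and sample() shuffles the order.

```r
# Each condition 1-10 appears exactly 10 times, in shuffled order
set.seed(1)  # arbitrary seed, only for reproducibility
test <- data.frame(condition = sample(rep(1:10, each = 10)),
                   # Hypothetical answers: drawn uniformly from H, M and W
                   answer = sample(c("H", "M", "W"), 100, replace = TRUE))
head(test)
```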

I want to plot the number of hits for each condition as a barplot (for example, 8 hits out of 10 for condition 1, ...). To do that, I tried the following in ggplot2:

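library(ggplot2)

# Fill each bar by whether the answer is a hit (TRUE) or not (FALSE)
ggplot(data = test, aes(x = condition, fill = answer == "H")) +
  geom_bar() + labs(x = "Conditions", y = "Hitrate") +
  coord_cartesian(xlim = c(1, 10), ylim = c(0, 10)) +
  scale_x_continuous(breaks = seq(1, 10, 1))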

It looks like this (plot image not included):

This is actually exactly what I need, except for the red color that covers everything. You can see that conditions 3 to 5 have no blue bar, because there are no hits for these conditions.

Is there any way to get rid of this red color, and maybe to count the number of hits for each condition? I tried dplyr's count function, but it only showed me the number of H's for conditions that had at least one. Conditions 3-5 were just "ignored" by count; there wasn't even a 0 in the output. But I'd still need those numbers for the plot.
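The same effect can be reproduced with base R's table(), which may help show where the missing zeros come from. In this minimal sketch (the six-row data is made up for illustration), counting a plain numeric column only yields entries for values that actually occur, while a factor with explicitly declared levels keeps every level, including those with a count of 0.

```r
# Hypothetical hit/miss data for three conditions; condition 2 has no hits
condition <- c(1, 1, 2, 2, 3, 3)
answer    <- c("H", "M", "W", "M", "H", "H")

# Counting hits on a plain numeric column: condition 2 never occurs among
# the hits, so it gets no entry (and no 0) at all
hits_plain <- table(condition[answer == "H"])
print(hits_plain)

# Declaring all levels up front keeps condition 2, with a count of 0,
# because subsetting a factor preserves its levels
hits_factor <- table(factor(condition, levels = 1:3)[answer == "H"])
print(hits_factor)
```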

Sorry for the long post, but I'm really at the end of my knowledge here. I'm open to suggestions or alternatives! Thanks in advance!

Answer

This is a situation where a little preprocessing goes a long way. I made sample data that recreates the issue, i.e. it has conditions without any "H"s.

Instead of relying on ggplot to aggregate the data the way you want it, use the proper tools. Since you mention dplyr::count, I use dplyr functions.

The preprocessing task is to count observations with answer "H", including conditions where the count is 0. To make sure all combinations are retained, convert condition to a factor and set .drop = FALSE in count, which is in turn passed to group_by.

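library(dplyr)
library(ggplot2)

# Sample data: conditions 1-10, ten rows each; rows 51-100 never get an "H",
# so some conditions can end up with no hits at all
set.seed(529)
test <- data.frame(condition = rep(1:10, times = 10),
                   answer = c(sample(c("H", "M", "W"), 50, replace = TRUE),
                              sample(c("M", "W"), 50, replace = TRUE)))

# Make condition a factor so all ten levels survive the filter, keep only
# hits, then count per condition (.drop = FALSE keeps levels with 0 hits)
hit_counts <- test %>%
  mutate(condition = as.factor(condition)) %>%
  filter(answer == "H") %>%
  count(condition, .drop = FALSE)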

hit_counts
#> # A tibble: 10 x 2
#>    condition     n
#>    <fct>     <int>
#>  1 1             0
#>  2 2             1
#>  3 3             4
#>  4 4             2
#>  5 5             3
#>  6 6             0
#>  7 7             3
#>  8 8             2
#>  9 9             1
#> 10 10            1
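
As an alternative that skips the filter step entirely, here is a base-R sketch: compare every answer with "H" and sum the resulting TRUE/FALSE values per condition with tapply(). Because no rows are removed beforehand, every condition still has rows to sum over, so conditions without hits simply come out as 0. It reuses the sample-data code from above; hits_df is a hypothetical name chosen to mirror hit_counts.

```r
# Same sample data as above
set.seed(529)
test <- data.frame(condition = rep(1:10, times = 10),
                   answer = c(sample(c("H", "M", "W"), 50, replace = TRUE),
                              sample(c("M", "W"), 50, replace = TRUE)))

# answer == "H" is a logical vector; summing it per condition counts the TRUEs.
# Every condition occurs in the data, so each group has rows to sum (0 if no hits)
hits <- tapply(test$answer == "H", factor(test$condition, levels = 1:10), sum)

# Reshape the named vector into a data frame like hit_counts
hits_df <- data.frame(condition = factor(names(hits), levels = names(hits)),
                      n = as.vector(hits))
print(hits_df)
```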

Then just plot that. geom_col is the version of geom_bar to use when you already have your y-values, instead of having ggplot tally them up for you.

ggplot(hit_counts, aes(x = condition, y = n)) +
  geom_col()
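
If ggplot2 isn't a requirement, base R's barplot() can draw a named vector of per-condition values directly. This is a sketch that assumes the hit rate (share of "H" answers per condition, matching the question's "Hitrate" axis label) is what should be shown: taking mean() of answer == "H" gives that proportion, while sum() would give the raw counts instead.

```r
# Same sample data as in the answer
set.seed(529)
test <- data.frame(condition = rep(1:10, times = 10),
                   answer = c(sample(c("H", "M", "W"), 50, replace = TRUE),
                              sample(c("M", "W"), 50, replace = TRUE)))

# Share of "H" answers per condition (the mean of a logical vector is the
# proportion of TRUEs); conditions without hits come out as 0
hit_rate <- tapply(test$answer == "H", test$condition, mean)

# barplot() accepts the named vector directly and invisibly returns the
# x positions of the bar midpoints
mids <- barplot(hit_rate, xlab = "Conditions", ylab = "Hitrate", ylim = c(0, 1))
```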
