Graph proportion within a factor level rather than a count in ggplot2

Problem description

I have two categorical variables with multiple levels. The levels within each variable have different numbers of observations, for example:

var1 <- c("Left", "Right", NA, "Left", "Right", "Right", "Right", "Left", "Left", "Right", "Left", "Left","Left", "Right", "Left", "Right", "Right", "Right", "Left", "Left", "Right", NA, "Left", "Left","Left", "Right", NA, "Left", "Right", "Right", "Right", "Left", "Left", "Right", "Left", "Left","Left", "Right", "Left", "Right", "Right", "Right", "Left", "Left", "Right", NA, "Left", "Left")
var2 <- c("Higher", "Lower", NA, "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly higher", "Higher", "Higher", "Higher", "Slightly higher","Higher", "Lower", "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly higher", "Higher", "Higher", "Higher", NA, "Slightly lower","Higher", "Lower", NA, "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly higher", "Higher", "Higher", "Higher", "Slightly higher","Higher", "Lower", "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly lower", "Higher", "Higher", "Higher", NA, "Slightly lower")
df <- data.frame(var1, var2)

I would like to create a graph that plots the proportion of each category of var1 who chose each level of var2. So, for example, the proportion of group "Left" who chose the answer "Higher" (the number of Left people who chose Higher divided by the total number of Left people) next to the proportion of group "Right" who chose "Higher" (the number of Right people who chose Higher divided by the total number of Right people), and so on for each answer in turn.
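
The quantity described above is a row-wise proportion of the var1 × var2 contingency table. As a minimal base-R sketch using the question's data (`counts` and `props` are names introduced here for illustration), `table()` counts each combination, skipping NA by default, and `prop.table(..., margin = 1)` divides each row by its row total:

```r
var1 <- c("Left", "Right", NA, "Left", "Right", "Right", "Right", "Left", "Left", "Right", "Left", "Left","Left", "Right", "Left", "Right", "Right", "Right", "Left", "Left", "Right", NA, "Left", "Left","Left", "Right", NA, "Left", "Right", "Right", "Right", "Left", "Left", "Right", "Left", "Left","Left", "Right", "Left", "Right", "Right", "Right", "Left", "Left", "Right", NA, "Left", "Left")
var2 <- c("Higher", "Lower", NA, "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly higher", "Higher", "Higher", "Higher", "Slightly higher","Higher", "Lower", "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly higher", "Higher", "Higher", "Higher", NA, "Slightly lower","Higher", "Lower", NA, "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly higher", "Higher", "Higher", "Higher", "Slightly higher","Higher", "Lower", "Slightly higher", "Slightly higher", "Slightly higher", "Lower", "Slightly lower", "Higher", "Higher", "Higher", NA, "Slightly lower")

# Count each (var1, var2) combination; rows with NA in either vector are not counted
counts <- table(var1, var2)

# margin = 1 divides every cell by its row (var1 group) total
props <- prop.table(counts, margin = 1)
round(props, 3)
```

With this data, 10 of the 22 Left respondents who gave an answer chose Higher (about 0.455), compared with 4 of 20 Right respondents (0.2). The denominator only counts rows where both var1 and var2 are non-missing.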

I have written the ggplot code below, which gives me a graph of the counts of each group for each answer option side by side, but it doesn't give me the proportions, so the Left and Right groups aren't comparable (there are different numbers of people in each group). If possible, I would also like to specify particular colours for the Left and Right groups.

library(ggplot2)

Plot <- ggplot(df, aes(var2)) +
  geom_bar(aes(fill = var1), position = "dodge") +
  labs(x = "Left or Right", y = "Count") +
  scale_y_continuous() +
  scale_fill_discrete(name = "Answer:") + theme_classic() + theme(legend.position = "top")

The second problem with this code is that the NA values in my data show up as their own category. I know I could use na.omit on df in my ggplot code, which works fine for this small data frame, but my real dataset has many columns, and na.omit removes every row that has an NA in any column. That is hundreds of rows of data, which I don't want to lose! Is there a way to remove NAs from specific variables of a data frame within the ggplot code?
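
One way to keep NAs out of only the plotted columns, sketched here in base R with a small hypothetical data frame (`survey` and its `other` column are illustrative stand-ins for the real dataset), is to build the row filter with `complete.cases()` on just those columns. The resulting subset can be passed directly to `ggplot()` as its `data` argument.

```r
# Hypothetical data: 'other' stands in for the many extra columns of the real dataset
survey <- data.frame(
  var1  = c("Left",   "Right", NA,      "Left",  "Right"),
  var2  = c("Higher", NA,      "Lower", "Lower", "Higher"),
  other = c(NA,       1,       2,       NA,      3)
)

# na.omit() drops a row if ANY column is NA, so rows missing only 'other' are lost too
all_complete <- na.omit(survey)
nrow(all_complete)

# complete.cases() on just the plotted columns keeps rows whose NAs are elsewhere
keep <- complete.cases(survey[, c("var1", "var2")])
plot_data <- survey[keep, ]
nrow(plot_data)
```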

If anyone has any ideas that would be wonderful. Thank you so much in advance!

Recommended answer

We can calculate the proportion within each group and then plot it. You can also specify the colors manually using scale_fill_manual.

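library(dplyr)
library(ggplot2)

df %>%
  na.omit() %>%                                   # drop rows with an NA in any column
  group_by(var1, var2) %>%
  summarise(n = n(), .groups = "drop_last") %>%   # result stays grouped by var1
  mutate(prop = n / sum(n)) %>%                   # share of each var1 group choosing each answer
  ungroup() %>%
  ggplot(aes(x = var2, y = prop, fill = var1)) +  # x = answer, fill = group
  geom_col(position = "dodge") +                  # same as geom_bar(stat = "identity")
  labs(x = "Answer", y = "Proportion") +
  scale_y_continuous(labels = scales::percent) +
  # use a single fill scale: a second scale_fill_*() call would replace the first
  scale_fill_manual(name = "Group:", values = c(Left = "black", Right = "red")) +
  theme_classic() +
  theme(legend.position = "top")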
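
The `summarise()`/`mutate()` step works because `summarise()` removes the last grouping variable (var2), so `sum(n)` inside `mutate()` is the total for each var1 group. As a rough base-R equivalent on hypothetical toy data (`d` and `counts` are illustrative names, not from the answer), `ave()` divides each count by its var1 group total:

```r
# Hypothetical toy data, not the question's data
d <- data.frame(
  var1 = c("Left", "Left", "Left", "Right", "Right", "Right", "Right"),
  var2 = c("Higher", "Higher", "Slightly higher", "Higher", "Lower", "Lower", "Lower")
)

# One row per (var1, var2) combination, including combinations that never occur (Freq = 0)
counts <- as.data.frame(table(var1 = d$var1, var2 = d$var2))

# Divide each count by the total of its var1 group, like mutate(prop = n / sum(n))
counts$prop <- ave(counts$Freq, counts$var1, FUN = function(x) x / sum(x))
counts
```

One difference is that `as.data.frame(table(...))` keeps zero-count combinations (here Left/Lower and Right/Slightly higher), whereas `summarise(n = n())` only returns combinations that occur. In the question's data, Left never chooses Lower and Right never chooses Slightly lower, so with `position = "dodge"` the single bar at those x positions is drawn at full width. Adding the missing zero rows (for example with `tidyr::complete()`) or using `position_dodge(preserve = "single")` are possible ways to keep bar widths consistent.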

Here I have removed every row that has an NA in any column. If you only want to do this for specific columns, you can use filter with is.na to remove those values. For example, to remove NA values only from var1:

df %>%
  filter(!is.na(var1)) %>%   # drop NAs in var1 only; NAs in other columns are kept
  group_by(var1, var2) %>% .....
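
Filtering var1 alone may not be enough for this particular plot: in the question's data, two Left rows have NA in var2, and var2 is on the x axis, so an NA bar would remain. `filter()` accepts several conditions and keeps only rows where all of them are TRUE, so both plotted columns can be cleaned at once without touching the rest of the data. A small sketch on hypothetical data (`d`, `only_var1` and `both` are illustrative names):

```r
library(dplyr)

# Hypothetical data: one NA in var1 and one NA in var2
d <- data.frame(
  var1 = c("Left", "Right", NA, "Left"),
  var2 = c("Higher", "Lower", "Higher", NA)
)

# Filtering var1 only leaves the NA in var2, which would still appear on the x axis
only_var1 <- d %>% filter(!is.na(var1))
anyNA(only_var1$var2)

# Several conditions in filter() are combined with AND
both <- d %>% filter(!is.na(var1), !is.na(var2))
nrow(both)
```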
