Plot stacked bar chart of likert variables in R

Problem description

Let's say I have a data frame that looks like this:

  P   Q1  Q2 ...
  1   1   4    1
  2   2   3    4
  3   1   1    4

where the columns tell me how each person answered each of the questions Q1, Q2, ... Those questions require an answer on a 4-point Likert scale (e.g. "approve" means 1, "slightly approve" means 2 and so on). How do I plot, for example, the results of both questions in a stacked bar plot (in %)?

It should look somewhat like this.

All I find online is very complex code I can't handle or fail to understand ... Isn't there just a simple function that does what I want?

Thank you!

Solution

I am sure I am not the only one who would take issue with this part of your question:

All I find online is very complex code I can't handle or fail to understand ... Isn't there just a simple function that does what I want?

"Very complex code" is quite subjective. However, I can understand that learning code and trying to figure out how to do what it is you want to do (which may seem simple at first) can be daunting and frustrating. I'll try to show you how to approach this in a very logical and clear manner, so that you can understand that the code shown here is actually not too complex.

The Dataset

OP did not provide a dataset, but I'll demonstrate a random one here. This is also a good opportunity to showcase how you can generate this type of data via code (and have it scalable). Let's assume we have 20 people answering 20 questions. I'll create the data in a data frame structure by providing first only one column of people, then adding 20 columns of questions to that. Each cell for the answers to the questions will randomly select an answer from 1 to 5.

library(dplyr)
library(tidyr)
library(ggplot2)

# make the dataset
set.seed(8675309)
questions <- data.frame(Person = 1:20)

for (i in 1:20) {
  questions[[paste0('Q',i)]] <- sample(1:5, 20, replace=TRUE)
}

That gives us a data frame of 20 rows and 21 columns (1 column for Person + 20 columns for questions).
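
As a quick sanity check, here is a minimal sketch that rebuilds the same simulated data with base R only and confirms the shape described above. Nothing here is new apart from the inspection calls.

```r
# Rebuild the simulated survey exactly as in the answer (base R only).
set.seed(8675309)
questions <- data.frame(Person = 1:20)
for (i in 1:20) {
  questions[[paste0('Q', i)]] <- sample(1:5, 20, replace = TRUE)
}

dim(questions)           # 20 rows, 21 columns
names(questions)[1:3]    # "Person" "Q1" "Q2"

# Every answer should be one of the five allowed scale points.
all(as.matrix(questions[-1]) %in% 1:5)
```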

Prepare the Data

When preparing to generate a plot, you will almost always have to prep the data in some way. There are only two things I want to do here first before we plot. The first step is to make our data into a format which is referred to as Tidy Data. In the format we have it in now... it's okay to plot in Excel, but if we want to have a quality way of organizing and summarizing this data, we want to organize it to be in a "longer" table format. What we need is to organize in a way that has columns organized as:

Person | Question_num | Answer

You can do that a few ways. Here I'm using dplyr and tidyr packages and the gather() function, but other ways exist (namely using pivot_longer()):

questions <- questions %>% gather(key='Question_num', value='Answer', -Person)
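
For comparison, here is a minimal sketch of the same reshaping with pivot_longer(), which tidyr now recommends over gather(). The small `wide` data frame is a hypothetical stand-in with the same shape as the data above.

```r
library(tidyr)

# Hypothetical wide table: one row per person, one column per question.
wide <- data.frame(Person = 1:3, Q1 = c(1, 2, 1), Q2 = c(4, 3, 1))

long <- pivot_longer(
  wide,
  cols = -Person,              # every column except Person is a question
  names_to = 'Question_num',   # old column names end up here
  values_to = 'Answer'         # cell values end up here
)

long
```

Note that pivot_longer() orders the result person by person, while gather() orders it question by question. That makes no difference to the plots here.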

The final thing I want to do here is to convert our column questions$Answer into a categorical variable, not a continuous number. Why? Well, the participants could only answer 1, 2, 3, 4, or 5. An answer of "3.4" would not make sense, so our data should be discrete, not continuous. We will do that by converting questions$Answer into a factor. This also allows us to do two things at the same time that are quite useful here:

  1. Setting the levels - this indicates which order you want the levels of the factor.
  2. Setting the labels - this allows you to remap 1 to be "Approve" and 2 to be "Slightly Approve" and so on.

You can then check the data afterwards and see that the questions$Answer column is now composed of our labels values, not numbers.

questions$Answer <- factor(questions$Answer,
    levels=1:5,
    labels=c('Approve','Slightly Approve','Neutral','Slightly Disapprove','Disapprove'))
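
To see what the conversion does, here is a small sketch on a hypothetical vector of raw answers (`answers` and `f` are illustrative names only):

```r
# Hypothetical raw responses on the 1-5 scale.
answers <- c(1, 5, 3, 2, 2)

f <- factor(
  answers,
  levels = 1:5,
  labels = c('Approve', 'Slightly Approve', 'Neutral',
             'Slightly Disapprove', 'Disapprove')
)

levels(f)       # the five labels, in scale order
table(f)        # counts per label, including a zero for unused labels
as.integer(f)   # the underlying codes are still 1, 5, 3, 2, 2
```

Because the levels are set explicitly, an answer nobody chose still appears as a level (with a count of zero), so the legend order stays stable across plots.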

Make the Plot

We can then make the plot using the ggplot2 package. ggplot2 draws your data onto the plot area using geoms. In this case, we can use geom_bar(), which draws a bar plot (totaling up the count of each item) and requires only an x aesthetic. If we map the fill color of each bar to the Answer column, the bars are color-coded by answer, with each segment's height showing how many times that answer was given for that question. By default, the segments are stacked on top of one another in the order that we set previously with the levels argument when creating the questions$Answer factor.

ggplot(questions, aes(x=Question_num)) +
  geom_bar(aes(fill=Answer))

There's a lot of things that are right with this plot and the general layout looks good. All that's left is to change the appearance in a few ways. We can do that by extending our plot code to change those aspects of the plot. Namely, I want to do the following:

  • Add a title and change some axis labels
  • Change the color scheme to one of the Brewer scales
  • Remove the whitespace in the y axis
  • Simplify the theme and move the legend to a different location

The full plot code now looks like this shown below. You should be able to identify which parts of the code are doing each thing referenced above.

ggplot(questions, aes(x=Question_num)) +
  geom_bar(aes(fill=Answer)) +
  scale_fill_brewer(palette='Spectral', direction=-1) +
  scale_y_continuous(expand=expansion(0)) +
  labs(
    title='My Likert Plot', subtitle='Twenty Questions!',
    x='Questions', y='Number Answered'
  ) +
  theme_classic() +
  theme(legend.position='top')

Pretty cool, eh?

As for "is there a simple function that does what I want?": the answer is "no". You can write one, but that might depend on how your data is initially formatted. If you're going to need to make these plots often, set up an R script to do that automatically for you :).
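
As a rough idea of what such a helper could look like, here is a minimal sketch. The function name `plot_likert`, its arguments, and the demo data are all hypothetical and not part of any package. It assumes a wide data frame with one id column and answers coded 1, 2, 3, ... in every other column.

```r
library(ggplot2)
library(tidyr)

# Hypothetical helper: reshape, label, and plot in one call.
plot_likert <- function(data, id_col, labels) {
  question_cols <- setdiff(names(data), id_col)

  # Wide -> long: one row per person and question.
  long <- pivot_longer(
    data,
    cols = all_of(question_cols),
    names_to = 'Question_num',
    values_to = 'Answer'
  )

  # Numeric codes -> ordered, labelled categories.
  long$Answer <- factor(long$Answer, levels = seq_along(labels), labels = labels)

  ggplot(long, aes(x = Question_num)) +
    geom_bar(aes(fill = Answer)) +
    scale_fill_brewer(palette = 'Spectral', direction = -1) +
    scale_y_continuous(expand = expansion(0)) +
    theme_classic() +
    theme(legend.position = 'top')
}

# Usage with a small made-up survey on a 4-point scale.
demo <- data.frame(Person = 1:3, Q1 = c(1, 2, 1), Q2 = c(4, 3, 1))
p <- plot_likert(
  demo, 'Person',
  c('Approve', 'Slightly Approve', 'Slightly Disapprove', 'Disapprove')
)
```

Printing `p` draws the chart. Titles and axis labels are left out on purpose so they can be added per plot with labs().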

EDIT: Percentages maybe???

OP had a request in the comment on displaying the same info via percentages. This is also fairly straightforward to do and often what one wants to do with a likert plot... so let's do it! We'll convert the counts into percentages in two stages. First, we'll get the axis and the bars setup to do that. Second, we'll overlay text on top of each bar to display the % answering that way for each question.

First, let's set the bars and y axis to be percentages, not counts. Our line to draw the bar geom was geom_bar(aes(fill=Answer)). There's a hidden default value for the position = "stack" inside that function as well (which we don't have to specify). The position argument deals with how ggplot should handle the situation when more than one bar needs to be drawn at that particular x value. In this case, it determines what to do with the 5 bars that correspond to each value of questions$Answer corresponding to each question.

"Stack", as you might assume, just stacks them on top of each other. Since we have 20 people answering each question, all of our bars are the same total height (20) for every question. What if you had only 19 people answering question #3? Well, that total bar height would be shorter than the rest.

Normally, likert plots all show the bars the same height, because they are stacked according to the proportion of the whole they occupy for the total. In this case, we want each stack of bars to total up to 1. That means that 10 people answering one way should be mapped to a bar height of 0.5 (50%).

This is where the other position values come into play. We want to use position = "fill" to reference that we want the bars that need to be drawn at the same x axis position to be stacked... but not according to their value, but according to the proportion of the total value for that x axis position.
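
The difference between the two positions can be sketched with plain arithmetic. The counts below are hypothetical answers to a single question from 20 respondents:

```r
# Hypothetical answers to one question: 10 approve, 6 neutral, 4 disapprove.
answers <- factor(
  rep(c('Approve', 'Neutral', 'Disapprove'), times = c(10, 6, 4)),
  levels = c('Approve', 'Neutral', 'Disapprove')
)

counts <- table(answers)       # segment heights under position = "stack": 10, 6, 4
shares <- prop.table(counts)   # segment heights under position = "fill": 0.5, 0.3, 0.2

sum(counts)   # total bar height when stacking: 20
sum(shares)   # total bar height when filling: 1
```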

Finally, we want to fix our scale. If we just use position="fill" our y axis scale would have values of "0, 0.25, 0.50, 0.75, and 1.0" or something like that. We want that to look like "0%, 25%, 50%, 75%, 100%". You can do that within the scale_y_continuous() function and specify the labels argument. In this case, the scales package has a convenient percent_format() function for just this purpose. Putting this together, you get the following:

ggplot(questions, aes(x=Question_num)) +
  geom_bar(aes(fill=Answer), position="fill") +
  scale_fill_brewer(palette='Spectral', direction=-1) +
  scale_y_continuous(expand=expansion(0), labels=scales::percent_format()) +
  labs(
    title='My Likert Plot', subtitle='Twenty Questions!',
    x='Questions', y='Percent Answered'
  ) +
  theme_classic() +
  theme(legend.position='top')
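
To see what the labels argument receives, here is a small sketch. percent_format() returns a function, and ggplot2 calls that function on the axis breaks. The `breaks` vector is just an illustrative stand-in for those breaks, and `accuracy = 1` is added here to force whole-number percentages.

```r
# Stand-in for the y axis breaks that position = "fill" produces.
breaks <- c(0, 0.25, 0.5, 0.75, 1)

# percent_format() builds a labelling function; it does not format anything yet.
fmt <- scales::percent_format(accuracy = 1)

fmt(breaks)   # "0%" "25%" "50%" "75%" "100%"
```

Recent versions of scales also provide label_percent(), which does the same job under a newer name.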

Getting text on top

Putting the text on top as percentages is unfortunately not quite as simple. For this, we need to summarize the data, and in this case the simplest way is to summarize beforehand in a separate dataset, then label the bars using a text geom mapped to that summary data frame.

The summary data frame is created by specifying how we want to group our data together, then assigning n(), or the count of each answer, as the freq column value.

questions_summary <- questions %>%
  group_by(Question_num, Answer) %>%
  summarize(freq = n()) %>% ungroup()

We then use that to map to a new geom: geom_text. The y value needs to be represented as a proportion again. Just like for geom_bar and the reasons above, we have to use the "fill" position. I also want to make sure the position is set to the "middle" vertically for each bar, so we have to specify a bit further by using position_fill(vjust=0.5) instead of just "fill".

You'll notice a final critical piece is that we're using a group aesthetic. This is very important. For the text geom, ggplot needs to know how the data is to be grouped. In the case of the bar geom, it was "obvious" (so-to-speak) that since the bars are colored differently, each color of bar was the separation. For text, this always needs to be specified (how to split the values) and we do this through the group aesthetic.

ggplot(questions, aes(x=Question_num)) +
  geom_bar(aes(fill=Answer), position="fill") +
  geom_text(
    data=questions_summary,
    aes(y=freq, label=scales::percent(freq/20, accuracy=1), group=Answer),
    position=position_fill(vjust=0.5),
    color='gray25', size=3.5
  ) +
  scale_fill_brewer(palette='Spectral', direction=-1) +
  scale_y_continuous(expand=expansion(0), labels=scales::percent_format()) +
  labs(
    title='My Likert Plot', subtitle='Twenty Questions!',
    x='Questions', y='Percent Answered'
  ) +
  theme_classic() +
  theme(legend.position='top')

Voila!
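
One caveat about the label code: it divides freq by a fixed 20, which is only right because every question here has exactly 20 answers. If some respondents skip questions, one option is to compute the share within each question in the summary step. This is a minimal sketch under that assumption. The data frame `long`, the result `summary_tbl`, and the column `prop` are hypothetical names.

```r
library(dplyr)

# Hypothetical long-format answers; Q2 has one respondent fewer than Q1.
long <- data.frame(
  Question_num = c(rep('Q1', 4), rep('Q2', 3)),
  Answer = c('Approve', 'Approve', 'Neutral', 'Disapprove',
             'Approve', 'Neutral', 'Neutral')
)

summary_tbl <- long %>%
  count(Question_num, Answer, name = 'freq') %>%   # same counts as group_by + summarize(n())
  group_by(Question_num) %>%
  mutate(prop = freq / sum(freq)) %>%              # share within each question
  ungroup()

summary_tbl
```

With a column like prop available, the label could be written as scales::percent(prop, accuracy=1) instead of dividing by a constant.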
