How to plot a rating scale in R


Question

What is the best way to represent the following trait rating scale? I'd like to label the traits (8 traits) and the degree of each emotion (1 being low feelings, 5 being strong feelings) across the Democratic and Republican parties. Do I need to aggregate the items? I'm new to R and not sure how to tackle this.

Survey question and scale:

"Below is a list of feelings or moods that could be caused by an object. Please use the list below to describe how the U.S. FEDERAL parties (and its elected officials) make you feel. If the word definitely describes how a party makes you feel, then choose the number 5. If you decide that the word does not at all describe how the party makes you feel, then choose the number 1. Use the intermediate numbers between 1 and 5 to indicate responses between these two extremes."

Survey sample:

dput(df[sample(1:nrow(df), 30), ])

structure(list(TRAITDEM1 = c(3, 4, 3, 3, 3, 3, 3, 1, 2, 2, 2, 
3, 3, 2, 2, 1, 1, 3, 1, 5, 1, 1, 3, 1, 4, 4, 3, 1, 2, 4), TRAITDEM2 = c(3, 
1, 1, 2, 2, 2, 3, 5, 4, 2, 2, 2, 3, 3, 3, 4, 1, 2, 3, 1, 4, 5, 
2, 3, 1, 1, 1, 4, 1, 2), TRAITDEM3 = c(3, 4, 4, 2, 3, 3, 3, 1, 
1, 2, 2, 3, 3, 2, 2, 1, 1, 3, 1, 5, 1, 1, 3, 1, 4, 5, 4, 1, 3, 
5), TRAITDEM4 = c(3, 2, 1, 2, 2, 2, 4, 5, 4, 5, 2, 3, 2, 3, 3, 
4, 3, 4, 3, 1, 5, 4, 1, 4, 3, 4, 2, 4, 2, 1), TRAITDEM5 = c(3, 
4, 3, 4, 4, 3, 2, 1, 1, 2, 2, 3, 4, 2, 2, 1, 1, 3, 1, 5, 1, 1, 
2, 1, 4, 4, 4, 1, 3, 4), TRAITDEM6 = c(3, 1, 1, 1, 1, 1, 1, 2, 
1, 1, 1, 2, 2, 2, 2, 4, 3, 1, 1, 1, 4, 5, 1, 3, 1, 1, 1, 1, 1, 
1), TRAITDEM7 = c(3, 1, 3, 3, 2, 2, 1, 1, 1, 2, 3, 4, 3, 2, 2, 
1, 1, 2, 2, 5, 1, 1, 1, 3, 3, 4, 2, 1, 5, 5), TRAITDEM8 = c(3, 
1, 1, 1, 2, 1, 3, 5, 2, 4, 1, 1, 2, 2, 3, 1, 3, 1, 2, 1, 5, 5, 
2, 2, 1, 2, 1, 2, 1, 1), TRAITREP1 = c(1, 1, 1, 1, 1, 1, 1, 1, 
1, 4, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 
1), TRAITREP2 = c(1, 5, 5, 5, 5, 5, 5, 2, 5, 2, 5, 5, 5, 5, 4, 
5, 1, 5, 5, 5, 5, 1, 5, 4, 5, 5, 5, 3, 5, 5), TRAITREP3 = c(1, 
1, 1, 1, 2, 1, 1, 2, 1, 4, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 3, 
1, 1, 1, 1, 1, 1, 1, 2), TRAITREP4 = c(1, 5, 5, 1, 5, 5, 5, 3, 
5, 2, 5, 4, 5, 5, 5, 5, 3, 5, 5, 5, 5, 1, 5, 3, 5, 5, 5, 4, 5, 
1), TRAITREP5 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 2, 
1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1), TRAITREP6 = c(1, 
5, 5, 5, 3, 3, 3, 1, 1, 1, 3, 3, 5, 3, 4, 5, 3, 4, 5, 4, 5, 1, 
5, 3, 4, 4, 5, 1, 1, 3), TRAITREP7 = c(1, 1, 1, 1, 2, 2, 1, 1, 
1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 
2), TRAITREP8 = c(1, 5, 5, 5, 4, 5, 5, 2, 5, 2, 5, 4, 5, 5, 4, 
1, 3, 5, 5, 5, 5, 3, 4, 4, 5, 5, 5, 3, 5, 5), PARTYID_Strength = c(5, 
1, 2, 1, 2, 1, 8, 7, 6, 3, 1, 6, 6, 1, 7, 8, 7, 1, 1, 1, 2, 4, 
1, 6, 1, 1, 1, 7, 6, 8)), row.names = c(NA, -30L), class = c("tbl_df", 
"tbl", "data.frame"))

"PARTYID_Strength" represents 8 categories of political party identification:

  1 - Strong Democrat
  2 - Not very strong Democrat
  3 - Strong Republican
  4 - Not very strong Republican
  5 - Independent
  6 - Independent - Democrat
  7 - Independent - Republican
  8 - Other

I tried it this way (graph below), but it's still not plotting the remaining four traits:

Recommended answer

Tidy the data

To solve your problem, we first have to transform your data into a tidy format.

There are a few particular problems with your original dataset:

  • The data are in a wide format, i.e. most of the columns in your data frame can be represented by 3 variables;
  • The variable names are not self-explanatory. They are in upper case, which by itself carries no useful information, and they are neither readable nor convenient to type;
  • There is additional information we can extract from the variable names: the Party and the Feeling toward the Party. The first is an abbreviation ('dem' or 'rep'), the second is the numerically encoded feeling toward the political party. However, the order of the numbers encoding the feelings does not reflect the natural order of emotions from disgust up to joy;
  • The variable PARTYID_Strength is a numerically encoded Political Party [self-]Identification. It also does not reflect the natural order from the strongest Democrats through independents to the strongest Republicans.

To address these problems we need to:

  1. Convert the data from wide into long format using all variables starting with TRAIT, leaving the PARTYID_Strength variable unchanged;
  2. Extract the useful information from the TRAIT... variable names (Political Party, Feelings toward the Party);
  3. Convert all numerically encoded variables into factors with reasonably ordered levels;
  4. Give all variables meaningful names;
  5. Summarize the data.

Transformations

We need to create several lookup tables, which will simplify the workflow.

Affiliation lookup table:

aff_lookup <- c(
  'Strong Democrat',
  'Not very strong Democrat',
  'Strong Republican',
  'Not very strong Republican',
  'Independent',
  'Independent-Democrat',
  'Independent-Republican',
  'Other'
)

We can further order aff_lookup by this vector:

aff_order <- c(1, 2, 6, 5, 7, 4, 3, 8)
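
As a minimal sketch of what this ordering does, the snippet below indexes the lookup table by the order vector and uses the result as factor levels. The three survey codes in `codes` are hypothetical values chosen only for illustration; the lookup table and order vector are repeated so the snippet runs on its own.

```r
# Lookup table and order vector, repeated from the answer
aff_lookup <- c(
  'Strong Democrat',
  'Not very strong Democrat',
  'Strong Republican',
  'Not very strong Republican',
  'Independent',
  'Independent-Democrat',
  'Independent-Republican',
  'Other'
)
aff_order <- c(1, 2, 6, 5, 7, 4, 3, 8)

# Indexing by aff_order rearranges the labels from Democrat to Republican
aff_levels <- aff_lookup[aff_order]
print(aff_levels)

# Hypothetical survey codes: Independent, Strong Democrat, Strong Republican
codes <- c(5, 1, 3)

# The codes pick the labels; the levels argument fixes their sort order
affiliation <- factor(aff_lookup[codes], levels = aff_levels)
print(sort(affiliation))
```

Sorting the factor puts Strong Democrat first and Strong Republican last, even though the raw codes 1 and 3 sit next to each other.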

Emotion/feeling lookup table:

emo_lookup <- c(
  'Delighted',    
  'Angry',
  'Happy',
  'Annoyed',
  'Joy',
  'Hateful',
  'Relaxed',
  'Disgusted'
)

And we can order emo_lookup by this vector:

emo_order <- c(8, 6, 2, 4, 7, 3, 1, 5)

Political party lookup table:

party_lookup <- c(
  dem = 'National Democratic Party',
  rep = 'National Republican Party'
)

Finally, with all the helper variables in place, we can transform our data into the desired form.

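library(tidyverse)
library(magrittr)  # provides %<>%, which library(tidyverse) does not attach

# dat is assumed to hold the survey data frame shown in the question
dat %<>%
  rename_all(tolower) %>%
  # wide -> long: split e.g. 'traitdem1' into party = 'dem', emotion = '1'
  pivot_longer(
    cols          = starts_with('trait'),
    names_to      = c('party', 'emotion'),
    names_pattern = 'trait(dem|rep)(\\d)',
    values_to     = 'score'
  ) %>%
  # replace the numeric codes with ordered, human-readable factors
  mutate(
    party = factor(party_lookup[party]),
    affiliation = factor(
      aff_lookup[partyid_strength],
      levels = aff_lookup[aff_order]
    ),
    emotion = factor(
      emo_lookup[as.numeric(emotion)],
      levels = emo_lookup[emo_order]
    )
  ) %>%
  # one median score per party / emotion / affiliation combination
  group_by(party, emotion, affiliation) %>%
  summarise(score = median(score)) %>%
  ungroup()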

head(dat)

## A tibble: 6 x 4
#  party                     emotion   affiliation                score
#  <fct>                     <fct>     <fct>                      <dbl>
#1 National Democratic Party Disgusted Strong Democrat                1
#2 National Democratic Party Disgusted Not very strong Democrat       2
#3 National Democratic Party Disgusted Independent-Democrat           2
#4 National Democratic Party Disgusted Independent                    3
#5 National Democratic Party Disgusted Independent-Republican         3
#6 National Democratic Party Disgusted Not very strong Republican     5
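
To see the wide-to-long step in isolation, here is a small sketch on a hypothetical two-respondent data frame. The `toy` data and its values are made up for illustration; only the column naming scheme follows the survey data.

```r
library(tidyr)

# Hypothetical two-respondent example; column names mimic the survey data
toy <- data.frame(
  TRAITDEM1        = c(3, 4),
  TRAITREP1        = c(1, 5),
  PARTYID_Strength = c(5, 1)
)

# Each TRAIT column name is split by the two regex groups:
# group 1 (DEM|REP) goes to 'party', group 2 (the digit) goes to 'emotion'
toy_long <- pivot_longer(
  toy,
  cols          = starts_with("TRAIT"),
  names_to      = c("party", "emotion"),
  names_pattern = "TRAIT(DEM|REP)(\\d)",
  values_to     = "score"
)

print(toy_long)
```

Each respondent now contributes one row per party/emotion pair, and PARTYID_Strength is repeated on every row of that respondent.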

Plot the data

Plan

Now we can plot the data as two separate panels, one for the Democratic and one for the Republican party, with Affiliation (Political Party Identification) on the X-axis and Emotions (Feelings) on the Y-axis.

Each Emotion/Affiliation point is going to be represented as a bar, with the height of the bar representing the Score.

We can also add color encoding to our plot. From my point of view, encoding Emotions/Feelings with a color gradient from red (Disgust) to green (Joy) could help us grasp the internal structure of our data.

dat %>%
  ggplot(
    aes(
      x      = affiliation, 
      y      = as.numeric(emotion) +  (score / max(score) * .95) / 2, 
      height = (score / max(score) * .95), 
      width  = .95,
      fill   = emotion,
      label  = score
      )
    ) +
  geom_tile(show.legend = FALSE) +
  geom_text(size = 3.5, color = 'gray25', alpha = .75) +
  facet_wrap(~ party, scales = 'free') +
  scale_fill_brewer(palette = 'RdYlGn') +
  scale_y_continuous(breaks = sort(emo_order), labels = emo_lookup[emo_order]) +
  labs(x = 'Affiliations', y = 'Emotions') +
  ggthemes::theme_tufte() +
  theme(
    axis.text.x  = element_text(angle = 45, hjust = 1),
    axis.ticks.x = element_blank(),
    axis.text.y  = element_text(hjust = 0, vjust = -0.025),
    axis.ticks.y = element_blank()
  )

The result is shown in the figure below:

There is a trick to this plot: it looks like a series of bar plots, but it is not built from real bar plots.

What I did:

The core of this solution is the use of geom_tile() for each data point. A tile is just a rectangle (a square by default) whose geometric center is determined by the given coordinates (Affiliation, Emotion).
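
The centering behavior can be checked on a toy plot. This is a sketch on hypothetical data (the `tiles` data frame is not part of the survey); it builds the plot and inspects the rectangle edges with `layer_data()`.

```r
library(ggplot2)

# Hypothetical toy data: two tiles with explicit centers and heights
tiles <- data.frame(
  x      = c(1, 2),
  y      = c(1.5, 2.25),  # tile centers on the y-axis
  height = c(1, 0.5)
)

p <- ggplot(tiles, aes(x = x, y = y, height = height, width = 0.95)) +
  geom_tile()

# layer_data() returns the rectangle edges ggplot2 computed for each tile
built <- layer_data(p)
print(built[, c("x", "y", "ymin", "ymax")])
```

Each tile extends half its height below and half above its y coordinate, which is why the plot has to shift the centers upward to get bars that start on a common baseline.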

Both Affiliation and Emotion are factors, not numerics. That is fine for Affiliation, because we only want to position each tile according to the Affiliation it represents.

It is more complicated with Emotion, because we want to position each tile according to the Emotion it represents, but we also want to encode the Score by the height of the tile.

To define the height of the tile we use the height parameter within aes(). We want the tile height to stay below one (the tallest tile is 0.95, leaving a 0.05 gap) so that the tiles of neighboring emotions, say Angry and Annoyed, do not overlap. That's why we use (score / max(score) * .95) for the height parameter.

We also need to give each tile a different y-coordinate, so that the center of the tile is placed not on the imaginary line representing each emotion, but half a tile height above it. When the tile is drawn, its center (on the y-axis) sits half its height above the "base line" and the tile extends half its height up and down, creating a fake bar plot. That is what this expression does: as.numeric(emotion) + (score / max(score) * .95) / 2.
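
A worked example of this arithmetic, as a sketch. The helper `tile_geometry()` is hypothetical (it is not part of the answer's code), and `max_score = 5` is an assumption; in the plot code the maximum is taken from the summarised data via max(score).

```r
# Hypothetical helper reproducing the height and center formulas from aes()
tile_geometry <- function(emotion_level, score, max_score = 5, scale = 0.95) {
  height <- score / max_score * scale   # tile height, at most `scale`
  center <- emotion_level + height / 2  # shift the center half a height up
  c(
    ymin   = center - height / 2,       # bottom edge: the emotion's base line
    center = center,
    ymax   = center + height / 2,       # top edge: base line + height
    height = height
  )
}

# Emotion at level 3 on the y-axis with a median score of 4
g <- tile_geometry(emotion_level = 3, score = 4)
print(g)
```

For this point the tile is 0.76 high, centered at 3.38, and spans 3 to 3.76, so it starts exactly on the base line of level 3 and stops short of level 4.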

We also give each tile a fixed width of .95 with width = .95, fill the tiles with a Red-Yellow-Green gradient, and label each tile with the relevant Score.

The rest is just decoration. However, note how we relabel the Y-axis. As defined in aes(), it is a continuous scale, but we want it to look like a discrete axis, so we use this line:

scale_y_continuous(breaks = sort(emo_order), labels = emo_lookup[emo_order])

Here we just use our emo_order to say that we want breaks at the integers from 1 to 8, and then we label these breaks with the feelings from the ordered emo_lookup table.
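
The break-to-label mapping can be inspected directly. This sketch repeats the lookup table and order vector from the answer; the `axis_map` data frame is introduced here only for display.

```r
# Lookup table and order vector, repeated from the answer
emo_lookup <- c(
  'Delighted',
  'Angry',
  'Happy',
  'Annoyed',
  'Joy',
  'Hateful',
  'Relaxed',
  'Disgusted'
)
emo_order <- c(8, 6, 2, 4, 7, 3, 1, 5)

# sort() turns the permutation into the plain positions 1..8
breaks <- sort(emo_order)

# Indexing by emo_order lists the labels from most negative to most positive
labels <- emo_lookup[emo_order]

axis_map <- data.frame(position = breaks, label = labels)
print(axis_map)
```

Position 1 is labeled Disgusted and position 8 is labeled Joy, which matches the integer codes that as.numeric(emotion) produces for the ordered factor.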
