Sort columns with categorical variables by numerical variables in stacked barplot


Problem description


I have a data frame containing numerical (percentages) and categorical variables. I'd like to produce a stacked barplot (using ggplot2) with the columns (categorical variables) sorted by the numerical variable.

I have tried this:

How to control ordering of stacked bar chart using identity on ggplot2

and this:

https://community.rstudio.com/t/a-tidy-way-to-order-stacked-bar-chart-by-fill-subset/5134


but I am not familiar with factors and I'd like to understand more.

library(ggplot2)

# Create a dummy dataset
perc <- c(11.89, 88.11, 2.56, 97.44, 5.96, 94.04, 6.74, 93.26)
names <- c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D')

df <- data.frame(class = rep(c(-1, 1), 4), 
                 percentage = perc, 
                 name = names)

# Plot
ggplot(df, aes(x = factor(name), y = percentage, fill = factor(class))) +
  geom_bar(stat = "identity") +
  scale_fill_discrete(name = "Class") +
  xlab('Names')


This code produces a plot whose bars are ordered alphabetically by the variable "name". I'd like to order them by the variable "percentage". Even if I sort the data frame manually, the resulting plot is the same.
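A quick way to see why sorting the rows has no effect is to look at the factor levels directly. ggplot2 places the bars of a discrete x axis in the order of the factor levels, and factor() builds those levels by sorting the unique values, not by following the row order. This minimal sketch recreates the dummy data in base R; the descending sort and the names `df_sorted` and `f` are illustrative choices.

```r
# Recreate the dummy data from the question
perc <- c(11.89, 88.11, 2.56, 97.44, 5.96, 94.04, 6.74, 93.26)
df <- data.frame(class = rep(c(-1, 1), 4),
                 percentage = perc,
                 name = c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'))

# Sort the rows by percentage, largest first
df_sorted <- df[order(df$percentage, decreasing = TRUE), ]

# factor() sorts the unique values to build its levels,
# so the row order of the data frame is irrelevant
levels(factor(df$name))         # "A" "B" "C" "D"
levels(factor(df_sorted$name))  # "A" "B" "C" "D"

# The bar order only changes when the levels themselves change
f <- factor(df$name, levels = c('B', 'C', 'D', 'A'))
levels(f)                       # "B" "C" "D" "A"
```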

Recommended answer


The issue here is that, for each category (name), your percentages add up to 100%. Sorting by percentage, which is normally achieved via aes(x = reorder(name, percentage), y = percentage), therefore won't work here: reorder() would give every name the same score, a mean of 50.
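To check this, a minimal base-R sketch (same dummy data) computes the scores that reorder(name, percentage) would use. Each name has one row per class and its two percentages sum to 100, so every score comes out at 50, up to floating-point rounding, and the scores carry no ordering information.

```r
# Recreate the dummy data from the question
perc <- c(11.89, 88.11, 2.56, 97.44, 5.96, 94.04, 6.74, 93.26)
df <- data.frame(class = rep(c(-1, 1), 4),
                 percentage = perc,
                 name = c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'))

# reorder() computes one score per name, using FUN = mean by default
r <- reorder(df$name, df$percentage)

# 50 for every name (up to floating-point rounding)
attr(r, "scores")
```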


Instead, you probably want to order by the percentage of the data that has class = 1 (or class = -1). This requires a small trick: use ifelse to select the percentage for the rows where class == 1, and the value 0 for all other rows:

ggplot(df, aes(x = reorder(name, ifelse(class == 1, percentage, 0)), y = percentage, fill = factor(class))) +
  geom_bar(stat = "identity") +
  scale_fill_discrete(name = "Class") +
  xlab('Names')
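If the ifelse trick feels opaque, a more explicit alternative (a swapped-in variant, not the approach used in this answer) is to build the factor levels yourself: take the class = 1 rows, sort them by percentage, and use the resulting names as the levels. This sketch assumes each name has exactly one class = 1 row; `class1` and `lvl` are hypothetical helper names.

```r
# Recreate the dummy data from the question
perc <- c(11.89, 88.11, 2.56, 97.44, 5.96, 94.04, 6.74, 93.26)
df <- data.frame(class = rep(c(-1, 1), 4),
                 percentage = perc,
                 name = c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'))

# Keep only the class == 1 rows (one per name in this data)
class1 <- df[df$class == 1, ]

# Names ordered by their class == 1 percentage, smallest first
lvl <- as.character(class1$name[order(class1$percentage)])

# Turn name into a factor whose levels follow that order
df$name <- factor(df$name, levels = lvl)
levels(df$name)  # "A" "D" "C" "B"
```

The question's original ggplot() call then draws the bars in this order, because factor(name) keeps an existing factor's levels (apart from dropping unused ones).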


You might want to execute just the reorder instruction to see what’s going on:

reorder(df$name, ifelse(df$class == 1, df$percentage, 0))
# [1] A A B B C C D D
# attr(,"scores")
#      A      B      C      D
# 44.055 48.720 47.020 46.630
# Levels: A D C B


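As you can see, the names were reordered based on the mean score for each category (by default, reorder uses the mean; see its manual page for more details). But the "mean" we calculated is taken over each name's percentage for class = 1 and the value 0 (for class ≠ 1). Because every name has exactly one row of each class, each score is half of that name's class = 1 percentage, so ordering by the score is the same as ordering by the class = 1 percentage.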
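The averaging can be reproduced with tapply(), which gives the same per-name means that reorder() reported above. Passing a different summary function through reorder()'s FUN argument avoids the halving: FUN = max returns each name's class = 1 percentage directly, assuming percentages are non-negative so the 0 placeholders never win. Negating the scores and using FUN = min gives the descending order. This is a sketch; `score`, `r_max` and `r_desc` are illustrative names.

```r
# Recreate the dummy data from the question
perc <- c(11.89, 88.11, 2.56, 97.44, 5.96, 94.04, 6.74, 93.26)
df <- data.frame(class = rep(c(-1, 1), 4),
                 percentage = perc,
                 name = c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'))

# Score per row: the percentage for class == 1, otherwise 0
score <- ifelse(df$class == 1, df$percentage, 0)

# Per-name means: A 44.055, B 48.72, C 47.02, D 46.63
m <- tapply(score, df$name, mean)

# FUN = max picks each name's class == 1 percentage itself
r_max <- reorder(df$name, score, FUN = max)
levels(r_max)   # "A" "D" "C" "B"

# Negated scores with FUN = min: largest class == 1 percentage first
r_desc <- reorder(df$name, -score, FUN = min)
levels(r_desc)  # "B" "C" "D" "A"
```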
