Stacked bar chart with varying widths in ggplot


Problem description

I am trying to build a stacked bar chart with varying widths, so that the width indicates the mean amount of an allocation and the height indicates the number of allocations.

Below is my reproducible data:

procedure = c("method1","method2", "method3", "method4","method1","method2", "method3", "method4","method1","method2", "method3","method4")
sector =c("construction","construction","construction","construction","delivery","delivery","delivery","delivery","service","service","service","service") 
number = c(100,20,10,80,75,80,50,20,20,25,10,4)
amount_mean = c(1,1.2,0.2,0.5,1.3,0.8,1.5,1,0.8,0.6,0.2,0.9) 

data0 = data.frame(procedure, sector, number, amount_mean)

When I use geom_bar and include width within aes, I get the following message:

position_stack requires non-overlapping x intervals

Furthermore, the bars are no longer stacked.

library(ggplot2)

bar <- ggplot(data = data0, aes(x = sector, y = number, fill = procedure, width = amount_mean)) +
  geom_bar(stat = "identity")

I also looked at the mekko package, but it seems that this is only for bar charts.

Here is what I'd like to have in the end (not based on the above data):

Any idea how to solve my problem?

Recommended answer

I tried the same with geom_col() and ran into the same problem: with position = "stack" it seems that we can't assign a width parameter without unstacking the bars.

But it turns out that the solution is quite simple: we can use geom_rect() to build such a plot "by hand".

Here is your data:

df = data.frame(
  procedure   = rep(paste("method", 1:4), times = 3),
  sector      = rep(c("construction", "delivery", "service"), each = 4),
  amount      = c(100, 20, 10, 80, 75, 80, 50, 20, 20, 25, 10, 4),
  amount_mean = c(1, 1.2, 0.2, 0.5, 1.3, 0.8, 1.5, 1, 0.8, 0.6, 0.2, 0.9)
)

First, I transformed your data set:

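library(dplyr)

df <- df %>%
  # Rescale widths to (0, 1]; factor() is needed because sector is a character
  # column by default in R >= 4.0, and as.numeric() on it would give NA.
  mutate(amount_mean = amount_mean / max(amount_mean),
         sector_num = as.numeric(factor(sector))) %>%
  arrange(desc(amount_mean)) %>%
  group_by(sector) %>%
  mutate(
    xmin = sector_num - amount_mean / 2,
    xmax = sector_num + amount_mean / 2,
    ymin = cumsum(lag(amount, default = 0)),
    ymax = cumsum(amount)) %>%
  ungroup()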

What I did here:

  1. I scaled down amount_mean so that 0 <= amount_mean <= 1 (better for plotting; we don't have another scale to show the real values of amount_mean anyway);
  2. I also converted the sector variable to a numeric one (for plotting, see below);
  3. I arranged the data set in descending order by amount_mean (large means at the bottom, small means on top);
  4. Grouping by sector, I calculated xmin and xmax to represent amount_mean, and ymin and ymax to represent amount. The latter two are a bit trickier. ymax is obvious: you just take a cumulative sum of amount, starting from the first row. You need a cumulative sum to calculate ymin as well, but starting from 0. So the first rectangle is plotted with ymin = 0, the second with ymin equal to the ymax of the previous rectangle, and so on. All of this is performed within each separate sector group.
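To make the arithmetic in steps 1, 3 and 4 concrete, here is a minimal base R sketch for the "construction" sector only. It is an illustration rather than part of the original answer: the helper names (width, ord, rects) are made up for the example, and the sector position 1 assumes "construction" is the first sector in alphabetical order.

```r
# Worked example for the "construction" sector only (values from the question).
amount      <- c(100, 20, 10, 80)
amount_mean <- c(1, 1.2, 0.2, 0.5)

# Step 1: rescale widths by the global maximum (1.5 in the full data set).
width <- amount_mean / 1.5

# Step 3: widest segment first, so it ends up at the bottom of the stack.
ord    <- order(width, decreasing = TRUE)
amount <- amount[ord]
width  <- width[ord]

# Step 4: centre each rectangle on the sector position (assumed to be 1 here).
sector_num <- 1
xmin <- sector_num - width / 2
xmax <- sector_num + width / 2

# ymax is the running total; ymin is the running total of the previous rows,
# which is what cumsum(lag(amount, default = 0)) computes in the dplyr version.
ymax <- cumsum(amount)
ymin <- cumsum(c(0, head(amount, -1)))

rects <- data.frame(xmin, xmax, ymin, ymax)
print(rects)
```

Each row of rects is one rectangle. Because every ymin equals the previous row's ymax, the segments stack without gaps or overlaps.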

Plotting the data:

library(ggplot2)

df %>%
  ggplot(aes(xmin = xmin, xmax = xmax,
             ymin = ymin, ymax = ymax,
             fill = procedure)) +
  geom_rect() +
  # One axis break per sector, labelled with the sector name
  scale_x_continuous(breaks = unique(df$sector_num), labels = unique(df$sector)) +
  #ggthemes::theme_tufte() +
  theme_bw() +
  labs(title = "Question 51136471", x = "Sector", y = "Amount") +
  theme(
    axis.ticks.x = element_blank()
  )

Result:

Another option is to prevent the procedure variable from being reordered, so that all, say, "reds" are at the bottom, "greens" above them, and so on. But it looks ugly:

df <- df %>%
  # factor() is needed because as.numeric() on a character column gives NA
  mutate(amount_mean = amount_mean / max(amount_mean),
         sector_num = as.numeric(factor(sector))) %>%
  arrange(procedure, desc(amount), desc(amount_mean)) %>%
  group_by(sector) %>%
  mutate(
    xmin = sector_num - amount_mean / 2,
    xmax = sector_num + amount_mean / 2,
    ymin = cumsum(lag(amount, default = 0)),
    ymax = cumsum(amount)
  ) %>%
  ungroup()
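As a rough illustration of why this variant keeps the colours in a fixed order but loses the tidy wide-to-narrow shape, the following base R sketch stacks the "construction" sector by procedure name instead of by width. The names ord and stack are invented for the example, and the widths are left unscaled for readability.

```r
# "construction" sector, values from the question.
procedure   <- c("method1", "method2", "method3", "method4")
amount      <- c(100, 20, 10, 80)
amount_mean <- c(1, 1.2, 0.2, 0.5)

# Order by procedure name instead of by width.
ord <- order(procedure)
stack <- data.frame(
  procedure = procedure[ord],
  width     = amount_mean[ord],
  ymin      = cumsum(c(0, head(amount[ord], -1))),
  ymax      = cumsum(amount[ord])
)
print(stack)

# TRUE when the widths do not shrink steadily from bottom to top.
print(is.unsorted(rev(stack$width)))
```

Every sector now stacks method1 through method4 from bottom to top, but a narrow segment can sit underneath a wider one, which is what makes the plot look ragged.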
