R ggplot on-the-fly calculation by grouping variable


Question

I have often wondered whether ggplot can do on-the-fly calculations by the facet groups of a plot, in the same way they would be done with dplyr::group_by. For instance, in the example below, is it possible to calculate the cumsum separately for each category, rather than the overall cumsum, without altering df first?

library(ggplot2)

df <- data.frame(X = rep(1:20,2), Y = runif(40), category = rep(c("A","B"), each = 20))

# cumsum(Y) here runs over all 40 rows at once, ignoring category
ggplot(df, aes(x = X, y = cumsum(Y), colour = category)) + geom_line()
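To make the difference concrete: inside aes(), cumsum(Y) is evaluated once over the whole Y column, so the "B" curve starts where the "A" curve ends. The sketch below compares that with a per-category running total computed in base R with stats::ave(), which applies FUN separately within each level of the grouping variable. It uses set.seed(123456) (taken from the answer) only so the numbers are reproducible.

```r
set.seed(123456)
df <- data.frame(X = rep(1:20, 2), Y = runif(40),
                 category = rep(c("A", "B"), each = 20))

# What aes(y = cumsum(Y)) computes: one running total over all 40 rows
overall <- cumsum(df$Y)

# What is wanted: the running total restarts for each category.
# ave() splits Y by category, applies cumsum() to each piece,
# and returns the results in the original row order.
by_category <- ave(df$Y, df$category, FUN = cumsum)

# Row 21 is the first "B" row: overall[21] still includes all of A's values,
# while by_category[21] is just df$Y[21]
c(overall = overall[21], by_category = by_category[21])
```

The same expression also works directly in the mapping, e.g. aes(x = X, y = ave(Y, category, FUN = cumsum), colour = category), but it still names category twice, which is exactly the repetition the question is trying to avoid.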

I can obviously work around this easily with dplyr, but since I do this frequently I was keen to know whether there is a way to avoid specifying the grouping variables multiple times (here in group_by and aes(colour = …)).

Working alternative, but not what I'm asking for in this case

library(dplyr)
library(ggplot2)

df %>% group_by(category) %>% mutate(Ysum = cumsum(Y)) %>% 
  ggplot(aes(x = X, y = Ysum, colour = category))+geom_line()

(In response to @42-'s comment) I am mainly asking out of curiosity whether this is possible, not because the alternative doesn't work. I also think my code would be neater when I am making a number of plots that sum (or do similar calculations on) different variables, grouped by different columns or drawn from different datasets, if I didn't have to group, mutate and then plot every time. I could write a function to do it for me, but I thought this might be built-in functionality that I was missing (the ggplot help doesn't go into much detail on this).
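The helper function mentioned above could look roughly like the sketch below, which names the grouping column once and reuses it for both group_by() and the colour aesthetic. It is a minimal sketch, assuming dplyr >= 0.8.1 and ggplot2 >= 3.2.0 (both support the {{ }} "embracing" operator); plot_grouped_cumsum() and the cum_y column are hypothetical names, not part of any package.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: `group` is supplied once and used both for the
# grouped cumulative sum and for the colour aesthetic.
plot_grouped_cumsum <- function(data, x, y, group) {
  data %>%
    group_by({{ group }}) %>%
    arrange({{ x }}, .by_group = TRUE) %>%   # accumulate in x order within each group
    mutate(cum_y = cumsum({{ y }})) %>%
    ungroup() %>%
    ggplot(aes(x = {{ x }}, y = cum_y, colour = {{ group }})) +
    geom_line()
}

set.seed(123456)
df <- data.frame(X = rep(1:20, 2), Y = runif(40),
                 category = rep(c("A", "B"), each = 20))

p <- plot_grouped_cumsum(df, X, Y, category)
```

Printing p draws one cumulative line per category. This still does the grouping in dplyr rather than inside ggplot; it only hides the repetition behind a function.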

Recommended answer

I have added stat_apply_group() and stat_apply_panel() to the development version of my package 'ggpmisc'. It will take some time before this update makes it to CRAN as the previous update has just been accepted.

For the time being 'ggpmisc' should be installed from Bitbucket for the new stats to be available.

devtools::install_bitbucket("aphalo/ggpmisc", ref = "no-debug")

This then solves the problem:

library(ggplot2)
library(ggpmisc)
set.seed(123456)
df <- data.frame(X = rep(1:20,2),
                 Y = runif(40),
                 category = rep(c("A","B"), each = 20))
ggplot(df, aes(x = X, y = Y, colour = category)) +
  # cumsum() is applied to y separately within each group (here, each colour)
  stat_apply_group(.fun.y = cumsum)

Applying cumsum() within the ggplot code instead of using a 'dplyr' "pipe" as in the second example saves us from having to specify the grouping twice.
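A stat can do this because ggplot2 splits each layer's data by group (groups come from discrete aesthetics such as colour) and calls the stat's compute_group() method once per group. The sketch below is a self-written illustration of that mechanism, not ggpmisc's implementation; StatCumsumY and stat_cumsum_y() are hypothetical names, and it assumes ggplot2's documented ggproto()/layer() extension API.

```r
library(ggplot2)

# Hypothetical stat: compute_group() receives the rows of one group at a
# time, so the cumulative sum restarts for every group.
StatCumsumY <- ggproto("StatCumsumY", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    data <- data[order(data$x), , drop = FALSE]  # accumulate along the x axis
    data$y <- cumsum(data$y)
    data
  }
)

# Layer constructor so the stat can be added to a plot like any stat_*()
stat_cumsum_y <- function(mapping = NULL, data = NULL, geom = "line",
                          position = "identity", na.rm = FALSE,
                          show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatCumsumY, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

set.seed(123456)
df <- data.frame(X = rep(1:20, 2), Y = runif(40),
                 category = rep(c("A", "B"), each = 20))

# The grouping variable appears only once, in aes(colour = ...)
p <- ggplot(df, aes(x = X, y = Y, colour = category)) +
  stat_cumsum_y()
```

Sorting by x inside compute_group() is a design choice: it keeps the running total aligned with the x axis even if the rows of df are not already in x order.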
