R/ggplot2 non-trivial aggregation function using multiple columns


Problem description

I would like to use ggplot2 (R) to draw a bar chart of aggregated values, where each value is computed from multiple numeric columns of a table and grouped by a categorical column of the same table.

df:

V1  V2  categorical
 1   1     c1
 2   1     c2
 1   3     c2
 2   3     c3

The aggregate function I effectively want is:

sum(V1 * V2) / sum(V2)

I attempted this:

ggplot(df, aes(x = categorical)) +
   stat_summary_bin(aes(y = V1 * V2), 
                    fun.args = list(d = df$V2), 
                    fun.y = function(y, d) sum(y) / sum(d), 
                    geom = "bar")

but the resulting values are lower than expected. My desired result is c1: 1, c2: 1.25, c3: 2, but the actual result is:

Solution

The best way to create the desired plot is to compute the desired statistics manually before calling ggplot. Here is the code using tidyverse tools:

library(tidyverse)
df %>%
  group_by(categorical) %>%
  summarise(stat = sum(V1 * V2) / sum(V2)) %>%
  ggplot(aes(categorical, stat)) +
    geom_bar(stat = "identity")
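To check this against the sample data, here is a minimal self-contained sketch. It assumes the data frame from the question can be rebuilt as shown (the `data.frame()` call and the names `agg` and `p` are mine, not from the original post) and loads only dplyr and ggplot2 rather than the whole tidyverse.

```r
library(dplyr)
library(ggplot2)

# Sample data rebuilt from the table in the question (an assumption:
# the original post does not show how df was created)
df <- data.frame(
  V1 = c(1, 2, 1, 2),
  V2 = c(1, 1, 3, 3),
  categorical = c("c1", "c2", "c2", "c3")
)

# Aggregate first: sum(V1 * V2) / sum(V2) within each category
agg <- df %>%
  group_by(categorical) %>%
  summarise(stat = sum(V1 * V2) / sum(V2))

print(agg)  # stat is 1 for c1, 1.25 for c2, 2 for c3

# Plot the precomputed values as-is; call print(p) to draw the chart
p <- ggplot(agg, aes(categorical, stat)) +
  geom_bar(stat = "identity")
```

The same statistic can also be written as `weighted.mean(V1, V2)` inside `summarise()`, since it is a mean of V1 weighted by V2.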

Notes:

  1. With stat = "identity", geom_bar doesn't perform any computation and just plots the precomputed values. It was designed specifically for situations like yours.

  2. For c2, the output should be 1.25, I presume.
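A likely explanation for the low values in the original attempt, assuming `fun.args` is evaluated once and passed unchanged to every group: `d = df$V2` is the whole V2 column, so each group's numerator is divided by the grand total of V2 (8) instead of the group's own total. The base-R sketch below only reproduces that arithmetic and does not call ggplot2; the names `per_group_num`, `suspected`, and `intended` are illustrative.

```r
# Sample data rebuilt from the table in the question (an assumption)
df <- data.frame(
  V1 = c(1, 2, 1, 2),
  V2 = c(1, 1, 3, 3),
  categorical = c("c1", "c2", "c2", "c3")
)

# Per-group numerator: sum(V1 * V2) within each category
per_group_num <- tapply(df$V1 * df$V2, df$categorical, sum)

# Suspected behaviour: every group divided by the total of the full V2 column
suspected <- per_group_num / sum(df$V2)

# Intended behaviour: every group divided by its own V2 total
intended <- per_group_num / tapply(df$V2, df$categorical, sum)

print(suspected)  # c1 = 0.125, c2 = 0.625, c3 = 0.75
print(intended)   # c1 = 1,     c2 = 1.25,  c3 = 2
```

If this reading is right, it also shows why precomputing per group is the simpler route: the summary function in `stat_summary_bin` only sees the `y` values of one group at a time, not a second column sliced to the same rows.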
