R/ggplot2 non-trivial aggregation function using multiple columns
I would like to use ggplot2 (R) to draw a bar chart of aggregated values computed from multiple numeric columns of a table, grouped by a categorical column of the same table.
df:
V1  V2  categorical
1   1   c1
2   1   c2
1   3   c2
2   3   c3
The aggregate function I want to apply within each group is:
sum(V1 * V2) / sum(V2)
I attempted this:
ggplot(df, aes(x = categorical)) +
  stat_summary_bin(aes(y = V1 * V2),
                   fun.args = list(d = df$V2),
                   fun.y = function(y, d) sum(y) / sum(d),
                   geom = "bar")
but the resulting values were lower than expected. My desired result is c1: 1, c2: 1.25, c3: 2, but the actual result is:
The best way to create the desired plot is to compute the desired statistic manually before calling ggplot. Here is the code using tidyverse tools:
library(tidyverse)
df %>%
  group_by(categorical) %>%
  summarise(stat = sum(V1 * V2) / sum(V2)) %>%
  ggplot(aes(categorical, stat)) +
  geom_bar(stat = "identity")
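To check the numbers without drawing anything, here is a minimal sketch. It assumes the data frame is built as shown (the question does not say how `df` was created), and the names `stats` and `attempt` are only illustrative. The second summary mimics what the original attempt seems to do: because `fun.args` passes the whole `df$V2` column, every bar is probably divided by the sum of V2 over all rows rather than over its own group, which would explain the lower values.

```r
library(dplyr)

# Example data from the question; the construction is an assumption
df <- data.frame(
  V1 = c(1, 2, 1, 2),
  V2 = c(1, 1, 3, 3),
  categorical = c("c1", "c2", "c2", "c3")
)

# Desired statistic: per-group sum(V1 * V2) / sum(V2)
stats <- df %>%
  group_by(categorical) %>%
  summarise(stat = sum(V1 * V2) / sum(V2))

# What the original attempt appears to compute: the numerator is
# per group, but the denominator is the sum of the full V2 column
attempt <- df %>%
  group_by(categorical) %>%
  summarise(stat = sum(V1 * V2) / sum(df$V2))

print(stats)    # c1: 1, c2: 1.25, c3: 2
print(attempt)  # c1: 0.125, c2: 0.625, c3: 0.75
```

The per-group statistic is the same quantity as a weighted mean of V1 with weights V2, so `weighted.mean(V1, V2)` from base R could be used inside `summarise()` instead.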
Notes:
- With stat = "identity", geom_bar doesn't perform any computation and just plots the precomputed values. It was designed specifically for situations like yours.
- For c2, the output should be 1.25, I presume.
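As a variation on the code above, recent versions of ggplot2 also provide `geom_col()`, which is a shorthand for `geom_bar(stat = "identity")`. The sketch below is not part of the original answer. It assumes the same example data, built here by hand, and stores the plot in an illustrative variable `p`.

```r
library(dplyr)
library(ggplot2)

# Example data from the question; the construction is an assumption
df <- data.frame(
  V1 = c(1, 2, 1, 2),
  V2 = c(1, 1, 3, 3),
  categorical = c("c1", "c2", "c2", "c3")
)

# Precompute the per-group statistic, then plot it as-is.
# weighted.mean(V1, V2) equals sum(V1 * V2) / sum(V2).
p <- df %>%
  group_by(categorical) %>%
  summarise(stat = weighted.mean(V1, V2)) %>%
  ggplot(aes(categorical, stat)) +
  geom_col()

print(p)
```

Either form should draw the same bars; `geom_col()` only saves having to spell out the `stat` argument.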