Illustrate mean and standard deviation in ggplot2 density plot

Problem description

I'm trying to construct a plot of normally distributed variables that shows their mean on the x-axis and their standard deviation (SD) on the y-axis. Kind of like a density plot, but instead of the density on the y-axis I want the SD (value).

I'm working with the data below,

set.seed(1)
mu1 <- rnorm(10^5, mean = 1, sd = 1)
mu3 <- rnorm(10^5, mean = 3, sd = 2)

These are two normally distributed variables. Here are their means and SDs:

# install.packages("tidyverse", dependencies = TRUE)
library(tidyverse)
tibble(mu1, mu3) %>% summarise_all(list(mean = mean, sd = sd))
#> # A tibble: 1 x 4
#>    mu1_mean mu3_mean    mu1_sd   mu3_sd
#>       <dbl>    <dbl>     <dbl>    <dbl>
#> 1 0.9993454 3.000825 0.9982848 1.998234

I've played around with , and other packages, to get closer to what I want. I've also tried copying this function from a box plot that does something similar, without success so far.

Here is my starting point:

tibble(mu1, mu3) %>% gather() %>% ggplot() + 
  geom_density(aes(x = value, colour = key)) + 
  labs(x = 'mean', y = 'currently density, but I would like sd')

Solution

The mean and standard deviation are measured on the x-scale, so you'd need to plot them along the x-axis. The y-axis is the density of points within a given x-interval, and is analogous to the height of the bars in a histogram.
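The histogram analogy can be checked numerically. Below is a minimal base-R sketch, assuming the question's `mu1` (regenerated with the same seed) and an arbitrary request for about 50 bins: `hist(..., plot = FALSE)` returns bar heights on the density scale (count divided by n times bin width), and those heights line up with the kernel density estimate from `density()` evaluated at the bin midpoints.

```r
# Regenerate the question's first variable
set.seed(1)
mu1 <- rnorm(10^5, mean = 1, sd = 1)

# Histogram on the density scale: bar height = count / (n * bin width)
h <- hist(mu1, breaks = 50, plot = FALSE)

# The bar areas sum to 1, just like the area under a density curve
bar_area <- sum(h$density * diff(h$breaks))

# Kernel density estimate, interpolated at the bin midpoints
kde <- density(mu1)
kde_at_mids <- approx(kde$x, kde$y, xout = h$mids)$y

# Largest difference between bar heights and the density curve
max_gap <- max(abs(h$density - kde_at_mids), na.rm = TRUE)

print(bar_area)
print(max_gap)
```

`bar_area` is 1 up to floating-point rounding, and `max_gap` should be small compared with the peak height of roughly 0.4, which is why the y-axis of a density plot reads like the bar heights of a (density-scaled) histogram rather than anything measured on the x-scale.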

Maybe this will give you something like what you were looking for. The code below adds a horizontal line that spans one standard deviation on either side of the mean of each density plot, a point marking the mean, and drop lines marking the ends of that span on the x-axis. The sd line is drawn at the height of the density at mean - sd, i.e. where the half-width of the curve equals one standard deviation, so for a symmetric distribution the line touches the curve at both ends. If you wish, you could in addition (or instead) fill the region spanned by the standard deviation.

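library(dplyr)
library(ggplot2)
library(reshape2)  # for melt()

# Example data: the two variables from the question
set.seed(1)
foo = rnorm(10^5, mean = 1, sd = 1)
bar = rnorm(10^5, mean = 3, sd = 2)

# Densities, each evaluated on n grid points
n = 2^10
d_foo = density(foo, n = n)
d_bar = density(bar, n = n)
df = data.frame(x = c(d_foo$x, d_bar$x),
                y = c(d_foo$y, d_bar$y),
                group = rep(c("foo", "bar"), each = n))

## Mean and SD of each variable
msd = melt(data.frame(foo = foo, bar = bar)) %>%
  group_by(group = variable) %>% summarise(mean = mean(value), sd = sd(value))

# Find the density value at mean - sd; a horizontal line at this height
# spanning mean +/- sd touches the (symmetric) density curve at both ends
msd$y = sapply(as.character(msd$group), function(g) {
  d = df[df$group == g, ]
  d$y[which.min(abs(d$x - (msd$mean[msd$group == g] - msd$sd[msd$group == g])))]
}, USE.NAMES = FALSE)

ggplot(df, aes(x = x, y = y, colour = group)) +
  geom_line() + labs(x = NULL) +
  # Horizontal line from mean - sd to mean + sd
  geom_segment(data = msd, aes(y = y, yend = y, x = mean - sd, xend = mean + sd), lty = "21") +
  # Point marking the mean
  geom_point(data = msd, aes(y = y, x = mean)) +
  # Drop lines from both ends of the sd line down to the x-axis
  geom_segment(data = msd, aes(x = mean - sd, xend = mean - sd, y = 0, yend = y), alpha = 0.5, lty = "21") +
  geom_segment(data = msd, aes(x = mean + sd, xend = mean + sd, y = 0, yend = y), alpha = 0.5, lty = "21")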
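For a normal distribution the height of that sd line has a closed form: the density at mean ± sd is exp(-1/2) / (sd * sqrt(2*pi)), about 0.242 / sd, which is exp(-1/2) (roughly 0.61) times the peak height 1 / (sd * sqrt(2*pi)). So for the example data the line should sit near 0.242 for `foo` (sd = 1) and near 0.121 for `bar` (sd = 2). The sketch below checks this against the kernel density estimate and also shows the optional shading of the ±1 SD region, whose area should come out near 0.683 for normal data. It assumes `foo` and `bar` are the question's two variables; `sd_band()` is a hypothetical helper written only for this illustration. It uses `geom_ribbon()` rather than `geom_area()`, because `geom_area()` stacks groups by default.

```r
library(ggplot2)

# Same data as in the question
set.seed(1)
foo <- rnorm(10^5, mean = 1, sd = 1)
bar <- rnorm(10^5, mean = 3, sd = 2)

# Hypothetical helper: density curve plus the +/- 1 SD band for one sample
sd_band <- function(v, name, n = 2^10) {
  d <- density(v, n = n)
  m <- mean(v)
  s <- sd(v)
  curve <- data.frame(x = d$x, y = d$y, group = name)
  band <- curve[curve$x >= m - s & curve$x <= m + s, ]
  # Trapezoid rule: probability mass between mean - sd and mean + sd
  area <- sum(diff(band$x) * (head(band$y, -1) + tail(band$y, -1)) / 2)
  # Density height at mean - sd, where the sd line is drawn
  height <- approx(d$x, d$y, xout = m - s)$y
  list(curve = curve, band = band, area = area, height = height, sd = s)
}

res <- list(sd_band(foo, "foo"), sd_band(bar, "bar"))
curves <- do.call(rbind, lapply(res, `[[`, "curve"))
bands  <- do.call(rbind, lapply(res, `[[`, "band"))

# Compare with normal theory: height = exp(-1/2) / (sd * sqrt(2*pi)), area ~ 0.683
summary_tbl <- data.frame(
  group = c("foo", "bar"),
  height = sapply(res, `[[`, "height"),
  normal_height = sapply(res, function(r) exp(-1/2) / (r$sd * sqrt(2 * pi))),
  area = sapply(res, `[[`, "area")
)
print(summary_tbl)

# Shade the +/- 1 SD region under each curve (geom_ribbon does not stack groups)
p <- ggplot(curves, aes(x = x, y = y, colour = group)) +
  geom_ribbon(data = bands, aes(x = x, ymin = 0, ymax = y, fill = group),
              alpha = 0.3, inherit.aes = FALSE) +
  geom_line() +
  labs(x = NULL, y = "density")
print(p)
```

The shaded area falls slightly below 0.683 because the kernel smoothing widens each curve a little and the density grid does not land exactly on mean ± sd; increasing `n` reduces the second effect.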
