How to visualise the difference between probability distribution functions?


Problem description



I am trying to visualise the difference between two histograms of distribution functions, such as the difference between the following two curves:

When the difference is big, you can simply plot the two curves on top of each other and fill the area between them, as denoted above. When the difference becomes very small, though, this is cumbersome. Another way is to plot the difference itself, as follows:

However, this seems very hard to read for anyone seeing such a graph for the first time, so I was wondering: is there any other way to visualise the difference between two distribution functions?

Solution

I thought it might be an option to simply combine your two proposals, while scaling up the differences to make them visible.

What follows is an attempt to do this with ggplot2. It was quite a bit more involved than I initially thought, and I'm not a hundred percent satisfied with the result, but maybe it helps nevertheless. Comments and improvements are very welcome.

library(ggplot2)
library(dplyr)

## function that replicates default ggplot2 colors
## taken from [1]
gg_color_hue <- function(n) {
  hues = seq(15, 375, length=n+1)
  hcl(h=hues, l=65, c=100)[1:n]
}

## Set up sample data
set.seed(1)
n <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)
df <- bind_rows(data.frame(sample=1, x=x1), data.frame(sample=2, x=x2)) %>%
  mutate(sample = as.factor(sample))

## Calculate density estimates
g1 <- ggplot(df, aes(x=x, group=sample, colour=sample)) +
  geom_density(data = df) + xlim(0, 10)
gg1 <- ggplot_build(g1)

## Use these estimates (available at the same x coordinates!) for
## calculating the differences.
## Inspired by [2]
x <- gg1$data[[1]]$x[gg1$data[[1]]$group == 1]
y1 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 1]
y2 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 2]
df2 <- data.frame(x = x, ymin = pmin(y1, y2), ymax = pmax(y1, y2), 
                  side=(y1<y2), ydiff = y2-y1)
g2 <- ggplot(df2) +
   geom_ribbon(aes(x = x, ymin = ymin, ymax = ymax, fill = side, alpha = 0.5)) +
   geom_line(aes(x = x, y = 5 * abs(ydiff), colour = side)) +
   geom_area(aes(x = x, y = 5 * abs(ydiff), fill = side, alpha = 0.4))
g3 <- g2 + 
   geom_density(data = df, size = 1, aes(x = x, group = sample, colour = sample)) +
   xlim(0, 10) +
   guides(alpha = "none", colour = "none") +
   ylab("Curves: density\n Shaded area: 5 * difference of densities") +
   scale_fill_manual(name = "samples", labels = 1:2, values = gg_color_hue(2)) +
   scale_colour_manual(limits = list(1, 2, FALSE, TRUE), values = rep(gg_color_hue(2), 2))

print(g3)

Sources: SO answer 1, SO answer 2
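
The code above reads the density estimates back out of ggplot_build() so that both curves are available at the same x coordinates. As a minimal sketch of an alternative, the same kind of table can be built directly with base R's density(), forcing a shared grid through its from, to and n arguments. The names d1, d2 and dens are chosen here for illustration and do not appear in the answer. The values may differ slightly from the ggplot2 ones, because xlim(0, 10) drops observations outside the limits before the estimate is computed, whereas this sketch uses all observations.

```r
## Sketch: evaluate both kernel density estimates on one shared grid
## (base R only; the limits 0..10 mirror the xlim() used in the answer).
set.seed(1)
n  <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)

## from/to/n force both estimates onto identical x coordinates
d1 <- density(x1, from = 0, to = 10, n = 512)
d2 <- density(x2, from = 0, to = 10, n = 512)

## Same columns as df2 in the answer, built without ggplot_build()
dens <- data.frame(x    = d1$x,
                   ymin = pmin(d1$y, d2$y),
                   ymax = pmax(d1$y, d2$y),
                   side = d1$y < d2$y,    # TRUE where sample 2 is higher
                   ydiff = d2$y - d1$y)   # signed difference, 2 minus 1

head(dens)
```

A data frame built this way could be passed to ggplot() in place of df2; the layers themselves would not need to change.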


As suggested by @Gregor in the comments, here's a version that draws two separate plots below each other, sharing the same x-axis scaling. The legends, at least, should still be tweaked.

library(ggplot2)
library(dplyr)
library(grid)

## function that replicates default ggplot2 colors
## taken from [1]
gg_color_hue <- function(n) {
  hues = seq(15, 375, length=n+1)
  hcl(h=hues, l=65, c=100)[1:n]
}

## Set up sample data
set.seed(1)
n <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)
df <- bind_rows(data.frame(sample=1, x=x1), data.frame(sample=2, x=x2)) %>%
  mutate(sample = as.factor(sample))

## Calculate density estimates
g1 <- ggplot(df, aes(x=x, group=sample, colour=sample)) +
  geom_density(data = df) + xlim(0, 10)
gg1 <- ggplot_build(g1)

## Use these estimates (available at the same x coordinates!) for
## calculating the differences.
## Inspired by [2]
x <- gg1$data[[1]]$x[gg1$data[[1]]$group == 1]
y1 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 1]
y2 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 2]
df2 <- data.frame(x = x, ymin = pmin(y1, y2), ymax = pmax(y1, y2), 
                  side=(y1<y2), ydiff = y2-y1)
g2 <- ggplot(df2) +
   geom_ribbon(aes(x = x, ymin = ymin, ymax = ymax, fill = side, alpha = 0.5)) +
   geom_density(data = df, size = 1, aes(x = x, group = sample, colour = sample)) +
  xlim(0, 10) +
  guides(alpha = "none", fill = "none")
g3 <- ggplot(df2) +
   geom_line(aes(x = x, y = abs(ydiff), colour = side)) +
   geom_area(aes(x = x, y = abs(ydiff), fill = side, alpha = 0.4)) +
   guides(alpha = "none", fill = "none")
## See [3]
grid.draw(rbind(ggplotGrob(g2), ggplotGrob(g3), size="last"))

... or with abs(ydiff) replaced by ydiff in the construction of the second plot:

Source: SO answer 3
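
For readers who want to try the stacked layout with the signed difference without ggplot2, here is a rough sketch using base graphics instead. It is not the answer's code: par(mfrow = ...) stands in for the grid.draw(rbind(...)) approach, the colours are arbitrary, and the density estimates come from density() on a shared grid rather than from ggplot_build().

```r
## Sketch: densities on top, signed difference below, same x range
set.seed(1)
n  <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)

## Shared evaluation grid so the two estimates can be subtracted
d1 <- density(x1, from = 0, to = 10, n = 512)
d2 <- density(x2, from = 0, to = 10, n = 512)
ydiff <- d2$y - d1$y   # signed: positive where sample 2 is higher

op <- par(mfrow = c(2, 1), mar = c(4, 4, 1, 1))

## Top panel: both density curves
plot(d1$x, d1$y, type = "l", col = "red", xlim = c(0, 10),
     ylim = range(0, d1$y, d2$y), xlab = "x", ylab = "density")
lines(d2$x, d2$y, col = "blue")
legend("topright", legend = c("sample 1", "sample 2"),
       col = c("red", "blue"), lty = 1)

## Bottom panel: signed difference with a zero reference line
plot(d1$x, ydiff, type = "l", xlim = c(0, 10),
     xlab = "x", ylab = "density 2 - density 1")
abline(h = 0, lty = 2)

par(op)   # restore the previous graphics settings
```

Keeping the sign shows directly which sample has the higher density at each x, at the cost of a second axis that readers have to relate to the top panel.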
