How to plot the difference between two ggplot density distributions?

Question

I would like to use ggplot2 to illustrate the difference between two similar density distributions. Here is a toy example of the type of data I have:

library(ggplot2)

# Make toy data
n_sp  <- 100000
n_dup <- 50000
D <- data.frame( 
    event=c(rep("sp", n_sp), rep("dup", n_dup) ), 
    q=c(rnorm(n_sp, mean=2.0), rnorm(n_dup, mean=2.1)) 
)

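# Standard density plot
# (after_stat(density) replaces the deprecated ..density.. notation; needs ggplot2 >= 3.3.0)
ggplot( D, aes( x=q, y=after_stat(density), col=event ) ) +
    geom_freqpoly()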

Rather than plotting the density for each category (dup and sp) separately as above, how could I plot a single line that shows the difference between these distributions?

In the toy example above, if I subtracted the dup density distribution from the sp density distribution, the resulting line would be above zero on the left side of the plot (since there is an abundance of smaller sp values) and below zero on the right (since there is an abundance of larger dup values). Note that there may be a different number of observations of type dup and sp.

More generally, what is the best way to show differences between similar density distributions?

Recommended answer

There may be a way to do this within ggplot, but it is often easiest to do the calculation beforehand. In this case, call density() on each subset of q over the same range, so that both estimates share one x grid, then subtract the y values. Using dplyr (translate to base R or data.table if you wish):

library(dplyr)
library(ggplot2)

D %>% group_by(event) %>% 
    # calculate densities for each group over same range; store in list column
    summarise(d = list(density(q, from = min(.$q), to = max(.$q)))) %>% 
    # make a new data.frame from the two density objects;
    # rows are sorted by event, so d[[1]] is "dup" and d[[2]] is "sp"
    do(data.frame(x = .$d[[1]]$x,    # grab one set of x values (which are the same)
                  y = .$d[[2]]$y - .$d[[1]]$y)) %>%    # sp minus dup, as the question asks
    ggplot(aes(x, y)) +    # now plot
    geom_line()
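
For readers who prefer to avoid dplyr, the same idea can be sketched in base R, which the answer mentions as an option. This is only an illustration, not the answer author's code: the fixed seed, the smaller sample sizes, and the names rng, d_sp, d_dup and diff_df are assumptions chosen here so the example runs quickly and reproducibly.

```r
# Toy data shaped like the question's (seed and smaller n are assumptions)
set.seed(1)
D <- data.frame(
    event = c(rep("sp", 10000), rep("dup", 5000)),
    q     = c(rnorm(10000, mean = 2.0), rnorm(5000, mean = 2.1))
)

# Evaluate both kernel density estimates on one shared grid
rng   <- range(D$q)
d_sp  <- density(D$q[D$event == "sp"],  from = rng[1], to = rng[2])
d_dup <- density(D$q[D$event == "dup"], from = rng[1], to = rng[2])

# Pointwise difference, sp minus dup
diff_df <- data.frame(x = d_sp$x, y = d_sp$y - d_dup$y)

# Single line showing the difference, with a dashed reference line at zero
plot(diff_df$x, diff_df$y, type = "l",
     xlab = "q", ylab = "density(sp) - density(dup)")
abline(h = 0, lty = 2)
```

Because each density estimate is normalised to integrate to approximately 1, the different number of sp and dup observations does not by itself shift the difference curve. Passing the same from and to to both density() calls is what makes the two x grids identical, so the y values can be subtracted directly.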
