ggplot2: geom_ribbon with alpha dependent on data density along y-axis for each x


Question

Is there a way in ggplot2 to produce a geom_ribbon (or other area-based geom) with a varying alpha based on the density of points?

The following code produces 50 noisy sine waves, with random x-values for each sample. I don't want to draw every single point as I might want a thousand or more resamples, so I'd like to summarise all these points.

A simple method would be to draw a geom_ribbon covering the 95% quantiles. However, this isn't easy to calculate because the x-values aren't the same for each resample; normally you'd calculate the pointwise quantiles at each of the 100 x points.
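
One rough way around the unequal x-values is to bin x and take quantiles of y within each bin. The base R sketch below only illustrates that idea; the seed, the bin count `n_bins`, and the names `breaks`, `bin`, `lo`, `hi` and `band` are assumptions made for the example, not part of the original question.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
num_points  <- 100
num_samples <- 50

# Same simulation as in the question, pooled into one data frame
sim <- lapply(seq_len(num_samples), function(i) {
  x <- runif(num_points, 0, 4 * pi)
  data.frame(x = x, y = sin(x) + rnorm(num_points, 0, 0.4))
})
sim.df <- do.call(rbind, sim)

n_bins <- 25  # hypothetical choice; wider bins smooth more but blur the curve
breaks <- seq(0, 4 * pi, length.out = n_bins + 1)
bin <- cut(sim.df$x, breaks = breaks, include.lowest = TRUE)

# 2.5% and 97.5% quantiles of y within each x-bin
lo <- tapply(sim.df$y, bin, quantile, probs = 0.025)
hi <- tapply(sim.df$y, bin, quantile, probs = 0.975)

band <- data.frame(
  x  = (head(breaks, -1) + tail(breaks, -1)) / 2,  # bin midpoints
  lo = as.numeric(lo),
  hi = as.numeric(hi)
)
```

A data frame like `band` could be drawn with `geom_ribbon(data = band, aes(x = x, ymin = lo, ymax = hi))`, although the result is stepwise and depends on the bin width.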

Instead I'd like to have a ribbon covering the entire area where samples are located, with a continuous alpha gradient, i.e. the ribbon would be darkest in the middle near the actual line and very light at the outlier points. Is this possible in ggplot2?

library(ggplot2)

num_points = 100
num_samples = 50

x = seq(0, 4*pi, length.out=num_points)

sim <- lapply(1:num_samples, function(f) {
    x = runif(num_points, 0, 4*pi)
    y = sin(x) + rnorm(num_points, 0, 0.4)
    data.frame(x=x, y=y)
})

sim.df <- do.call(rbind, sim)
actual = data.frame(x=x, y=sin(x))

ggplot(sim.df, aes(x=x, y=y)) +
    geom_point(alpha=0.7) +
    geom_line(data=actual, colour='blue', size=1.5) 

Solution

One option is to use quantile regression to get the y-values for each quantile at each x-value and then plot those using geom_ribbon.

library(splines)
library(quantreg)
library(reshape2)
library(dplyr)

  1. Set quantiles for density ribbons:

    nq = 50 # Number of quantiles
    qq = seq(0,1, length.out=nq)
    

  2. Run the quantile regression for each quantile. I've used a flexible spline function to get a good fit to the sine function:

    m1 = rq(y ~ ns(x,10), data=sim.df, tau=qq)
    

  3. Create data frame for use by geom_ribbon to plot density quantiles.

    Create a data frame of regression quantile predictions using predict:

    xvals = seq(min(sim.df$x), max(sim.df$x), length.out=100)
    rqs = data.frame(x=xvals, predict(m1, newdata=data.frame(x=xvals)))
    names(rqs) = c("x", paste0("p",100*qq))
    

    Reshape the data so that the predictions for each quantile serve as the ymax for one quantile and the ymin for the next quantile in succession (with the exception that the first quantile serves only once as the first ymin and the last quantile serves only once as the last ymax). Put the data in long format so that we can group by quantile in ggplot:

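    # dat1 holds the lower edge of each band (all quantiles except the last)
    dat1 = rqs[, -length(rqs)]
    # dat2 holds the upper edge (all quantiles except the first), relabelled
    # with dat1's names so that each band shares one group label
    dat2 = rqs[, -2]
    names(dat2)[-1] = names(dat1)[-1]
    
    dat1 = melt(dat1, id.var="x")
    names(dat1) = c("x","group","min")
    dat2 = melt(dat2, id.var="x")
    names(dat2) = c("x","group1","max")
    
    # Keep only "max" from dat2: binding both frames whole would duplicate "x",
    # which recent dplyr versions rename (e.g. x...1), breaking aes(x=x)
    dat = bind_cols(dat1, dat2["max"])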
    

  4. Now create the plot. We map the quantiles to the alpha aesthetic, and then use scale_alpha_manual to set the alpha values to be higher for quantiles closer to 0.5 and lower for quantiles closer to 0 and 1:

    ggplot() +
      geom_point(data=sim.df, aes(x,y), alpha=0.1, size=0.5, colour="red") +
      geom_ribbon(data=dat, aes(x=x, ymin=min, ymax=max, group=group, alpha=group), 
              fill="blue", lwd=0, show.legend=FALSE) +
      theme_bw() +
      scale_alpha_manual(values=c(seq(0.05,0.9,length.out=floor(0.5*length(qq))),
                                  seq(0.9,0.05,length.out=floor(0.5*length(qq)))))
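
The reshaping in step 3 amounts to pairing each quantile curve with the next one, and step 4 assigns one alpha value per resulting band. As a sketch of that idea, the base R code below builds the same kind of long table directly from a matrix of quantile curves and derives the alpha values from how far each band's midpoint is from the median. The matrix `q` is synthetic stand-in data (normal quantiles around sin(x)), and `pp`, `bands`, `mid` and `alpha` are hypothetical names; in the answer above the curves come from `predict(m1, ...)`.

```r
nq <- 50
qq <- seq(0, 1, length.out = nq)
xvals <- seq(0, 4 * pi, length.out = 100)

# Stand-in for the predicted quantile curves (assumption: not the rq fit).
# The extreme probabilities are pulled in slightly so qnorm() stays finite.
pp <- pmin(pmax(qq, 0.001), 0.999)
q <- outer(sin(xvals), qnorm(pp, sd = 0.4), "+")  # 100 rows x 50 columns

# Band k runs from quantile curve k (ymin) to curve k + 1 (ymax)
n_bands <- nq - 1
bands <- data.frame(
  x     = rep(xvals, times = n_bands),
  group = factor(rep(seq_len(n_bands), each = length(xvals))),
  min   = as.vector(q[, -nq]),
  max   = as.vector(q[, -1])
)

# One alpha per band: highest at the median, fading towards 0 and 1
mid   <- (qq[-nq] + qq[-1]) / 2
alpha <- 0.05 + 0.85 * (1 - abs(2 * mid - 1))
```

With these objects, `bands` could take the place of `dat` in the `geom_ribbon` call and `scale_alpha_manual(values = alpha)` would supply exactly one value per band, instead of the two concatenated `seq()` ramps.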
    

Here's another example, but with data that has a varying standard deviation:

sim <- lapply(1:num_samples, function(f) {
  x = runif(num_points, 0, 4*pi)
  y = sin(x) + rnorm(num_points, 0, abs(0.7*cos(x))+0.1)
  data.frame(x=x, y=y)
})

sim.df <- do.call(rbind, sim)
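
To see what "varying standard deviation" means for this data, a quick base R check can compare the spread of the noise where |cos(x)| is large with where it is small. This is only a sketch: the seed and the cut-offs 0.9 and 0.2 are arbitrary assumptions, and `res`, `wide` and `narrow` are hypothetical names.

```r
set.seed(42)  # assumption: seed chosen only for reproducibility
num_points  <- 100
num_samples <- 50

sim <- lapply(seq_len(num_samples), function(i) {
  x <- runif(num_points, 0, 4 * pi)
  y <- sin(x) + rnorm(num_points, 0, abs(0.7 * cos(x)) + 0.1)
  data.frame(x = x, y = y)
})
sim.df <- do.call(rbind, sim)

res <- sim.df$y - sin(sim.df$x)       # noise around the true curve
wide   <- abs(cos(sim.df$x)) > 0.9    # sd of the noise is above 0.7 here
narrow <- abs(cos(sim.df$x)) < 0.2    # sd of the noise is below 0.25 here

# Empirical spread in the two regions
c(sd_wide = sd(res[wide]), sd_narrow = sd(res[narrow]))
```

The quantile bands should therefore widen near x = 0, pi, 2*pi, ... and pinch in near x = pi/2, 3*pi/2, ....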

Now just run all of the code we created earlier to get this plot:
