How to shade part of a density curve in ggplot (with no y axis data)

Problem description

I'm trying to create a density curve in R from a set of 1000 random numbers, and shade the part that is less than or equal to a certain value. There are a lot of solutions out there involving geom_area or geom_ribbon, but they all require a y value, which I don't have (my data is just a vector of 1000 numbers). Any ideas on how I could do this?

Two other related questions:

  1. Is it possible to do the same thing for a cumulative distribution function (I'm currently using stat_ecdf to generate one), or to shade it at all?
  2. Is there any way to edit geom_vline so it will only go up to the height of the density curve, rather than the whole y axis?

Code: (the geom_area is a failed attempt to edit some code I found. If I set ymax manually, I just get a column taking up the whole plot, instead of just the area under the curve)

set.seed(100)

amount_spent <- rnorm(1000,500,150)
amount_spent1<- data.frame(amount_spent)
rand1 <- runif(1,0,1000)
amount_spent1$pdf <- dnorm(amount_spent1$amount_spent)

mean1 <- mean(amount_spent1$amount_spent)

#density/bell curve
ggplot(amount_spent1,aes(amount_spent)) +
   geom_density( size=1.05, color="gray64", alpha=.5, fill="gray77") +
   geom_vline(xintercept=mean1, alpha=.7, linetype="dashed", size=1.1, color="cadetblue4")+
   geom_vline(xintercept=rand1, alpha=.7, linetype="dashed",size=1.1, color="red3")+
   geom_area(mapping=aes(ifelse(amount_spent1$amount_spent > rand1,amount_spent1$amount_spent,0)), ymin=0, ymax=.03,fill="red",alpha=.3)+
   ylab("")+ 
   xlab("Amount spent on lobbying (in Millions USD)")+
   scale_x_continuous(breaks=seq(0,1000,100))

Solution

A couple of other questions show how to do this, but they calculate the density prior to plotting.
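For comparison, the "calculate the density first" route might look roughly like the sketch below. It is not taken from those other questions; it assumes stats::density() with its default settings, and the names cutoff, dens_df, shade_df and p_pre are made up for illustration. The cutoff of 400 is an arbitrary example value.

```r
library(ggplot2)

set.seed(100)
amount_spent <- rnorm(1000, 500, 150)
cutoff <- 400  # hypothetical threshold, chosen only for illustration

# Estimate the density up front so that explicit y values exist
dens <- density(amount_spent)
dens_df <- data.frame(x = dens$x, y = dens$y)

# Height of the curve exactly at the cutoff, by linear interpolation
y_cut <- approx(dens_df$x, dens_df$y, xout = cutoff)$y

# Keep the part at or below the cutoff and close it with the interpolated point
shade_df <- rbind(subset(dens_df, x <= cutoff),
                  data.frame(x = cutoff, y = y_cut))

p_pre <- ggplot(dens_df, aes(x, y)) +
  geom_line() +
  geom_area(data = shade_df, fill = "red", alpha = 0.3) +
  # A single segment that stops at the curve instead of spanning the y axis
  annotate("segment", x = cutoff, xend = cutoff, y = 0, yend = y_cut,
           linetype = "dashed", colour = "red3")
```

Printing p_pre draws the curve with the region at or below the cutoff shaded. Because the density is computed by hand, the bandwidth and grid are under your control, at the cost of a few extra lines.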

This is another way, more complicated than required I'm sure, that lets ggplot do some of the calculations for you.

library(ggplot2)

# Your data
set.seed(100)
amount_spent1 <- data.frame(amount_spent=rnorm(1000, 500, 150))

mean1 <- mean(amount_spent1$amount_spent)
rand1 <- runif(1,0,1000)

Basic density plot

p <- ggplot(amount_spent1, aes(amount_spent)) +
          geom_density(fill="grey") +
          geom_vline(xintercept=mean1) 

You can extract the x and y positions of the area to shade from the plot object using ggplot_build. Linear interpolation (via approx) is used to get the y value at x = rand1, so the segment stops at the height of the curve.

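# Subset the region and plot. This shades the part above rand1;
# use subset(d, x <= rand1) to shade the part at or below it instead.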
d <- ggplot_build(p)$data[[1]]

p <- p + geom_area(data = subset(d, x > rand1), aes(x=x, y=y), fill="red") +
          geom_segment(x=rand1, xend=rand1, 
                       y=0, yend=approx(x = d$x, y = d$y, xout = rand1)$y,
                       colour="blue", size=3)
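The answer above does not cover the first related question (shading a cumulative distribution). One possible sketch, under stated assumptions: it uses stats::ecdf() to get y values, evaluates them on a fine grid, and fills that with geom_area, so the shaded region only approximates the step function drawn by stat_ecdf. The names cutoff, Fn, ecdf_df and p_ecdf are hypothetical, and the cutoff of 400 is an arbitrary example value.

```r
library(ggplot2)

set.seed(100)
amount_spent1 <- data.frame(amount_spent = rnorm(1000, 500, 150))
cutoff <- 400  # hypothetical threshold, chosen only for illustration

# ecdf() returns a function giving the proportion of values <= x
Fn <- ecdf(amount_spent1$amount_spent)

# Evaluate the ECDF on a fine grid up to the cutoff; the filled area is a
# close approximation of the step function, not an exact trace of it
grid <- seq(min(amount_spent1$amount_spent), cutoff, length.out = 500)
ecdf_df <- data.frame(x = grid, y = Fn(grid))

p_ecdf <- ggplot(amount_spent1, aes(amount_spent)) +
  geom_area(data = ecdf_df, aes(x = x, y = y), fill = "red", alpha = 0.3) +
  stat_ecdf() +
  # Vertical marker that stops at the ECDF height rather than the plot top
  annotate("segment", x = cutoff, xend = cutoff, y = 0, yend = Fn(cutoff),
           linetype = "dashed", colour = "red3")
```

Printing p_ecdf shows the ECDF with the region up to the cutoff shaded. With 1000 observations the grid-based fill is visually close to the steps; for small samples the mismatch would be more noticeable.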
