Find points over and under the confidence interval when using geom_stat / geom_smooth in ggplot2


Question


I have a scatter plot, and I want to know how to find the genes above and below the confidence interval lines.


EDIT: Reproducible example:

library(ggplot2)
#dummy data
df <- mtcars[,c("mpg","cyl")]

#plot
ggplot(df,aes(mpg,cyl)) +
  geom_point() +
  geom_smooth()

Solution

I had to take a deep dive into the ggplot2 GitHub repo, but I finally got it. To do this you need to know how stat_smooth works. In this specific case the loess function is called to do the smoothing (the bands for the other smoothing functions can be reconstructed with the same process as below).

So, using loess on this occasion, we would do:

#data
df <- mtcars[,c("mpg","cyl")]
#run loess model
cars.lo <- loess(cyl ~ mpg, df)

Then I had to read the ggplot2 source to see how the predictions are made internally in stat_smooth. Apparently Hadley uses the predictdf function (which is not exported from the ggplot2 namespace) as follows for our case:

predictdf.loess <- function(model, xseq, se, level) {
  pred <- stats::predict(model, newdata = data.frame(x = xseq), se = se)

  if (se) {
    y = pred$fit
    ci <- pred$se.fit * stats::qt(level / 2 + .5, pred$df)
    ymin = y - ci
    ymax = y + ci
    data.frame(x = xseq, y, ymin, ymax, se = pred$se.fit)
  } else {
    data.frame(x = xseq, y = as.vector(pred))
  }
}

After reading the above I was able to create my own data.frame of the predictions using:

#get the predictions i.e. the fit and se.fit vectors
pred <- predict(cars.lo, se=TRUE)
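#create a data.frame from those; note that the se.fit column stores the
#half-width of the 95% interval (standard error times the t quantile)
df2 <- data.frame(mpg=df$mpg, fit=pred$fit, se.fit=pred$se.fit * qt(0.95 / 2 + .5, pred$df))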

Looking at predictdf.loess we can see that the upper boundary of the confidence interval is created as pred$fit + pred$se.fit * qt(0.95 / 2 + .5, pred$df) and the lower boundary as pred$fit - pred$se.fit * qt(0.95 / 2 + .5, pred$df).

Using those we can create a flag for the points over or below those boundaries:

#make the flag
outerpoints <- +(df$cyl > df2$fit + df2$se.fit |  df$cyl < df2$fit - df2$se.fit)
#add flag to original data frame
df$outer <- outerpoints

The df$outer column is probably what the OP is looking for (it takes the value of 1 if it is outside the boundaries or 0 otherwise) but just for the sake of it I am plotting it below.
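If you need to know on which side of the band each point falls, and not only whether it is outside, the same comparison can be split into three labels. This is a minimal sketch using base R only; the column name position and the variable ci are illustrative choices, not something ggplot2 creates, and the 0.95 level is assumed to match the geom_smooth default.

```r
# Self-contained: rebuild the data and the loess fit used in this answer.
df <- mtcars[, c("mpg", "cyl")]
cars.lo <- loess(cyl ~ mpg, df)
pred <- predict(cars.lo, se = TRUE)

# Half-width of the 95% band at each observed mpg value.
ci <- pred$se.fit * qt(0.95 / 2 + 0.5, pred$df)

# 'position' is a hypothetical column name chosen for this sketch.
df$position <- ifelse(df$cyl > pred$fit + ci, "above",
               ifelse(df$cyl < pred$fit - ci, "below", "inside"))

# The row names of mtcars are the car models, so these list the points of interest.
rownames(df)[df$position == "above"]
rownames(df)[df$position == "below"]
table(df$position)
```

With real data, the row names (or an ID column) would be the gene identifiers, so the two rownames() calls return the genes above and below the band.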

Notice the + function above is only used here to convert the logical flag into a numeric.
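As a small illustration of that conversion, the unary plus turns TRUE/FALSE into 1/0; as.integer() does the same thing more explicitly. The vector flag below is made up for the example.

```r
# A made-up logical vector, standing in for the comparison result.
flag <- c(TRUE, FALSE, TRUE)

plus_flag <- +flag             # unary plus coerces the logical vector to 0/1
int_flag  <- as.integer(flag)  # the explicit equivalent

plus_flag
int_flag
```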

Now if we plot it like this:

ggplot(df,aes(mpg,cyl)) +
  geom_point(aes(colour=factor(outer))) +
  geom_smooth() 

We can actually see the points inside and outside the confidence interval.


P.S. For anyone interested in the upper and lower boundaries, they can be drawn like this (speculation: the shaded area itself is probably drawn with geom_ribbon or something similar, which makes it look rounder and prettier):

#upper boundary
ggplot(df,aes(mpg,cyl)) +
   geom_point(aes(colour=factor(outer))) +
   geom_smooth() +
   geom_line(data=df2, aes(mpg , fit + se.fit , group=1), colour='red')

#lower boundary
ggplot(df,aes(mpg,cyl)) +
   geom_point(aes(colour=factor(outer))) +
   geom_smooth() +
   geom_line(data=df2, aes(mpg , fit - se.fit , group=1), colour='red')
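One likely reason the shaded band looks smoother than the red lines is that stat_smooth evaluates the fit on an evenly spaced grid of x values (80 points by default) rather than at the observed mpg values. The sketch below rebuilds such a band with base R only, assuming the defaults level = 0.95 and n = 80; the names xseq, ci and band are illustrative and not part of ggplot2.

```r
# Self-contained: rebuild the data and the loess fit used in this answer.
df <- mtcars[, c("mpg", "cyl")]
cars.lo <- loess(cyl ~ mpg, df)

# Evenly spaced grid over the observed range of mpg (assumed n = 80).
xseq <- seq(min(df$mpg), max(df$mpg), length.out = 80)

# Predict on the grid; newdata must use the predictor name from the formula.
pred <- predict(cars.lo, newdata = data.frame(mpg = xseq), se = TRUE)

# Same formula as predictdf.loess: standard error times the t quantile.
ci <- pred$se.fit * qt(0.95 / 2 + 0.5, pred$df)

band <- data.frame(mpg = xseq,
                   y = pred$fit,
                   ymin = pred$fit - ci,
                   ymax = pred$fit + ci)
head(band)
```

If you want to draw it yourself, band could be passed to something like geom_ribbon(data = band, aes(x = mpg, ymin = ymin, ymax = ymax), inherit.aes = FALSE, alpha = 0.2), which should closely follow the band that geom_smooth shades.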
