In R ggplot2, include stat_ecdf() endpoints (0,0) and (1,1)


Problem description


I'm trying to use stat_ecdf() to plot cumulative successes as a function of a rank score created by a predictive model.

# Libraries
library(ggplot2)
library(scales)  # provides percent() for the axis labels

# Fake data for reproducibility
set.seed(123)
n <- 200
df <- data.frame(model_score = rexp(n = n, rate = 1:n),
                 obs_set = sample(c("training", "validation"), n, replace = TRUE))
df$model_rank <- rank(df$model_score) / n
df$target_outcome <- rbinom(n, 1, 1 - df$model_rank)

# Plot a gain chart using stat_ecdf()
ggplot(subset(df, target_outcome == 1), aes(x = model_rank)) +
  stat_ecdf(aes(colour = obs_set), size = 1) +
  scale_x_continuous(limits = c(0, 1), labels = percent, breaks = seq(0, 1, .1)) +
  xlab("Model Percentile") + ylab("Percent of Target Outcome") +
  scale_y_continuous(limits = c(0, 1), labels = percent) +
  # Diagonal reference line from (0,0) to (1,1)
  geom_segment(aes(x = 0, y = 0, xend = 1, yend = 1),
               colour = "gray", linetype = "longdash", size = 1) +
  ggtitle("Gain Chart")

All I want to do is force the ECDF to start at (0,0) and end at (1,1) so that there are no gaps at the beginning or end of the curve. If possible, I'd like to do it within the syntax of ggplot2, but I'd settle for a clever workaround.

@Henrik this is NOT a duplicate of this question, because I have already defined my limits with scale_x_ and _y_continuous(), and adding expand_limits() doesn't do anything. It is not the origin of the PLOT but the endpoints of the stat_ecdf() that need to be fixed.

Solution

Unfortunately, the definition of stat_ecdf gives no wiggle room here; it determines the endpoints internally.

There is a somewhat advanced solution. With the latest version of ggplot2 (devtools::install_github("hadley/ggplot2")), the extensibility is improved, to the point where it is possible to override this behavior, but not without some boilerplate.

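# Layer constructor: same arguments as stat_ecdf(), plus optional minval/maxval.
# NOTE: `stat_params` and `calculate` follow the mid-2015 development API
# mentioned above; other ggplot2 versions may name these differently.
stat_ecdf2 <- function(mapping = NULL, data = NULL, geom = "step",
                       position = "identity", n = NULL, show.legend = NA,
                       inherit.aes = TRUE, minval = NULL, maxval = NULL, ...) {
  layer(
    data = data,
    mapping = mapping,
    stat = StatEcdf2,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    stat_params = list(n = n, minval = minval, maxval = maxval),
    params = list(...)
  )
}


# Stat that reuses StatEcdf's computation, then moves the first and last
# x values (the padding points) to minval and maxval when they are given.
StatEcdf2 <- ggproto("StatEcdf2", StatEcdf,
  calculate = function(data, scales, n = NULL, minval = NULL, maxval = NULL, ...) {
    df <- StatEcdf$calculate(data, scales, n, ...)
    if (!is.null(minval)) { df$x[1] <- minval }
    if (!is.null(maxval)) { df$x[length(df$x)] <- maxval }
    df
  }
)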

Now, stat_ecdf2 will behave the same as stat_ecdf, but with optional minval and maxval parameters. So this will do the trick:

ggplot(subset(df,target_outcome==1),aes(x = model_rank)) +
  stat_ecdf2(aes(colour = obs_set), size=1, minval=0, maxval=1) +
  scale_x_continuous(limits=c(0,1), labels=percent,breaks=seq(0,1,.1)) +
  xlab("Model Percentile") + ylab("Percent of Target Outcome") +
  scale_y_continuous(limits=c(0,1), labels=percent) +
  geom_segment(aes(x=0,y=0,xend=1,yend=1),
               colour = "gray", linetype="longdash", size=1) +
  ggtitle("Gain Chart")

The big caveat here is that I don't know if the current extensibility model will be supported in the future; it has changed several times in the past, and the change to use "ggproto" is recent -- like July 15th 2015 recent.

As a plus, this gave me a chance to really dig into ggplot's internals, which is something that I've been meaning to do for a while.
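
Since the question also allows a workaround, here is a minimal sketch that avoids ggplot2 internals altogether: compute the ECDF per group in base R, pin the endpoints by hand, and draw the result with a plain step geom. The helper name `padded_ecdf` and its arguments are made up for illustration and are not part of ggplot2; the plotting call in the trailing comment is an untested suggestion.

```r
# Hypothetical helper (not part of ggplot2): per-group ECDF with pinned endpoints.
# Returns one row per unique x value plus a (minval, 0) and a (maxval, 1) row
# for each group.
padded_ecdf <- function(x, group, minval = 0, maxval = 1) {
  groups <- split(x, group, drop = TRUE)  # drop = TRUE skips empty factor levels
  pieces <- lapply(names(groups), function(g) {
    xs <- groups[[g]]
    xvals <- sort(unique(xs))
    data.frame(
      group = g,
      x = c(minval, xvals, maxval),
      y = c(0, stats::ecdf(xs)(xvals), 1)
    )
  })
  do.call(rbind, pieces)
}

# Small made-up example: two groups, values inside [0, 1]
demo <- padded_ecdf(x = c(0.2, 0.4, 0.4, 0.9, 0.5),
                    group = c("a", "a", "a", "a", "b"))

# Assumed usage with the data from the question:
# d <- subset(df, target_outcome == 1)
# curve <- padded_ecdf(d$model_rank, d$obs_set)
# ggplot(curve, aes(x = x, y = y, colour = group)) + geom_step()
```

Because the endpoints live in the data rather than in the stat, this should not depend on how the extension mechanism evolves, at the cost of computing the curve outside the plot call.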
