使用optimize()查找R中曲线下面95%区域的最短间隔 [英] Using optimize() to find the shortest interval that takes 95% area under a curve in R

查看:217
本文介绍了使用optimize()查找R中曲线下面95%区域的最短间隔的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一条曲线,其Y值是由我的 R函数在下面产生的 整齐注释 )。如果你运行我的整个R代码,你会看到我的曲线(但请记住,这是一个函数,所以如果我改变参数值,我可以得到一个不同的曲线):



问题:



显然,假设多个间隔,这将覆盖/占据此曲线下总面积的95%。但是,如何使用 optimize(),我怎样才能找到这95个区间内的 SHORTEST(以x值为单位)?那么这个最短95%区间的两端对应的x值是多少?



注意: 像我这样的单峰曲线的最短间隔的想法是有道理的。实际上,最短的一个将会是高度(y值)较大的中间的那个,所以那么x值不需要如预期的那么大以覆盖/取得95

这里是我的R代码(请运行整个代码):



  ppp < - 函数(f,N,df1,df2,petasq,alpha,beta){

pp< - 函数(petasq)dbeta(petasq,alpha,beta)
ll < - 函数(petasq)df(f,df1,df2,(petasq * N)/(1 - petasq))
$ (x)pp(x)* ll(x),0,1)[[1]]

po < - 函数(x)pp( x)* ll(x)/ marg
return(po(petasq))

}
## @@@ R功能结束。

#现在我用上面的函数来得到我的图的y值:

petasq< - seq(0,1,by = .0001)##这些是我的图的X值
f< - 30#一个函数需要的参数
df1< - 3#一个函数需要的参数
df2< - 108# b $ b N < - 120#一个函数需要的参数
alpha = 5#一个函数需要的参数
beta = 4#一个函数需要的参数


# #现在使用ppp()函数来获得上述X值范围的Y值:
y.values< - ppp(f,N,df1,df2,petasq,alpha,beta)

##最后将petasq(作为X值)与Y.values对比:
plot(petasq,y.values,ty =l,lwd = 3)


解决方案

根据您的修订问题,我找到了最小化最短距离以x值为单位)在LEFT和RIGHT之间:

  ppp < -  function(peta平方,f,N,df1,df2,alpha,beta){

pp < - 函数(petasq)dbeta(petasq,alpha,beta)
ll < - 函数(petasq )(函数(x)pp(x)* ll(x),0)df(f,df1,df2,(petasq * N)/(1-petasq))

marg- ,1)[[1]]

po < - 函数(x)pp(x)* ll(x)/ marg
return(po(petasq))
}

petasq< - seq(0,1,by = .0001)##这些是我绘图的X值
f< - 30#函数需要的参数
df1< - 3#函数需要参数
df2< - 108#函数需要参数
N< - 120#函数需要参数
alpha = 5#需要的函数参数
beta = 4#函数需要的参数

optim_func< - function(x_left){
int_function< - function(petasq){
ppp(petasq ,f = f,N = N,df1 = df1,df2 = df2,alpha = alpha,beta = beta)
}

#对于每个LEFT值,找到相应的RIGHT值给出95%的面积。

find_95_right< - function(x_right){
(0.95 - integrate(int_function,lower = x_left,upper = x_right,subdivisions = 10000)$ value)^ 2
}
x_right_obj< - 优化(f = find_95_right,interval = c(0.5,1))
if(x_right_obj $ objective> .Machine $ double.eps ^ 0.25)return(100)

#返回左右距离
返回(x_right_obj $最小 - x_left)
}

#缩小左右距离
x_left < - optimize(f = optim_func,interval = c(0.30,0.40))$ minimum
find_95_right < - function(x_right){
(0.95 - integrate(int_function,lower = x_left,upper = x_right,subdivisions = 10000)$ value)^ 2
}
int_function < - 函数(petasq){
ppp(petasq,f = f,N = N,df1 = df1, (f = find_95_right,interval = c(0.5,1))$ minimum

请参阅代码中的注释。希望这终于可以满足你的问题:)结果:

 > x_right 
[1] 0.5409488
> x_left
[1] 0.3201584

另外,您可以绘制LEFT和RIGHT之间的距离为左边界的函数:

  left_x_values < -  seq(0.30,0.335,0.0001)
DISTANCE< - sapply(left_x_values,optim_func)

plot(left_x_values,DISTANCE,type =l)


Background:

I have a curve whose Y-values are produced by my small R function below (neatly annotated). If you run my entire R code, you see my curve (but remember, it's a function so if I changed the argument values, I could get a different curve):

Question:

Obviously, one can determine/assume many intervals that would cover/take 95% of the total area under this curve. But using, optimize(), how can I find the SHORTEST (in x-value units) of these many possible 95% intervals? What then would be the corresponding x-values for the the two ends of this shortest 95% interval?

Note: The idea of shortest interval for a uni-modal curve like mine makes sense. In reality, the shortest one would be the one that tends to be toward the middle where the height (y-value) is larger, so then x-value doesn't need to be so large for the intended interval to cover/take 95% of the total area under the curve.

Here is my R code (please run the entire code):

ppp <- function(f, N, df1, df2, petasq, alpha, beta) {

 pp <- function(petasq) dbeta(petasq, alpha, beta)
 ll <- function(petasq) df(f, df1, df2, (petasq * N) / (1 - petasq) )

 marg <- integrate(function(x) pp(x)*ll(x), 0, 1)[[1]]

po <- function(x) pp(x)*ll(x) / marg
return(po(petasq) )

}
## @@@ END OF MY R FUNCTION.

# Now I use my function above to get the y-values for my plot:

petasq  <- seq(0, 1, by = .0001) ## These are X-values for my plot
f  <- 30       # a function needed argument
df1 <- 3       # a function needed argument
df2 <- 108     # a function needed argument
N  <- 120      # a function needed argument
alpha = 5      # a function needed argument
beta = 4       # a function needed argument


## Now use the ppp() function to get the Y-values for the X-value range above:
y.values <- ppp(f, N, df1, df2, petasq, alpha, beta)

## Finally plot petasq (as X-values) against the Y.values:
plot(petasq, y.values, ty="l", lwd = 3 )

解决方案

Based on your revised question, I found the optimization that minimizes the SHORTEST distance (in x-value units) between LEFT and RIGHT boundaries:

ppp <- function(petasq, f, N, df1, df2, alpha, beta) {

 pp <- function(petasq) dbeta(petasq, alpha, beta)
 ll <- function(petasq) df(f, df1, df2, (petasq * N) / (1 - petasq) )

 marg <- integrate(function(x) pp(x)*ll(x), 0, 1)[[1]]

po <- function(x) pp(x)*ll(x) / marg
return(po(petasq) )
}

petasq  <- seq(0, 1, by = .0001) ## These are X-values for my plot
f  <- 30       # a function needed argument
df1 <- 3       # a function needed argument
df2 <- 108     # a function needed argument
N  <- 120      # a function needed argument
alpha = 5      # a function needed argument
beta = 4       # a function needed argument

optim_func <- function(x_left) {
    int_function <- function(petasq) {
        ppp(petasq, f=f, N=N, df1=df1, df2=df2, alpha=alpha, beta=beta)
    }

    # For every LEFT value, find the corresponding RIGHT value that gives 95% area.  

    find_95_right <- function(x_right) {
        (0.95 - integrate(int_function, lower=x_left, upper=x_right, subdivisions = 10000)$value)^2
    }
    x_right_obj <- optimize(f=find_95_right, interval=c(0.5,1))
    if(x_right_obj$objective > .Machine$double.eps^0.25) return(100)

    #Return the DISTANCE BETWEEN LEFT AND RIGHT
    return(x_right_obj$minimum - x_left)
}

#MINIMIZE THE DISTANCE BETWEEN LEFT AND RIGHT
x_left <- optimize(f=optim_func, interval=c(0.30,0.40))$minimum
find_95_right <- function(x_right) {
    (0.95 - integrate(int_function, lower=x_left, upper=x_right, subdivisions = 10000)$value)^2
}
    int_function <- function(petasq) {
        ppp(petasq, f=f, N=N, df1=df1, df2=df2, alpha=alpha, beta=beta)
    }
x_right <- optimize(f=find_95_right, interval=c(0.5,1))$minimum

See the comments in the code. Hopefully this finally satisfies your question :) Results:

> x_right
[1] 0.5409488
> x_left
[1] 0.3201584

Also, you can plot the distance between LEFT and RIGHT as a function of the left boundary:

left_x_values <- seq(0.30, 0.335, 0.0001)
DISTANCE <- sapply(left_x_values, optim_func)

plot(left_x_values, DISTANCE, type="l")

这篇关于使用optimize()查找R中曲线下面95%区域的最短间隔的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆