使用optimize()查找R中曲线下面95%区域的最短间隔 [英] Using optimize() to find the shortest interval that takes 95% area under a curve in R
问题描述
问题:
显然,假设多个间隔,这将覆盖/占据此曲线下总面积的95%。但是,如何使用 optimize()
,我怎样才能找到这95个区间内的 SHORTEST(以x值为单位)?那么这个最短95%区间的两端对应的x值是多少?
注意: 像我这样的单峰曲线的最短间隔的想法是有道理的。实际上,最短的一个将会是高度(y值)较大的中间的那个,所以那么x值不需要如预期的那么大以覆盖/取得95
这里是我的R代码(请运行整个代码):
ppp < - 函数(f,N,df1,df2,petasq,alpha,beta){
pp< - 函数(petasq)dbeta(petasq,alpha,beta)
ll < - 函数(petasq)df(f,df1,df2,(petasq * N)/(1 - petasq))
$ (x)pp(x)* ll(x),0,1)[[1]]
po < - 函数(x)pp( x)* ll(x)/ marg
return(po(petasq))
}
## @@@ R功能结束。
#现在我用上面的函数来得到我的图的y值:
petasq< - seq(0,1,by = .0001)##这些是我的图的X值
f< - 30#一个函数需要的参数
df1< - 3#一个函数需要的参数
df2< - 108# b $ b N < - 120#一个函数需要的参数
alpha = 5#一个函数需要的参数
beta = 4#一个函数需要的参数
# #现在使用ppp()函数来获得上述X值范围的Y值:
y.values< - ppp(f,N,df1,df2,petasq,alpha,beta)
##最后将petasq(作为X值)与Y.values对比:
plot(petasq,y.values,ty =l,lwd = 3)
根据您的修订问题,我找到了最小化最短距离以x值为单位)在LEFT和RIGHT之间:
ppp < - function(peta平方,f,N,df1,df2,alpha,beta){
pp < - 函数(petasq)dbeta(petasq,alpha,beta)
ll < - 函数(petasq )(函数(x)pp(x)* ll(x),0)df(f,df1,df2,(petasq * N)/(1-petasq))
marg- ,1)[[1]]
po < - 函数(x)pp(x)* ll(x)/ marg
return(po(petasq))
}
petasq< - seq(0,1,by = .0001)##这些是我绘图的X值
f< - 30#函数需要的参数
df1< - 3#函数需要参数
df2< - 108#函数需要参数
N< - 120#函数需要参数
alpha = 5#需要的函数参数
beta = 4#函数需要的参数
optim_func< - function(x_left){
int_function< - function(petasq){
ppp(petasq ,f = f,N = N,df1 = df1,df2 = df2,alpha = alpha,beta = beta)
}
#对于每个LEFT值,找到相应的RIGHT值给出95%的面积。
find_95_right< - function(x_right){
(0.95 - integrate(int_function,lower = x_left,upper = x_right,subdivisions = 10000)$ value)^ 2
}
x_right_obj< - 优化(f = find_95_right,interval = c(0.5,1))
if(x_right_obj $ objective> .Machine $ double.eps ^ 0.25)return(100)
#返回左右距离
返回(x_right_obj $最小 - x_left)
}
#缩小左右距离
x_left < - optimize(f = optim_func,interval = c(0.30,0.40))$ minimum
find_95_right < - function(x_right){
(0.95 - integrate(int_function,lower = x_left,upper = x_right,subdivisions = 10000)$ value)^ 2
}
int_function < - 函数(petasq){
ppp(petasq,f = f,N = N,df1 = df1, (f = find_95_right,interval = c(0.5,1))$ minimum
请参阅代码中的注释。希望这终于可以满足你的问题:)结果:
> x_right
[1] 0.5409488
> x_left
[1] 0.3201584
另外,您可以绘制LEFT和RIGHT之间的距离为左边界的函数:
left_x_values < - seq(0.30,0.335,0.0001)
DISTANCE< - sapply(left_x_values,optim_func)
plot(left_x_values,DISTANCE,type =l)
Background:
I have a curve whose Y-values are produced by my small R function below (neatly annotated). If you run my entire R code, you see my curve (but remember, it's a function so if I changed the argument values, I could get a different curve):
Question:
Obviously, one can determine/assume many intervals that would cover/take 95% of the total area under this curve. But using, optimize()
, how can I find the SHORTEST (in x-value units) of these many possible 95% intervals? What then would be the corresponding x-values for the the two ends of this shortest 95% interval?
Note: The idea of shortest interval for a uni-modal curve like mine makes sense. In reality, the shortest one would be the one that tends to be toward the middle where the height (y-value) is larger, so then x-value doesn't need to be so large for the intended interval to cover/take 95% of the total area under the curve.
Here is my R code (please run the entire code):
ppp <- function(f, N, df1, df2, petasq, alpha, beta) {
pp <- function(petasq) dbeta(petasq, alpha, beta)
ll <- function(petasq) df(f, df1, df2, (petasq * N) / (1 - petasq) )
marg <- integrate(function(x) pp(x)*ll(x), 0, 1)[[1]]
po <- function(x) pp(x)*ll(x) / marg
return(po(petasq) )
}
## @@@ END OF MY R FUNCTION.
# Now I use my function above to get the y-values for my plot:
petasq <- seq(0, 1, by = .0001) ## These are X-values for my plot
f <- 30 # a function needed argument
df1 <- 3 # a function needed argument
df2 <- 108 # a function needed argument
N <- 120 # a function needed argument
alpha = 5 # a function needed argument
beta = 4 # a function needed argument
## Now use the ppp() function to get the Y-values for the X-value range above:
y.values <- ppp(f, N, df1, df2, petasq, alpha, beta)
## Finally plot petasq (as X-values) against the Y.values:
plot(petasq, y.values, ty="l", lwd = 3 )
Based on your revised question, I found the optimization that minimizes the SHORTEST distance (in x-value units) between LEFT and RIGHT boundaries:
ppp <- function(petasq, f, N, df1, df2, alpha, beta) {
pp <- function(petasq) dbeta(petasq, alpha, beta)
ll <- function(petasq) df(f, df1, df2, (petasq * N) / (1 - petasq) )
marg <- integrate(function(x) pp(x)*ll(x), 0, 1)[[1]]
po <- function(x) pp(x)*ll(x) / marg
return(po(petasq) )
}
petasq <- seq(0, 1, by = .0001) ## These are X-values for my plot
f <- 30 # a function needed argument
df1 <- 3 # a function needed argument
df2 <- 108 # a function needed argument
N <- 120 # a function needed argument
alpha = 5 # a function needed argument
beta = 4 # a function needed argument
optim_func <- function(x_left) {
int_function <- function(petasq) {
ppp(petasq, f=f, N=N, df1=df1, df2=df2, alpha=alpha, beta=beta)
}
# For every LEFT value, find the corresponding RIGHT value that gives 95% area.
find_95_right <- function(x_right) {
(0.95 - integrate(int_function, lower=x_left, upper=x_right, subdivisions = 10000)$value)^2
}
x_right_obj <- optimize(f=find_95_right, interval=c(0.5,1))
if(x_right_obj$objective > .Machine$double.eps^0.25) return(100)
#Return the DISTANCE BETWEEN LEFT AND RIGHT
return(x_right_obj$minimum - x_left)
}
#MINIMIZE THE DISTANCE BETWEEN LEFT AND RIGHT
x_left <- optimize(f=optim_func, interval=c(0.30,0.40))$minimum
find_95_right <- function(x_right) {
(0.95 - integrate(int_function, lower=x_left, upper=x_right, subdivisions = 10000)$value)^2
}
int_function <- function(petasq) {
ppp(petasq, f=f, N=N, df1=df1, df2=df2, alpha=alpha, beta=beta)
}
x_right <- optimize(f=find_95_right, interval=c(0.5,1))$minimum
See the comments in the code. Hopefully this finally satisfies your question :) Results:
> x_right
[1] 0.5409488
> x_left
[1] 0.3201584
Also, you can plot the distance between LEFT and RIGHT as a function of the left boundary:
left_x_values <- seq(0.30, 0.335, 0.0001)
DISTANCE <- sapply(left_x_values, optim_func)
plot(left_x_values, DISTANCE, type="l")
这篇关于使用optimize()查找R中曲线下面95%区域的最短间隔的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!