r中三参数威布尔分布的最大似然估计 [英] Maximum Likelihood Estimation for three-parameter Weibull distribution in r

查看:1099
本文介绍了r中三参数威布尔分布的最大似然估计的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想估计3p Weibull分布的比例,形状和阈值参数.

I want to estimate the scale, shape and threshold parameters of a 3p Weibull distribution.

到目前为止,我所做的是:

What I've done so far is the following:

请参阅此帖子,在R中拟合3参数的Weibull分布

我已经使用了功能

    EPS = sqrt(.Machine$double.eps) # "epsilon" for very small numbers

llik.weibull <- function(shape, scale, thres, x)
{ 
  sum(dweibull(x - thres, shape, scale, log=T))
}

thetahat.weibull <- function(x)
{ 
  if(any(x <= 0)) stop("x values must be positive")

  toptim <- function(theta) -llik.weibull(theta[1], theta[2], theta[3], x)

  mu = mean(log(x))
  sigma2 = var(log(x))
  shape.guess = 1.2 / sqrt(sigma2)
  scale.guess = exp(mu + (0.572 / shape.guess))
  thres.guess = 1

  res = nlminb(c(shape.guess, scale.guess, thres.guess), toptim, lower=EPS)

  c(shape=res$par[1], scale=res$par[2], thres=res$par[3])
}

预估计"我的Weibull参数,以便可以将它们用作MASS-Package的"fitdistr"函数中参数"start"的初始值.

to "pre-estimate" my Weibull parameters, such that I can use them as initial values for the argument "start" in the "fitdistr" function of the MASS-Package.

您可能会问为什么我要对参数进行两次估算...原因是我需要估算的方差-协方差矩阵,该矩阵也由fitdistr函数估算.

You might ask why I want to estimate the parameters twice... reason is that I need the variance-covariance-matrix of the estimates which is also estimated by the fitdistr function.

示例:

    set.seed(1)

    thres <- 450
    dat <- rweibull(1000, 2.78, 750) + thres

pre_mle <- thetahat.weibull(dat)

    my_wb <- function(x, shape, scale, thres) { 
        dweibull(x - thres, shape, scale)
    }

    ml <- fitdistr(dat, densfun = my_wb, start = list(shape = round(pre_mle[1], digits = 0), scale = round(pre_mle[2], digits = 0), 
    thres = round(pre_mle[3], digits = 0)))

    ml

     > ml
       shape        scale        thres   
       2.942548   779.997177   419.996196   (  0.152129) ( 32.194294) ( 28.729323)

     > ml$vcov
                shape       scale       thres
    shape  0.02314322    4.335239   -3.836873
    scale  4.33523868 1036.472551 -889.497580
    thres -3.83687258 -889.497580  825.374029 

这对于形状参数大于1的情况非常有效.不幸的是,我的方法应该处理形状参数可能小于1的情况.

This works quite well for cases where the shape parameter is above 1. Unfortunately my approach should deal with the cases where the shape parameter could be smaller than 1.

此处描述了为什么对于小于1的形状参数无法实现的原因:

The reason why this is not possible for shape parameters that are smaller than 1 is described here: http://www.weibull.com/hotwire/issue148/hottopics148.htm

在情况1中,所有三个参数都是未知的,说明如下:

in Case 1, All three parameters are unknown the following is said:

将ti的最小失效时间定义为tmin.然后,当γ→tmin时,ln(tmin-γ)→-∞.如果β小于1,则(β-1)ln(tmin-γ)转到+∞,对于给定的β,η和γ解,我们总是可以找到另一组解(例如,通过使γ更接近tmin),这将提供较大的似然值,因此,没有MLE解对于β,η和γ."

"Define the smallest failure time of ti to be tmin. Then when γ → tmin, ln(tmin - γ) → -∞. If β is less than 1, then (β - 1)ln(tmin - γ) goes to +∞ . For a given solution of β, η and γ, we can always find another set of solutions (for example, by making γ closer to tmin) that will give a larger likelihood value. Therefore, there is no MLE solution for β, η and γ."

这很有意义.由于这个原因,我想按照他们在此页面上描述的方式进行操作.

This makes a lot of sense. For this very reason I want to do it the way they described it on this page.

在Weibull ++中,使用基于梯度的算法来查找β,η和γ的MLE解.γ范围的上限任意设置为0.99 of tmin.取决于数据集,返回局部最优值或0.99tmin作为γ的MLE解."

"In Weibull++, a gradient-based algorithm is used to find the MLE solution for β, η and γ. The upper bound of the range for γ is arbitrarily set to be 0.99 of tmin. Depending on the data set, either a local optimal or 0.99tmin is returned as the MLE solution for γ."

我想为伽玛设置一个可行的间隔(在我的代码中称为阈值"),以使解决方案介于(0,.99 * tmin)之间.

I want to set a feasible interval for gamma (in my code called 'thres') such that the solution is between (0, .99 * tmin).

有人知道如何解决此问题吗?

Does anyone have an idea how to solve this problem?

在函数fitdistr中,似乎没有机会进行约束一个MLE,约束一个参数.

In the function fitdistr there seems to be no opportunity doing a constrained MLE, constraining one parameter.

另一种可行的方法是通过得分向量的外积估计渐近方差.得分向量可以取自上面使用的函数thetahat.weibul(x).但是,手动计算外部乘积(不带功能)似乎非常耗时,并且无法解决ML估计受限的问题.

Another way to go could be the estimation of the asymptotic variance via the outer product of the score vectors. The score vector could be taken from the above used function thetahat.weibul(x). But calculating the outer product manually (without function) seems to be very time consuming and does not solve the problem of the constrained ML estimation.

最诚挚的问候, 蒂姆

推荐答案

设置受约束的MLE并不难.我将在bbmle::mle2中执行此操作;您也可以在stats4::mle中进行此操作,但是bbmle具有一些附加功能.

It's not too hard to set up a constrained MLE. I'm going to do this in bbmle::mle2; you could also do it in stats4::mle, but bbmle has some additional features.

更大的问题是,在理论上很难定义估计值的抽样方差,当它位于允许空间的边界上时; Wald方差估计背后的理论被打破了.您仍然可以通过似然分析来计算置信区间...,也可以进行引导.在执行此操作时,我遇到了各种优化问题……我还没有真正考虑过是否存在特定原因

The larger issue is that it's theoretically difficult to define the sampling variance of an estimate when it's on the boundary of the allowed space; the theory behind Wald variance estimates breaks down. You can still calculate confidence intervals by likelihood profiling ... or you could bootstrap. I ran into a variety of optimization issues when doing this ... I haven't really thought about wether there are specific reasons

重新格式化三参数Weibull函数以供mle2使用(以x作为第一个参数,以log作为参数):

Reformat three-parameter Weibull function for mle2 use (takes x as first argument, takes log as an argument):

dweib3 <- function(x, shape, scale, thres, log=TRUE) { 
    dweibull(x - thres, shape, scale, log=log)
}

启动功能(略有重新格式化):

Starting function (slightly reformatted):

weib3_start <- function(x) {
   mu <- mean(log(x))
   sigma2 <- var(log(x))
   logshape  <- log(1.2 / sqrt(sigma2))
   logscale <- mu + (0.572 / logshape)
   logthres <- log(0.5*min(x))
   list(logshape = logshape, logsc = logscale, logthres = logthres)
}

生成数据:

set.seed(1)
dat <- data.frame(x=rweibull(1000, 2.78, 750) + 450)

拟合模型:为了方便和稳定,我在对数刻度上拟合参数,但是您也可以使用零边界.

Fit model: I'm fitting the parameters on the log scale for convenience and stability, but you could use boundaries at zero as well.

tmin <- log(0.99*min(dat$x))
library(bbmle)
m1 <- mle2(x~dweib3(exp(logshape),exp(logsc),exp(logthres)),
           data=dat,
           upper=c(logshape=Inf,logsc=Inf,
                   logthres=tmin),
           start=weib3_start(dat$x),
           method="L-BFGS-B")

vcov(m1),通常应该提供方差-协方差估计值(除非估计值在边界上,在这里不是这种情况)会给出NaN值...不确定为什么不进行更多挖掘.

vcov(m1), which should normally provide a variance-covariance estimate (unless the estimate is on the boundary, which is not the case here) gives NaN values ... not sure why without more digging.

library(emdbook)
tmpf <- function(x,y) m1@minuslogl(logshape=x,
                                         logsc=coef(m1)["logsc"],
                                         logthres=y)
tmpf(1.1,6)
s1 <- curve3d(tmpf,
              xlim=c(1,1.2),ylim=c(5.9,tmin),sys3d="image")
with(s1,contour(x,y,z,add=TRUE))

h <- lme4:::hessian(function(x) do.call(m1@minuslogl,as.list(x)),coef(m1))
vv <- solve(h)
diag(vv) ## [1] 0.002672240 0.001703674 0.004674833
(se <- sqrt(diag(vv))) ## standard errors
## [1] 0.05169371 0.04127558 0.06837275
cov2cor(vv)
##            [,1]       [,2]       [,3]
## [1,]  1.0000000  0.8852090 -0.8778424
## [2,]  0.8852090  1.0000000 -0.9616941
## [3,] -0.8778424 -0.9616941  1.0000000

这是 log缩放的变量的方差-协方差矩阵.如果要转换为原始比例的方差-协方差矩阵,则需要按(x_i)*(x_j)进行缩放(即按变换exp(x)的导数进行缩放).

This is the variance-covariance matrix of the log-scaled variables. If you want to convert to the variance-covariance matrix on the original scale, you need to scale by (x_i)*(x_j) (i.e. by the derivatives of the transformation exp(x)).

outer(exp(coef(m1)),exp(coef(m1))) * vv
##             logshape       logsc    logthres
## logshape  0.02312803    4.332993   -3.834145
## logsc     4.33299307 1035.966372 -888.980794
## logthres -3.83414498 -888.980794  824.831463

我不知道为什么这对numDeriv不起作用-在上面的方差估计中,非常小心. (也许离边界太近,以至于理查森推断无法奏效?)

I don't know why this doesn't work with numDeriv - would be very careful with variance estimates above. (Maybe too close to boundary for Richardson extrapolation to work?)

library(numDeriv)
hessian()
grad(function(x) do.call(m1@minuslogl,as.list(x)),coef(m1))  ## looks OK
vcov(m1)

配置文件看起来不错...(我们必须提供std.err,因为黑森州不可逆)

The profiles look OK ... (we have to supply std.err because the Hessian isn't invertible)

pp <- profile(m1,std.err=c(0.01,0.01,0.01))
par(las=1,bty="l",mfcol=c(1,3))
plot(pp,show.points=TRUE)

confint(pp)
##              2.5 %   97.5 %
## logshape 0.9899645 1.193571
## logsc    6.5933070 6.755399
## logthres 5.8508827 6.134346

或者,我们可以在原始比例尺上执行此操作...一种可能性是使用对数比例尺进行拟合,然后从原始比例尺上的那些参数开始重新拟合.

Alternately, we can do this on the original scale ... one possibility would be to use the log-scaling to fit, then refit starting from those parameters on the original scale.

wstart <- as.list(exp(unlist(weib3_start(dat$x))))
names(wstart) <- gsub("log","",names(wstart))
m2 <- mle2(x~dweib3(shape,sc,thres),
           data=dat,
           lower=c(shape=0.001,sc=0.001,thres=0.001),
           upper=c(shape=Inf,sc=Inf,
                   thres=exp(tmin)),
           start=wstart,
           method="L-BFGS-B")
vcov(m2)
##             shape          sc       thres
## shape  0.02312399    4.332057   -3.833264
## sc     4.33205658 1035.743511 -888.770787
## thres -3.83326390 -888.770787  824.633714
all.equal(unname(coef(m2)),unname(exp(coef(m1))),tol=1e-4)

与上面的值大致相同.

如果我们更谨慎地绑定参数,我们可以采用较小的形状,但是现在我们最终到达阈值的边界,这将导致方差计算出现很多问题.

We can fit with a small shape, if we are a little more careful to bound the paraameters, but now we end up on the boundary for the threshold, which will cause lots of problems for the variance calculations.

set.seed(1)
dat <- data.frame(x = rweibull(1000, .53, 365) + 100)
tmin <- log(0.99 * min(dat$x))
m1 <- mle2(x ~ dweib3(exp(logshape), exp(logsc), exp(logthres)),
   lower=c(logshape=-10,logscale=0,logthres=0),
   upper = c(logshape = 20, logsc = 20, logthres = tmin),
   data = dat, 
   start = weib3_start(dat$x), method = "L-BFGS-B") 

对于被检查的数据,您需要将dweibull替换为pweibull;参见在三个参数上运行最大似然估计的错误威布尔cdf 的一些提示.

For censored data, you need to replace dweibull with pweibull; see Errors running Maximum Likelihood Estimation on a three parameter Weibull cdf for some hints.

这篇关于r中三参数威布尔分布的最大似然估计的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆