In scipy.stats rv_continuous has a fit method to find MLEs, but rv_discrete does not. Why?


Problem description

I would like to find the Maximum Likelihood Estimator for some data that may be governed by a discrete distribution. But in scipy.stats only classes representing continuous distributions have a fit function to do that. What is the reason that the classes representing discrete distributions do not?

Solution

Short answer: because nobody wrote the code for it, or even tried, as far as I know.

Longer answer: I don't know how far we can get with the discrete models using a generic maximum likelihood method like the one there is for the continuous distributions, which works for many but not all of them.

Most discrete distributions have strong restrictions on their parameters, and most likely most of them will need a fit method specific to the distribution.

>>> from scipy import stats
>>> [(f, getattr(stats, f).shapes) for f in dir(stats) if isinstance(getattr(stats, f), stats.distributions.rv_discrete)]
[('bernoulli', 'pr'), ('binom', 'n, pr'), ('boltzmann', 'lamda, N'), 
 ('dlaplace', 'a'), ('geom', 'pr'), ('hypergeom', 'M, n, N'), 
 ('logser', 'pr'), ('nbinom', 'n, pr'), ('planck', 'lamda'), 
 ('poisson', 'mu'), ('randint', 'min, max'), ('skellam', 'mu1,mu2'), 
 ('zipf', 'a')]

statsmodels provides a few of the discrete models, where the parameters can also depend on some explanatory variables. Most of those, like generalized linear models, need a link function to restrict the parameter values to the valid range, for example the interval (0, 1) for probabilities, or values larger than zero for parameters in count models.
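To illustrate the link-function idea without explanatory variables, here is a minimal sketch that fits a Poisson mean by optimizing over an unconstrained value and mapping it through `exp` (the inverse of a log link), so the optimizer can never propose an invalid `mu`. The synthetic data, the seed and the name `neg_loglike` are assumptions made for this example; this is not how statsmodels implements its models.

```python
import numpy as np
from scipy import optimize, stats

# Synthetic counts, for illustration only.
rng = np.random.default_rng(0)
data = rng.poisson(lam=3.5, size=500)


def neg_loglike(theta):
    """Negative Poisson log-likelihood in terms of theta = log(mu)."""
    # theta is unconstrained; exp(theta) keeps mu strictly positive.
    mu = np.exp(theta)
    return -stats.poisson.logpmf(data, mu).sum()


# A generic continuous minimizer works because theta has no restrictions.
result = optimize.minimize_scalar(neg_loglike)
mu_hat = float(np.exp(result.x))

print("numerical MLE of mu:", mu_hat)
print("sample mean        :", data.mean())
```

For the Poisson distribution the MLE of `mu` is the sample mean, so the two printed values should agree closely; the same reparameterization trick (for example a logit link for probabilities) is what makes generic optimizers usable for other constrained parameters.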

In addition, the "n" parameter in the binomial and some of the other parameters are required to be integers, which makes it impossible to use the usual continuous minimizers from scipy.optimize.
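One way around the integer restriction, sketched here as an assumption rather than anything scipy provided at the time, is to search the integer parameter on a grid and solve for the continuous parameter in closed form at each grid point. For a binomial with fixed `n`, the MLE of the probability is the sample mean divided by `n`. The data, the seed, the search range of 50 values and the name `profile_loglike` are all hypothetical choices for this example.

```python
import numpy as np
from scipy import stats

# Synthetic binomial data, for illustration only.
rng = np.random.default_rng(1)
data = rng.binomial(n=12, p=0.4, size=2000)


def profile_loglike(n):
    """Log-likelihood at integer n with p set to its MLE for that n."""
    p = data.mean() / n  # closed-form MLE of p when n is fixed
    return stats.binom.logpmf(data, n, p).sum()


# n cannot be smaller than the largest observation; the upper end of the
# grid is an arbitrary choice for this sketch.
n_grid = np.arange(data.max(), data.max() + 50)
loglikes = np.array([profile_loglike(n) for n in n_grid])

n_hat = int(n_grid[np.argmax(loglikes)])
p_hat = float(data.mean() / n_hat)

print("estimated n:", n_hat)
print("estimated p:", p_hat)
```

Estimating `n` this way is known to be unstable for small samples, because many `(n, p)` pairs with the same product give nearly the same likelihood, so the result should be read with care.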

A good solution would be for someone to add distribution-specific fit methods, so that we have at least the easier ones available.
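As a rough sketch of what such distribution-specific fits could look like for the easier cases, the closed-form maximum likelihood estimators for the Poisson and geometric distributions need no optimizer at all. The helper names `fit_poisson` and `fit_geom` are hypothetical and not part of scipy; the data are synthetic.

```python
import numpy as np
from scipy import stats


def fit_poisson(data):
    """Closed-form MLE for the Poisson mean: the sample mean."""
    return {"mu": float(np.mean(data))}


def fit_geom(data):
    """Closed-form MLE for the geometric success probability.

    Assumes the scipy.stats.geom convention, where the variable counts
    trials up to and including the first success (support k >= 1).
    """
    return {"p": 1.0 / float(np.mean(data))}


def poisson_loglike(data, mu):
    """Poisson log-likelihood, used only to sanity-check the estimate."""
    return float(stats.poisson.logpmf(data, mu).sum())


# Synthetic samples, for illustration only.
rng = np.random.default_rng(2)
counts = rng.poisson(lam=4.0, size=1000)
trials = rng.geometric(p=0.25, size=1000)

poisson_params = fit_poisson(counts)
geom_params = fit_geom(trials)

print("Poisson fit  :", poisson_params)
print("Geometric fit:", geom_params)
```

Because these estimators are exact, they avoid the constraint and integer issues discussed above; the harder distributions in the list would still need their own treatment.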
