How to properly fit a beta distribution in Python?

Question

I am trying to find the correct way to fit a beta distribution. It's not a real-world problem; I am just testing the effects of a few different methods, and in doing so something puzzled me.

Here is the Python code I am working on, in which I tested three different approaches: 1) fitting with moments (sample mean and variance); 2) fitting by minimizing the negative log-likelihood with scipy.optimize.fmin(); 3) simply calling scipy.stats.beta.fit().
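
The moment estimates in approach 1 (alpha1 and beta1 in the code) come from equating the sample mean \bar{x} and sample variance s^2 to the mean and variance of a Beta(α, β) on [0, 1] and solving for the two parameters. A short derivation sketch, consistent with the formulas in the code:

```latex
\begin{aligned}
\mathbb{E}[X] &= \frac{\alpha}{\alpha+\beta}, \qquad
\operatorname{Var}[X] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} \\
\hat\alpha &= \bar{x}\left(\frac{\bar{x}(1-\bar{x})}{s^2} - 1\right)
            = \frac{\bar{x}^2(1-\bar{x})}{s^2} - \bar{x}, \qquad
\hat\beta = \hat\alpha\,\frac{1-\bar{x}}{\bar{x}}
\end{aligned}
```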

from scipy.optimize import fmin
from scipy.stats import beta
from scipy.special import gamma as gammaf
import matplotlib.pyplot as plt
import numpy


def betaNLL(param,*args):
    '''Negative log likelihood function for beta
    <param>: list for parameters to be fitted.
    <args>: 1-element array containing the sample data.

    Return <nll>: negative log-likelihood to be minimized.
    '''

    a,b=param
    data=args[0]
    pdf=beta.pdf(data,a,b,loc=0,scale=1)
    lg=numpy.log(pdf)
    #-----Replace -inf with 0s------
    lg=numpy.where(lg==-numpy.inf,0,lg)
    nll=-1*numpy.sum(lg)
    return nll

#-------------------Sample data-------------------
data=beta.rvs(5,2,loc=0,scale=1,size=500)

#----------------Normalize to [0,1]----------------
#data=(data-numpy.min(data))/(numpy.max(data)-numpy.min(data))

#----------------Fit using moments----------------
mean=numpy.mean(data)
var=numpy.var(data,ddof=1)
alpha1=mean**2*(1-mean)/var-mean
beta1=alpha1*(1-mean)/mean

#------------------Fit using mle------------------
result=fmin(betaNLL,[1,1],args=(data,))
alpha2,beta2=result

#----------------Fit using beta.fit----------------
alpha3,beta3,xx,yy=beta.fit(data)

print('\n# alpha,beta from moments:', alpha1, beta1)
print('# alpha,beta from mle:', alpha2, beta2)
print('# alpha,beta from beta.fit:', alpha3, beta3)

#-----------------------Plot-----------------------
plt.hist(data,bins=30,density=True)  # 'normed' was removed from Matplotlib; 'density' is the replacement
fitted=lambda x,a,b:gammaf(a+b)/gammaf(a)/gammaf(b)*x**(a-1)*(1-x)**(b-1) #pdf of beta

xx=numpy.linspace(0,max(data),len(data))
plt.plot(xx,fitted(xx,alpha1,beta1),'g')
plt.plot(xx,fitted(xx,alpha2,beta2),'b')
plt.plot(xx,fitted(xx,alpha3,beta3),'r')

plt.show()

The problem I have is with the normalization step, z = (x - a)/(b - a), where a and b are the minimum and maximum of the sample, respectively.
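
A minimal sketch of what this normalization does, assuming the same synthetic Beta(5, 2) sample as in the question (the seeded generator `rng` is an addition for reproducibility only): the sample minimum and maximum land exactly on 0 and 1, and a beta density whose shape parameters are both above 1 is zero at those points, so their log-density is -inf.

```python
import numpy as np
from scipy.stats import beta

# Synthetic sample, as in the question: Beta(5, 2) on [0, 1]
rng = np.random.default_rng(0)
data = beta.rvs(5, 2, size=500, random_state=rng)

# Min-max normalization z = (x - a) / (b - a), with a = min, b = max
z = (data - data.min()) / (data.max() - data.min())

# The sample extremes are mapped exactly onto 0 and 1 ...
print(z.min(), z.max())  # 0.0 1.0

# ... and with a = 5 > 1 and b = 2 > 1 the density is zero there,
# so the log-density at both endpoints is -inf
print(beta.logpdf([0.0, 1.0], 5, 2))  # [-inf -inf]
```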

When I don't normalize, everything works: there are slight differences among the fitting methods, but all of them are reasonably good.

But when I did the normalization, here is the result plot I got.

Only the moment method (green line) looks OK.

The scipy.stats.beta.fit() method (red line) is always uniform, no matter what parameters I use to generate the random numbers.

And the MLE (blue line) fails.
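
The question's betaNLL replaces every -inf log-density with 0, so an observation that the candidate (a, b) considers impossible (such as an exact 0 or 1 when a > 1 or b > 1) simply stops counting against those parameters. A hedged sketch of a stricter variant, here called `beta_nll_strict` (a hypothetical name, not part of SciPy), which returns inf in that case instead: on the raw sample it behaves like an ordinary NLL and fmin recovers roughly (5, 2), while on the min-max normalized sample it returns inf at (5, 2).

```python
import numpy as np
from scipy.optimize import fmin
from scipy.stats import beta


def beta_nll_strict(params, data):
    """Negative log-likelihood of Beta(a, b) with loc=0, scale=1 (hypothetical helper).

    Unlike the question's betaNLL, non-finite log-densities are not replaced
    by 0: any observation the model cannot explain makes the NLL +inf.
    """
    a, b = params
    if a <= 0 or b <= 0:
        return np.inf  # outside the valid parameter space
    lg = beta.logpdf(data, a, b)
    if not np.all(np.isfinite(lg)):
        return np.inf  # some observation is impossible (or degenerate) under (a, b)
    return -np.sum(lg)


rng = np.random.default_rng(0)
data = beta.rvs(5, 2, size=500, random_state=rng)
z = (data - data.min()) / (data.max() - data.min())

# Raw sample: an ordinary, finite NLL, and fmin lands near the true (5, 2)
a_hat, b_hat = fmin(beta_nll_strict, [1, 1], args=(data,), disp=False)
print(a_hat, b_hat)

# Normalized sample: the points at exactly 0 and 1 rule out (5, 2)
print(beta_nll_strict([5, 2], z))  # inf
```

This only makes the failure visible; by itself it does not produce a good fit to the normalized data.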

So it seems the normalization is causing these issues. But I think it is legitimate for a beta distribution to have x=0 and x=1. And given a real-world problem, isn't normalizing the sample observations to [0,1] the first step? In that case, how should I fit the curve?

Answer

Without a docstring for beta.fit it was a little tricky to find, but if you know the lower and upper limits you want to force upon beta.fit, you can fix them with the keyword arguments floc (the lower limit) and fscale (the width of the interval, i.e. upper minus lower).
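
A minimal sketch of the fixed-limit call, assuming a recent SciPy and the same kind of Beta(5, 2) sample (the seeded `rng` is added only for reproducibility). With floc=0 and fscale=1 only the two shape parameters are estimated, and the fixed loc and scale come back unchanged as the last two entries of the tuple.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
data = beta.rvs(5, 2, size=500, random_state=rng)

# loc and scale are held fixed, so only the shape parameters a and b are fitted;
# the result still has the form (a, b, loc, scale)
a_fix, b_fix, loc_fix, scale_fix = beta.fit(data, floc=0, fscale=1)
print(a_fix, b_fix, loc_fix, scale_fix)
```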

I ran your code using only the beta.fit method, with and without the floc and fscale kwargs. I also checked it with the arguments as ints and as floats to make sure that wouldn't affect the answer. It didn't (in this test; I can't say it never would).

>>> from scipy.stats import beta
>>> import numpy
>>> def betaNLL(param,*args):
...     '''Negative log likelihood function for beta
...     <param>: list for parameters to be fitted.
...     <args>: 1-element array containing the sample data.
...
...     Return <nll>: negative log-likelihood to be minimized.
...     '''
...     a,b=param
...     data=args[0]
...     pdf=beta.pdf(data,a,b,loc=0,scale=1)
...     lg=numpy.log(pdf)
...     #-----Replace -inf with 0s------
...     lg=numpy.where(lg==-numpy.inf,0,lg)
...     nll=-1*numpy.sum(lg)
...     return nll
...
>>> data=beta.rvs(5,2,loc=0,scale=1,size=500)
>>> beta.fit(data)
(5.696963536654355, 2.0005252702837009, -0.060443307228404922, 1.0580278414086459)
>>> beta.fit(data,floc=0,fscale=1)
(5.0952451826831462, 1.9546341057106007, 0, 1)
>>> beta.fit(data,floc=0.,fscale=1.)
(5.0952451826831462, 1.9546341057106007, 0.0, 1.0)

In conclusion, this doesn't change your data (through normalization) or throw any data out. I just think it should be noted that care should be taken when using it. In your case you knew the limits were 0 and 1, because you drew the data from a distribution defined on [0, 1]. In other cases the limits might also be known, but if they are not, beta.fit will estimate them. Here, without fixing the limits at 0 and 1, beta.fit estimated loc=-0.06 and scale=1.058.
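
One more caveat, as far as I can tell from recent SciPy releases: when both floc and fscale are fixed, beta.fit requires every observation to lie strictly inside (floc, floc + fscale). A min-max normalized sample contains an exact 0 and an exact 1, so fixing the limits at 0 and 1 does not rescue that case; the sketch below shows the call being rejected with a ValueError. It also works out the support implied by the free fit quoted above: with loc ≈ -0.0604 and scale ≈ 1.0580, the fitted interval [loc, loc + scale] is roughly [-0.060, 0.998].

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
data = beta.rvs(5, 2, size=500, random_state=rng)

# Min-max normalization puts the sample extremes exactly at 0 and 1
z = (data - data.min()) / (data.max() - data.min())

try:
    beta.fit(z, floc=0, fscale=1)
    rejected = False
except ValueError as err:  # SciPy raises a ValueError subclass for out-of-range data
    rejected = True
    print("beta.fit rejected the normalized sample:", err)

# Support implied by the free fit from the session above: [loc, loc + scale]
loc, scale = -0.060443307228404922, 1.0580278414086459
print(loc, loc + scale)
```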
