Fitting For Discrete Data: Negative Binomial, Poisson, Geometric Distribution


Question

scipy does not support fitting discrete distributions to data. I know there are a lot of threads on this subject.

For example, if I have an array like the one below:

x = [2,3,4,5,6,7,0,1,1,0,1,8,10,9,1,1,1,0,0]


I cannot call fit on this array, because nbinom has no fit method:

from scipy.stats import nbinom
param = nbinom.fit(x)

Still, I would like to ask for an up-to-date answer: is there any way to fit these three discrete distributions and then choose the best fit for a discrete dataset?

Recommended answer

You can use the method of moments to fit any particular distribution.

Basic idea: compute the empirical first, second, etc. moments, then derive the distribution parameters from these moments.

So, in all these cases we need at most two moments (the mean and the variance). Let's get them:

import pandas as pd
# for other distributions, you'll need to implement PMF
from scipy.stats import nbinom, poisson, geom

x = pd.Series(x)
mean = x.mean()
var = x.var()
likelihoods = {}  # we'll use it later

Note: I used pandas instead of numpy. That is because numpy's var() and std() do not apply Bessel's correction by default, while pandas' do. If you have 100+ samples, there shouldn't be much difference, but on smaller samples it can matter.
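To make the note concrete, here is a small sketch that computes both variance estimates for the array from the question. It uses only the standard library rather than pandas or numpy: `statistics.pvariance` divides by n, as numpy's `var()` does by default, while `statistics.variance` divides by n - 1, as pandas does. With numpy, passing `ddof=1` to `var()` gives the corrected estimate as well.

```python
import statistics

x = [2, 3, 4, 5, 6, 7, 0, 1, 1, 0, 1, 8, 10, 9, 1, 1, 1, 0, 0]
n = len(x)

mean = statistics.fmean(x)
var_population = statistics.pvariance(x)  # divides by n (numpy's default)
var_sample = statistics.variance(x)       # divides by n - 1 (Bessel's correction)

print(f"n = {n}, mean = {mean:.4f}")
print(f"variance without correction: {var_population:.4f}")
print(f"variance with correction:    {var_sample:.4f}")
# The two estimates differ by exactly the factor n / (n - 1).
print(f"ratio: {var_sample / var_population:.4f}  vs  n / (n - 1) = {n / (n - 1):.4f}")
```

With only 19 samples the two estimates differ by roughly 5 percent, which carries straight into the parameters derived from the variance.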

Now, let's get the parameters for these distributions. The negative binomial has two parameters: p and r. Let's estimate them and calculate the likelihood of the dataset:

# From the wikipedia page, we have:
# mean = pr / (1-p)
# var = pr / (1-p)**2
# without wiki, you could use MGF to get moments; too long to explain here
# Solving for p and r, we get:

p = 1 - mean / var  # TODO: check for zero variance and limit p by [0, 1]
r = (1-p) * mean / p

UPD: Wikipedia and scipy use different definitions of p: one treats it as the probability of success, the other as the probability of failure. So, to be consistent with scipy's convention, use:

p = mean / var
r = p * mean / (1-p)

End of UPD
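The TODO in the first snippet matters in practice: the moment equations only have a valid solution when the variance exceeds the mean (overdispersed data). The following is a minimal sketch of that guard, assuming scipy's convention in which p is the probability of success. The helper name `nbinom_from_moments` is hypothetical and not part of scipy.

```python
def nbinom_from_moments(mean, var):
    """Method-of-moments estimate of (r, p) in scipy's nbinom convention.

    Hypothetical helper, not a scipy function. Returns None when the data
    are not overdispersed (var <= mean), because then no valid negative
    binomial matches the two moments.
    """
    if mean <= 0 or var <= mean:
        return None
    p = mean / var          # success probability, strictly inside (0, 1)
    r = mean * p / (1 - p)  # number of successes, may be non-integer
    return r, p


# Moments of the array from the question (variance with Bessel's correction).
x = [2, 3, 4, 5, 6, 7, 0, 1, 1, 0, 1, 8, 10, 9, 1, 1, 1, 0, 0]
n = len(x)
mean = sum(x) / n
var = sum((v - mean) ** 2 for v in x) / (n - 1)

r, p = nbinom_from_moments(mean, var)

# Round trip: in this convention mean = r(1-p)/p and var = r(1-p)/p**2.
assert abs(r * (1 - p) / p - mean) < 1e-9
assert abs(r * (1 - p) / p ** 2 - var) < 1e-9
print(f"r = {r:.4f}, p = {p:.4f}")
```

Returning None rather than clipping p keeps the failure visible; when it happens, the Poisson model is usually the more sensible candidate.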

Calculate the likelihood:

likelihoods['nbinom'] = x.map(lambda val: nbinom.pmf(val, r, p)).prod()

Same for Poisson, which has only one parameter:

# from Wikipedia,
# mean = variance = lambda. Nothing to solve here
lambda_ = mean
likelihoods['poisson'] = x.map(lambda val: poisson.pmf(val, lambda_)).prod()

Same for the geometric distribution:

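# scipy's geom counts trials (support k >= 1, mean = 1 / p), so any zero in
# the data would get probability 0. Shift the support to k >= 0 with loc=-1,
# for which mean = (1 - p) / p:
p = 1 / (1 + mean)

likelihoods['geometric'] = x.map(lambda val: geom.pmf(val, p, loc=-1)).prod()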
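One thing to watch with the geometric distribution is that two conventions are in common use: one counts trials up to the first success (support k >= 1, mean 1/p), the other counts failures before the first success (support k >= 0, mean (1-p)/p). scipy's `geom` uses the first, and the array in the question contains zeros. The sketch below writes both PMFs by hand (the function names are illustrative, not scipy's) to show how the choice changes the estimate of p and the probability assigned to zero.

```python
def geom_pmf_trials(k, p):
    """P(K = k) when K counts trials up to and including the first success (k >= 1)."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0


def geom_pmf_failures(k, p):
    """P(K = k) when K counts failures before the first success (k >= 0)."""
    return (1 - p) ** k * p if k >= 0 else 0.0


x = [2, 3, 4, 5, 6, 7, 0, 1, 1, 0, 1, 8, 10, 9, 1, 1, 1, 0, 0]
mean = sum(x) / len(x)

p_trials = 1 / mean          # solves mean = 1 / p
p_failures = 1 / (1 + mean)  # solves mean = (1 - p) / p

print("P(0), trials convention:  ", geom_pmf_trials(0, p_trials))
print("P(0), failures convention:", geom_pmf_failures(0, p_failures))
```

Under the trials convention a single zero in the data drives the whole likelihood product to zero, so the geometric model could never be selected for this dataset.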

Finally, let's pick the best fit:

best_fit = max(likelihoods, key=lambda x: likelihoods[x])
print("Best fit:", best_fit)
print("Likelihood:", likelihoods[best_fit])
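Multiplying many PMF values can underflow to zero on larger datasets, and comparing raw likelihoods ignores that the negative binomial has two free parameters while the other two models have one. A common alternative is to sum log-probabilities and compare models with AIC = 2k - 2 ln L, where lower is better. The sketch below is one way to do that using only the standard library. The log-PMF helpers are written by hand and are not scipy functions, the geometric model is assumed to have support k >= 0, and the parameters are method-of-moments estimates as in the answer.

```python
import math


def poisson_logpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def nbinom_logpmf(k, r, p):
    # log of Gamma(k + r) / (k! * Gamma(r)) * p**r * (1 - p)**k
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log(1 - p))


def geom0_logpmf(k, p):
    # geometric on k = 0, 1, 2, ...: p * (1 - p)**k
    return math.log(p) + k * math.log(1 - p)


x = [2, 3, 4, 5, 6, 7, 0, 1, 1, 0, 1, 8, 10, 9, 1, 1, 1, 0, 0]
n = len(x)
mean = sum(x) / n
var = sum((v - mean) ** 2 for v in x) / (n - 1)

# Method-of-moments parameters (assumes var > mean > 0, true for this data).
p_nb = mean / var
r_nb = mean * p_nb / (1 - p_nb)
p_geo = 1 / (1 + mean)

# name -> (log-likelihood, number of fitted parameters)
fits = {
    "nbinom": (sum(nbinom_logpmf(k, r_nb, p_nb) for k in x), 2),
    "poisson": (sum(poisson_logpmf(k, mean) for k in x), 1),
    "geometric": (sum(geom0_logpmf(k, p_geo) for k in x), 1),
}
aic = {name: 2 * k_params - 2 * loglik
       for name, (loglik, k_params) in fits.items()}

for name, (loglik, _) in fits.items():
    print(f"{name:10s} log-likelihood = {loglik:9.3f}  AIC = {aic[name]:8.3f}")
print("Lowest AIC:", min(aic, key=aic.get))
```

Because the parameters come from moments rather than from maximizing the likelihood, the AIC values here are approximate; they are meant for ranking the three candidates, not as exact information criteria.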

Let me know if you have any questions.
