Fitting lognormal distribution using Scipy vs Matlab

Question

I am trying to fit a lognormal distribution using Scipy. I've already done it using Matlab before but because of the need to extend the application beyond statistical analysis, I am in the process of trying to reproduce the fitted values in Scipy.

Below is the Matlab code I used to fit my data:

% Read input data (one value per line)
x = [];
fid = fopen(file_path, 'r'); % reading is default action for fopen
disp('Reading network degree data...');
if fid == -1
    disp('[ERROR] Unable to open data file.')
else
    while ~feof(fid)
        [x] = [x fscanf(fid, '%f', [1])];

    end
    c = fclose(fid);
    if c == 0
         disp('File closed successfully.');
    else
        disp('[ERROR] There was a problem with closing the file.');
    end
end

[f,xx] = ecdf(x);
y = 1-f;

parmhat  = lognfit(x); % MLE estimate
mu = parmhat(1);
sigma = parmhat(2);

Here is the fit plot:

Now here's my Python code with the aim of achieving the same:

import math
from scipy import stats
from statsmodels.distributions.empirical_distribution import ECDF 

# The same input is read as a list in Python
ecdf_func = ECDF(degrees)
x = ecdf_func.x
ccdf = 1-ecdf_func.y

# Fit data
shape, loc, scale = stats.lognorm.fit(degrees, floc=0)

# Parameters
sigma = shape  # standard deviation of log(x)
mu = math.log(scale)  # mean of log(x)

# sf() takes loc (not floc); loc=0 matches the floc=0 used in the fit above
fit_ccdf = stats.lognorm.sf(x, sigma, loc=0, scale=scale)

Here's the fit using the Python code.

As you can see, both sets of code are capable of producing good fits, at least visually speaking.

Problem is that there is a huge difference in the estimated parameters mu and sigma.

From Matlab: mu = 1.62 sigma = 1.29. From Python: mu = 2.78 sigma = 1.74.

Why is there such a difference?

Note: I have double checked that both sets of data fitted are exactly the same. Same number of points, same distribution.

Your help is much appreciated! Thanks in advance.

Additional information:

import scipy
import numpy
import statsmodels

scipy.__version__
'0.9.0'

numpy.__version__
'1.6.1'

statsmodels.__version__
'0.5.0.dev-1bbd4ca'

Version of Matlab is R2011b.

Edit:

As demonstrated in the answer below, the fault lies with Scipy 0.9.0. I am able to reproduce the mu and sigma results from Matlab using Scipy 0.11.0.
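Before upgrading, it can help to confirm which version is installed. The following is a minimal sketch: `version_tuple` and `MIN_KNOWN_GOOD` are hypothetical names introduced here, and the threshold is an assumption based only on the versions reported in this question (0.9.0 affected, 0.11.0 working). The exact release that fixed the bug is not stated here.

```python
import scipy


def version_tuple(version, parts=2):
    """Return the leading numeric components of a version string as ints."""
    return tuple(int(p) for p in version.split(".")[:parts])


# Assumed threshold: 0.9.0 is reported as affected, 0.11.0 as working.
MIN_KNOWN_GOOD = (0, 11)

installed = version_tuple(scipy.__version__)
if installed < MIN_KNOWN_GOOD:
    print("scipy %s predates the version reported to work" % scipy.__version__)
else:
    print("scipy %s is at least 0.11" % scipy.__version__)
```

Comparing tuples of integers avoids the pitfall of comparing version strings directly, where '0.9.0' would sort after '0.11.0'.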

An easy way to update your Scipy is:

pip install --upgrade Scipy

If you don't have pip (you should!):

sudo apt-get install python3-pip  # python-pip on older, Python 2 based releases

Recommended answer

There is a bug in the fit method in scipy 0.9.0 that has been fixed in later versions of scipy.

The output of the script below should be:

Explicit formula:   mu = 4.99203450, sig = 0.81691086
Fit log(x) to norm: mu = 4.99203450, sig = 0.81691086
Fit x to lognorm:   mu = 4.99203468, sig = 0.81691081

but with scipy 0.9.0, it is

Explicit formula:   mu = 4.99203450, sig = 0.81691086
Fit log(x) to norm: mu = 4.99203450, sig = 0.81691086
Fit x to lognorm:   mu = 4.23197270, sig = 1.11581240

The following test script shows three ways to get the same results:

import numpy as np
from scipy import stats


def lognfit(x, ddof=0):
    x = np.asarray(x)
    logx = np.log(x)
    mu = logx.mean()
    sig = logx.std(ddof=ddof)
    return mu, sig


# A simple data set for easy reproducibility
x = np.array([50., 50, 100, 200, 200, 300, 500])

# Explicit formula
my_mu, my_sig = lognfit(x)

# Fit a normal distribution to log(x)
norm_mu, norm_sig = stats.norm.fit(np.log(x))

# Fit the lognormal distribution
lognorm_sig, _, lognorm_expmu = stats.lognorm.fit(x, floc=0)

print("Explicit formula:   mu = %10.8f, sig = %10.8f" % (my_mu, my_sig))
print("Fit log(x) to norm: mu = %10.8f, sig = %10.8f" % (norm_mu, norm_sig))
print("Fit x to lognorm:   mu = %10.8f, sig = %10.8f" % (np.log(lognorm_expmu), lognorm_sig))
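The same comparison can be run on synthetic data with known parameters, which makes it easier to see whether a given scipy installation recovers them. This is a sketch only: the values of `true_mu`, `true_sigma`, the sample size, and the seed are illustrative assumptions, not taken from the question. It relies on the scipy parametrization with `loc` fixed at 0, where the shape parameter is sigma and the scale parameter is exp(mu).

```python
import numpy as np
from scipy import stats

# Synthetic data with known parameters (illustrative values, not from the question)
true_mu, true_sigma = 1.5, 0.5
rng = np.random.RandomState(0)
sample = rng.lognormal(mean=true_mu, sigma=true_sigma, size=5000)

# Closed-form MLE: mean and (biased) standard deviation of log(sample)
log_sample = np.log(sample)
mle_mu, mle_sigma = log_sample.mean(), log_sample.std(ddof=0)

# scipy parametrization with loc fixed at 0: shape = sigma, scale = exp(mu)
shape, loc, scale = stats.lognorm.fit(sample, floc=0)
fit_mu, fit_sigma = np.log(scale), shape

print("closed form: mu = %.6f, sigma = %.6f" % (mle_mu, mle_sigma))
print("lognorm.fit: mu = %.6f, sigma = %.6f" % (fit_mu, fit_sigma))

# The same mapping applies when evaluating the survival function (CCDF)
grid = np.array([1.0, 5.0, 20.0])
sf_lognorm = stats.lognorm.sf(grid, fit_sigma, loc=0, scale=np.exp(fit_mu))
sf_norm = stats.norm.sf((np.log(grid) - fit_mu) / fit_sigma)
```

On a scipy version without the bug, the two printed lines should agree closely, and `sf_lognorm` should match `sf_norm`.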

With the option ddof=1 in the std. dev. calculation to use the unbiased variance estimation:

In [104]: x
Out[104]: array([  50.,   50.,  100.,  200.,  200.,  300.,  500.])

In [105]: lognfit(x, ddof=1)
Out[105]: (4.9920345004312647, 0.88236457185021866)

There is a note in matlab's lognfit documentation that says when censoring is not used, lognfit computes sigma using the square root of the unbiased estimator of the variance. This corresponds to using ddof=1 in the above code.
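To make the relation between the two estimators concrete, here is a short sketch using the same seven-point data set. The only assumption is the standard identity that the ddof=1 estimate equals the ddof=0 (maximum likelihood) estimate multiplied by sqrt(n / (n - 1)).

```python
import numpy as np

x = np.array([50., 50, 100, 200, 200, 300, 500])
logx = np.log(x)
n = logx.size

sig_mle = logx.std(ddof=0)       # divides by n (maximum likelihood estimate)
sig_unbiased = logx.std(ddof=1)  # divides by n - 1 (unbiased variance estimator)

# The two differ only by the factor sqrt(n / (n - 1))
rescaled = sig_mle * np.sqrt(n / (n - 1.0))

print("MLE sigma:      %.8f" % sig_mle)
print("Unbiased sigma: %.8f" % sig_unbiased)
print("Rescaled MLE:   %.8f" % rescaled)
```

Because the factor sqrt(n / (n - 1)) approaches 1 as n grows, this difference in convention matters mostly for small samples.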
