Distribution mean and standard deviation using scipy.stats


Question

I am trying to get the mean and standard deviation of a log-normal distribution with mu=0.4104857306 and sigma=3.4070874277012617. I expect mean=500 and std=600, but I am not sure what I have done wrong. Here is the code:

import scipy.stats as stats
import numpy as np
a = 3.4070874277012617
b = 0.4104857306
c = stats.lognorm.mean(a,b)
d = stats.lognorm.var(a,b)
e = np.sqrt(d)
print("Mean =",c)
print("std =",e)

Here is the output:

Mean = 332.07447304207903
sd = 110000.50047821256

Thanks.

Thank you for your help. I checked and found a calculation mistake. I can now get mean=500 but still cannot get std=600. Here is the code that I used:

import numpy as np
from numpy import exp  # scipy.exp is a deprecated alias of numpy.exp
from scipy.optimize import fsolve

def f(z):
    mean = 500
    std = 600
    sigma = z[0]
    mu = z[1]
    f = np.zeros(2)
    f[0] = exp(mu + (sigma**2) / 2) - mean
    f[1] = exp(2*mu + sigma**2) * exp(sigma**2 - 1) - std**2
    return f
z = fsolve (f,[1.1681794012855686,5.5322865416282365])
print("sigma =",z[0])
print("mu =",z[1])
print(f(z))

sigma = 1.1681794012855686
mu = 5.5322865416282365

I checked the result with my calculator and can get std=600 as required, but I still get 853.5698320847896 from lognorm.std(sigma, scale=np.exp(mu)).

Recommended answer

The scipy.stats.lognorm lognormal distribution is parameterised in a slightly unusual way, in order to be consistent with the other continuous distributions. The first argument is the shape parameter, which is your sigma. That's followed by the loc and scale arguments, which allow shifting and scaling of the distribution. Here you want loc=0.0 and scale=exp(mu). So to compute the mean, you want to do something like:

>>> import numpy as np
>>> from scipy.stats import lognorm
>>> mu = 0.4104857306
>>> sigma = 3.4070874277012617
>>> lognorm.mean(sigma, 0.0, np.exp(mu))
500.0000010889041

Or more clearly: pass the scale parameter by name, and leave the loc parameter at its default of 0.0:

>>> lognorm.mean(sigma, scale=np.exp(mu))
500.0000010889041
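
As a quick check, the figures in the question's first output can be reproduced from this parameterisation. This is a minimal sketch, assuming the standard result that a lognormal shifted by loc and scaled by scale has mean loc + scale*exp(sigma**2/2); the variable names are illustrative and not taken from the question:

```python
import math

import numpy as np
from scipy.stats import lognorm

mu = 0.4104857306
sigma = 3.4070874277012617

# Positional call from the question: the second argument is taken as loc,
# so mu shifts the distribution instead of scaling it (scale stays at 1).
shifted_mean = lognorm.mean(sigma, mu)
shifted_formula = mu + math.exp(sigma**2 / 2)

# Intended call: loc left at 0, scale set to exp(mu).
scaled_mean = lognorm.mean(sigma, scale=np.exp(mu))
scaled_formula = math.exp(mu + sigma**2 / 2)

print(shifted_mean, shifted_formula)
print(scaled_mean, scaled_formula)
```

The first pair should land near the 332.07 reported in the question, and the second pair near 500.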

As @coldspeed says in his comment, your expected value for the standard deviation doesn't look right. I get:

>>> lognorm.std(sigma, scale=np.exp(mu))
165831.2402402415

and I get the same value calculating by hand.
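
The hand calculation is not shown in the answer; one way to do it, sketched here with the textbook lognormal moment formulas (mean = exp(mu + sigma**2/2), variance = (exp(sigma**2) - 1) * exp(2*mu + sigma**2)), might look like this:

```python
import math

import numpy as np
from scipy.stats import lognorm

mu = 0.4104857306
sigma = 3.4070874277012617

# Closed-form lognormal moments in terms of mu and sigma.
mean_by_hand = math.exp(mu + sigma**2 / 2)
var_by_hand = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
std_by_hand = math.sqrt(var_by_hand)

# Same quantity from SciPy, with loc left at its default of 0.
std_scipy = lognorm.std(sigma, scale=np.exp(mu))

print(mean_by_hand)
print(std_by_hand, std_scipy)
```

Both standard deviations should agree with each other, and neither is anywhere near 600 for these parameters.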

To double check that these parameter choices are indeed giving the expected lognormal, I created a sample of a million deviates and looked at the mean and standard deviation of the log of that sample. As expected, those give me back values that look roughly like your original mu and sigma:

>>> samples = lognorm.rvs(sigma, scale=np.exp(mu), size=10**6)
>>> np.log(samples).mean()  # should be close to mu
0.4134644116056518
>>> np.log(samples).std(ddof=1)  # should be close to sigma
3.4050012251732285


In response to the edit: you've got the formula for the variance of a lognormal slightly wrong - you need to replace the exp(sigma**2 - 1) term with (exp(sigma**2) - 1). If you do that, and rerun the fsolve computation, you get:

sigma = 0.9444564779275075
mu = 5.768609079062494
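
For reference, the corrected fsolve computation could be sketched as follows. It keeps the question's starting guess but uses numpy.exp and renamed variables (residuals, TARGET_MEAN, TARGET_STD are illustrative names, not from the question):

```python
import numpy as np
from scipy.optimize import fsolve

TARGET_MEAN = 500.0
TARGET_STD = 600.0


def residuals(z):
    """Mismatch between the lognormal moments and the targets."""
    sigma, mu = z
    mean_residual = np.exp(mu + sigma**2 / 2) - TARGET_MEAN
    # Corrected variance term: (exp(sigma**2) - 1), not exp(sigma**2 - 1).
    var_residual = (np.exp(2 * mu + sigma**2) * (np.exp(sigma**2) - 1)
                    - TARGET_STD**2)
    return np.array([mean_residual, var_residual])


# Start from the values the question arrived at.
sigma_fit, mu_fit = fsolve(residuals, [1.1681794012855686, 5.5322865416282365])
print("sigma =", sigma_fit)
print("mu =", mu_fit)
print(residuals([sigma_fit, mu_fit]))
```

The printed residuals should be close to zero if the solver has converged.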

And with those values, you should get the expected mean and standard deviation:

>>> from scipy.stats import lognorm
>>> import numpy as np
>>> sigma = 0.9444564779275075
>>> mu = 5.768609079062494
>>> lognorm.mean(sigma, scale=np.exp(mu))
499.9999999949592
>>> lognorm.std(sigma, scale=np.exp(mu))
599.9999996859631

Rather than using fsolve, you could also solve analytically for sigma and mu, given the desired mean and standard deviation. This gives you more accurate results, more quickly:

>>> mean = 500.0
>>> var = 600.0**2
>>> sigma = np.sqrt(np.log1p(var/mean**2))
>>> mu = np.log(mean) - 0.5*sigma*sigma
>>> mu, sigma
(5.768609078769636, 0.9444564782482624)
>>> lognorm.mean(sigma, scale=np.exp(mu))
499.99999999999966
>>> lognorm.std(sigma, scale=np.exp(mu))
599.9999999999995
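
The closed form follows from dividing the variance by the squared mean, which gives var/mean**2 = exp(sigma**2) - 1, so sigma**2 = log(1 + var/mean**2) and mu = log(mean) - sigma**2/2. If this conversion is needed in more than one place, it could be wrapped in a small helper. The function below is a hypothetical convenience for illustration, not part of SciPy:

```python
import numpy as np
from scipy.stats import lognorm


def lognorm_from_moments(mean, std):
    """Return (mu, sigma, frozen distribution) matching the given mean and std.

    Hypothetical helper for illustration; not part of SciPy.
    """
    sigma = np.sqrt(np.log1p((std / mean) ** 2))
    mu = np.log(mean) - 0.5 * sigma**2
    return mu, sigma, lognorm(sigma, scale=np.exp(mu))


mu, sigma, dist = lognorm_from_moments(500.0, 600.0)
print(mu, sigma)
print(dist.mean(), dist.std())
```

Returning the frozen distribution means later calls such as dist.mean(), dist.std(), or dist.rvs(size=...) do not need the shape and scale arguments repeated.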
