How to generate a Q-Q plot manually without inverse distribution function in python


Problem description

I have 4 different distributions which I've fitted to a sample of observations. Now I want to compare my results and find the best solution. I know there are a lot of different methods to do that, but I'd like to use a quantile-quantile (q-q) plot.

The formulas for my 4 distributions are:

where K0 is the modified Bessel function of the second kind and zeroth order, and Γ is the gamma function.

My sample looks roughly like this: (0.2, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.4, 0.4, 0.6, 0.7 ...), so I have multiple identical values and also gaps between them.

I've read the instructions on this site and tried to implement them in python. So, like in the link:

1) I sorted my data from the smallest to the largest value.

2) I computed "n" evenly spaced points on the interval (0,1), where "n" is my sample size.

3) And this is the point I can't manage.

As far as I understand, I should now use the values I calculated beforehand (those evenly spaced values), put them in the inverse functions of my above distributions and thus compute the theoretical quantiles of my distributions.
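For a distribution whose quantile function is already available, steps 1 to 3 can be sketched as follows. This is only an illustration under stated assumptions: it uses `scipy.stats.expon` as a stand-in model (not one of the four distributions in the question), the sample mean as an assumed scale, and the plotting positions (i - 0.5)/n, which is one common convention among several.

```python
import numpy as np
from scipy import stats

# Illustrative data only: the first values quoted above, not the full sample.
sample = np.array([0.2, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.4, 0.4, 0.6, 0.7])

# Step 1: sort the observations; ties simply stay next to each other.
empirical_quantiles = np.sort(sample)

# Step 2: n evenly spaced probabilities strictly inside (0, 1).
n = len(sample)
probabilities = (np.arange(1, n + 1) - 0.5) / n

# Step 3: push the probabilities through the quantile function (inverse cdf).
# The scale is an assumed stand-in for a fitted parameter.
theoretical_quantiles = stats.expon.ppf(probabilities, scale=sample.mean())

for q_theory, q_data in zip(theoretical_quantiles, empirical_quantiles):
    print(f"{q_theory:.3f}  {q_data:.3f}")
```

Repeated values in the data need no special handling here: each observation still gets its own probability, so tied values show up as short horizontal or vertical runs in the plot.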

For reference, here are the inverse functions (partly calculated with wolframalpha, and as far as it was possible):

where W is the Lambert W-function and everything in brackets afterwards is the argument.

The problem is that apparently no inverse function exists for the first distribution. The next one would probably produce complex values (a negative number under the root, because b = 0.55 according to the fit), and the last two contain a Lambert W-function (which I'm unsure how to implement in python).
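On the Lambert W point specifically, SciPy ships an implementation as `scipy.special.lambertw`. The sketch below only checks the defining property W(z)·exp(W(z)) = z on arbitrary test values; which branch (`k=0` or `k=-1`) the inverse formulas would need is not something that can be read off here, since those formulas are not reproduced in the text.

```python
import numpy as np
from scipy.special import lambertw

z = 2.0
# lambertw returns a complex number; k=0 selects the principal branch.
w = lambertw(z, k=0)

# Defining property: W(z) * exp(W(z)) == z.
residual = w * np.exp(w) - z
print(w.real, abs(residual))

# For -1/e < z < 0 there is a second real branch, selected with k=-1.
w_lower = lambertw(-0.2, k=-1)
print(w_lower.real, abs(w_lower * np.exp(w_lower) + 0.2))
```

Because the result is complex-valued, taking `.real` is only safe after confirming the imaginary part is negligible for the arguments in use.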

So my question is, is there a way to calculate the q-q plots without the analytical expressions of the inverse distribution functions?

I'd very much appreciate any help you could give me!

Solution

It happens that there is an easier way. It's taken me a day or two to dig around until I was pointed toward the right method in scipy.stats. I was looking for the wrong sort of name!

First, build a subclass of rv_continuous to represent one of your distributions. We know the pdf for your distributions, so that's what we define. In this case there's just one parameter. If more are needed just add them to the def statement and use them in the return statement as required.

>>> import numpy as np
>>> from scipy import stats
>>> param = 3/2
>>> class NoName(stats.rv_continuous):
...     def _pdf(self, x, param):
...         # np.exp accepts both scalars and arrays, unlike math.exp
...         return param*np.exp(-param*x)
...

Now create an instance of this class, declaring the lower end of its support (i.e., the lowest value that the r.v. can assume) and what the parameters are called.

>>> noname = NoName(a=0, shapes='param')
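For a distribution with more than one parameter, the same pattern might look like the sketch below. The class name `TwoParam` and the choice of a Weibull pdf (shape `k`, scale `lam`) are purely illustrative assumptions, picked because the closed-form cdf makes it easy to check that the numerically integrated cdf is sensible.

```python
import numpy as np
from scipy import stats

class TwoParam(stats.rv_continuous):
    # Hypothetical two-parameter example: a Weibull pdf with shape k and scale lam.
    def _pdf(self, x, k, lam):
        return (k / lam) * (x / lam) ** (k - 1) * np.exp(-((x / lam) ** k))

# Both parameter names are listed in `shapes`, in the order _pdf expects them.
two_param = TwoParam(a=0, shapes='k, lam')

k, lam = 1.5, 2.0
# No _cdf is defined, so the cdf comes from numerical integration of _pdf.
numeric_cdf = two_param.cdf(1.0, k, lam)
exact_cdf = 1 - np.exp(-((1.0 / lam) ** k))
print(numeric_cdf, exact_cdf)
```

Note that by default `rv_continuous` only accepts strictly positive shape parameters; a distribution that needs other values would have to override `_argcheck`.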

I don't have an actual sample of values to play with. I'll create a pseudo-random sample.

>>> sample = noname.rvs(size=100, param=param)

Sort it to get the order statistics, which serve as the empirical quantiles (the variable below is named after the so-called 'empirical cdf').

>>> empirical_cdf = sorted(sample)

The sample has 100 elements, so generate 100 points at which to sample the inverse cdf, or quantile function, as discussed in the paper you referenced.

>>> theoretical_points = [(_-0.5)/len(sample) for _ in range(1, 1+len(sample))]

Get the quantile function values at these points.

>>> theoretical_cdf = [noname.ppf(_, param=param) for _ in theoretical_points]
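Since only `_pdf` is defined, `ppf` cannot rely on a closed-form inverse; the quantiles have to come from integrating the pdf and inverting the result numerically. A hand-rolled sketch of that idea, using `scipy.integrate.quad` and `scipy.optimize.brentq`, is shown below. The function names and the bracket `upper=50.0` are assumptions made for illustration, and this is not a copy of SciPy's internal code.

```python
import numpy as np
from scipy import integrate, optimize

param = 3 / 2

def pdf(x):
    return param * np.exp(-param * x)

def cdf(x):
    # Integrate the pdf from the lower end of the support (0) up to x.
    value, _abs_err = integrate.quad(pdf, 0.0, x)
    return value

def quantile(p, upper=50.0):
    # Solve cdf(x) = p for x on [0, upper]; upper is an assumed bracket
    # that must lie beyond the quantile being sought.
    return optimize.brentq(lambda x: cdf(x) - p, 0.0, upper)

p = 0.75
numeric = quantile(p)
exact = -np.log(1 - p) / param  # closed form, available for this pdf only
print(numeric, exact)
```

The same two functions would apply unchanged to a pdf with no analytical inverse, which is the situation described in the question; the closed form is used here only as a check.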

Plot it all.

>>> from matplotlib import pyplot as plt
>>> plt.plot([0,3.5], [0, 3.5], 'b-')
[<matplotlib.lines.Line2D object at 0x000000000921B400>]
>>> plt.scatter(empirical_cdf, theoretical_cdf)
<matplotlib.collections.PathCollection object at 0x000000000921BD30>
>>> plt.show()

Here's the Q-Q plot that results.
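When several fitted distributions are compared this way, it may help to put a number next to each plot. One rough option, sketched below, is the root-mean-square gap between the sorted data and the model quantiles. The helper `qq_rmse`, the synthetic sample, and the two candidate models are all assumptions for illustration; this is a descriptive summary of the plot, not a formal goodness-of-fit test.

```python
import numpy as np
from scipy import stats

def qq_rmse(sample, ppf):
    # Root-mean-square gap between sorted data and model quantiles.
    ordered = np.sort(np.asarray(sample, dtype=float))
    n = len(ordered)
    probabilities = (np.arange(1, n + 1) - 0.5) / n
    return float(np.sqrt(np.mean((ordered - ppf(probabilities)) ** 2)))

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2 / 3, size=200)  # synthetic data, rate 3/2

# Each candidate is just a callable mapping probabilities to quantiles.
candidates = {
    "expon, rate 3/2": lambda p: stats.expon.ppf(p, scale=2 / 3),
    "expon, rate 1/2": lambda p: stats.expon.ppf(p, scale=2.0),
}
scores = {name: qq_rmse(sample, ppf) for name, ppf in candidates.items()}
print(scores)
```

A custom `rv_continuous` instance could be plugged in the same way, for example with `lambda p: noname.ppf(p, param=param)`.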
