How to fit the best probability distribution model to my data in Python?
Problem description
I have about 20,000 rows of data like this:
Id | value
1 30
2 3
3 22
..
n 27
I computed summary statistics for my data: the mean is 33.85, the median 30.99, the minimum 2.8, the maximum 206, and the 95% confidence interval 0.21. So most values are around 33, with a few outliers, which suggests a distribution with a long tail.
I am new to both probability distributions and Python. I tried the fitter package (https://pypi.org/project/fitter/) to test many distributions from the SciPy package, and the loglaplace distribution showed the lowest error (although I do not quite understand it).
I read almost all the questions on this topic and concluded that there are two approaches: (1) fit a distribution model and then draw random values from it in my simulation, or (2) compute the frequencies of different groups of values and sample from those. The second approach, however, would never produce a value greater than 206, for example.
Given that my data is a set of numeric values, what is the best approach to fit a distribution to it in Python? My simulation needs to draw random numbers that follow the same pattern as my data. I also need to validate that the model represents the data well by plotting the data together with the model curve.
Recommended answer
One way is to select the best model according to the Bayesian information criterion (BIC). OpenTURNS implements an automatic selection method (see the OpenTURNS documentation).
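For reference, the textbook definition of the criterion is given below. This is the standard form and not necessarily the exact quantity a given library reports: some implementations may rescale it (for example by the sample size), which does not change the ranking of models fitted to the same sample. Here $k$ is the number of estimated parameters, $n$ the sample size, and $\hat{L}$ the maximized likelihood; the model with the lowest BIC is preferred.

```latex
\mathrm{BIC} = k \ln(n) - 2 \ln(\hat{L})
```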
Suppose you have an array x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. Here is a quick example:
import openturns as ot
# The example data
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Define x as a Sample object. It is a sample of size 11 and dimension 1
sample = ot.Sample([[xi] for xi in x])
# Define the distributions you want to test on the sample
tested_distributions = [ot.WeibullMaxFactory(), ot.NormalFactory(), ot.UniformFactory()]
# Find the best distribution according to BIC and print its parameters
best_model, best_bic = ot.FittingTest.BestModelBIC(sample, tested_distributions)
print(best_model)
# Output: Uniform(a = -0.769231, b = 10.7692)
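The question also asks about drawing random values and validating the fit. A minimal sketch of those steps, using SciPy instead of OpenTURNS, might look like the following. It is not part of the original answer: the synthetic `data` array, the list of candidate distributions, fixing the location at 0, and the helper `bic` are all assumptions made for illustration, and the real 20,000 values would replace `data`.

```python
import numpy as np
from scipy import stats

# Assumption: synthetic positive, right-skewed values standing in for the real data.
rng = np.random.default_rng(0)
data = rng.lognormal(mean=3.4, sigma=0.4, size=2000)

# Assumption: an illustrative set of candidates; each has one shape parameter.
candidates = {
    "lognorm": stats.lognorm,
    "gamma": stats.gamma,
    "loglaplace": stats.loglaplace,
}

def bic(dist, params, values, n_free):
    """Textbook BIC: k * ln(n) - 2 * ln(L), with k = n_free fitted parameters."""
    log_likelihood = np.sum(dist.logpdf(values, *params))
    return n_free * np.log(len(values)) - 2.0 * log_likelihood

results = {}
for name, dist in candidates.items():
    # Maximum-likelihood fit with the location fixed at 0 (an assumption
    # that suits strictly positive data); returns (shape, loc, scale).
    params = dist.fit(data, floc=0)
    n_free = len(params) - 1  # loc was fixed, so it is not counted
    results[name] = {"params": params, "bic": bic(dist, params, data, n_free)}

# The lowest BIC wins.
best_name = min(results, key=lambda name: results[name]["bic"])
best_dist = candidates[best_name]
best_params = results[best_name]["params"]
print("best model:", best_name, "BIC:", results[best_name]["bic"])

# Approach (1): draw simulated values from the fitted model.
# These can exceed the largest observed value.
simulated = best_dist.rvs(*best_params, size=1000, random_state=rng)

# Approach (2): resample the observed values.
# These can never exceed the largest observed value.
resampled = rng.choice(data, size=1000, replace=True)
print("max observed:", data.max(), "max resampled:", resampled.max())

# Validation, numeric: Kolmogorov-Smirnov test of the data against the fitted CDF.
ks_result = stats.kstest(data, best_dist.cdf, args=best_params)
print("KS statistic:", ks_result.statistic, "p-value:", ks_result.pvalue)

# Validation, visual: histogram density and model density on the same grid.
hist_density, edges = np.histogram(data, bins=40, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
model_pdf = best_dist.pdf(centers, *best_params)
```

The arrays `centers`, `hist_density`, and `model_pdf` can be passed to any plotting library (for example as bars and a line) to compare the data with the model curve. Note that the KS p-value is optimistic when the parameters were estimated from the same data, so it is better read as a rough check than as a formal test.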