Fit a curve to a histogram in Python

Problem description

I am trying to fit a curve to the values in a matplotlib-generated histogram:

n, bins, patches = plt.hist(myData)

where "plt" stands for matplotlib.pyplot and myData is an array holding the number of occurrences at each index, like [9,3,3,....]

I want bins to be my x-data and n to be my y-data. That is, I want to extract how often the number x occurs as a function of x. However, I cannot get bins and n to be the same size.

So basically, I would like to be able to fit a curve to n(bins, params).

How can I do this?

Recommended answer

From the documentation of matplotlib.pyplot.hist:

Returns

n : array or list of arrays

The values of the histogram bins. See normed and weights for a description of the possible semantics. If input x is an array, then this is an array of length nbins. If input is a sequence of arrays [data1, data2,..], then this is a list of arrays with the values of the histograms for each of the arrays in the same order.

bins : array

The edges of the bins. Length nbins + 1 (nbins left edges and right edge of last bin). Always a single array even when multiple data sets are passed in.

patches : list or list of lists

Silent list of individual patches used to create the histogram or list of such list if multiple input datasets.

As you can see, the second return value is actually the edges of the bins, so it contains one more item than there are bins.

The easiest way to get the bin centers is:

import numpy as np
bin_center = bin_borders[:-1] + np.diff(bin_borders) / 2

This adds half of each bin's width (the difference between neighbouring borders, computed with np.diff) to the bin's left border. The last border is excluded because it is the right border of the rightmost bin, not the left border of any bin.

So this actually returns the bin centers, an array with the same length as n.
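As a quick check of the size relationship, here is a minimal sketch. It uses np.histogram rather than plt.hist so that it runs without a plotting backend, and the names data, counts, edges, centers and alt_centers, as well as the seed and the choice of 20 bins, are assumptions made only for this illustration:

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the run is repeatable
data = rng.normal(10, 5, size=1000)      # hypothetical sample data

counts, edges = np.histogram(data, bins=20)

# Left border of each bin plus half of that bin's width
centers = edges[:-1] + np.diff(edges) / 2

# Equivalent formulation: average of each pair of neighbouring borders
alt_centers = (edges[:-1] + edges[1:]) / 2

print(len(counts), len(edges), len(centers))   # 20 21 20
print(np.allclose(centers, alt_centers))       # True
```

Both formulations give the same centers; which one to use is mostly a matter of taste.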

Note that if you have numba you could speed up the borders-to-centers calculation:

import numba as nb

@nb.njit
def centers_from_borders_numba(b):
    centers = np.empty(b.size - 1, np.float64)
    for idx in range(b.size - 1):
        centers[idx] = b[idx] + (b[idx+1] - b[idx]) / 2
    return centers

def centers_from_borders(borders):
    return borders[:-1] + np.diff(borders) / 2

It is considerably faster:

bins = np.random.random(100000)
bins.sort()

# Make sure they are identical
np.testing.assert_array_equal(centers_from_borders_numba(bins), centers_from_borders(bins))

# Compare the timings
%timeit centers_from_borders_numba(bins)
# 36.9 µs ± 275 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit centers_from_borders(bins)
# 150 µs ± 704 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Even though it is faster, numba is quite a heavy dependency that you should not add lightly. It is fun to play around with and really fast, but in the following I'll use the NumPy version because it will be more helpful for most future visitors.

As for the general task of fitting a function to the histogram: you need to define a function to fit to the data, and then you can use scipy.optimize.curve_fit. For example, if you want to fit a Gaussian curve:

import numpy as np
import matplotlib.pyplot as plt

from scipy.optimize import curve_fit

Then define the function to fit and a sample dataset. The sample dataset is only for the purpose of this question; you should use your own dataset and define the function you want to fit:

def gaussian(x, mean, amplitude, standard_deviation):
    # The factor 2 in the denominator makes the third parameter the actual standard deviation
    return amplitude * np.exp(-((x - mean) ** 2) / (2 * standard_deviation ** 2))

x = np.random.normal(10, 5, size=10000)

Fit the curve and plot it:

bin_heights, bin_borders, _ = plt.hist(x, bins='auto', label='histogram')
bin_centers = bin_borders[:-1] + np.diff(bin_borders) / 2
popt, _ = curve_fit(gaussian, bin_centers, bin_heights, p0=[1., 0., 1.])

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 10000)
plt.plot(x_interval_for_fit, gaussian(x_interval_for_fit, *popt), label='fit')
plt.legend()
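curve_fit also returns the covariance matrix of the fitted parameters, which the snippet above discards. The following is a minimal sketch of how one might use it, not part of the original answer. It assumes a fixed seed, 50 bins, and an initial guess derived from the data, and it defines its own Gaussian in the standard form (factor 2 in the denominator) so that the third parameter is the standard deviation. The names sample, p0 and perr are chosen only for this illustration, and np.histogram is used so that no plot window is needed:

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, mean, amplitude, standard_deviation):
    # Standard Gaussian shape; amplitude is the peak height
    return amplitude * np.exp(-((x - mean) ** 2) / (2 * standard_deviation ** 2))


rng = np.random.default_rng(42)               # fixed seed (assumption, for repeatability)
sample = rng.normal(10, 5, size=10000)        # same distribution as the sample data above

bin_heights, bin_borders = np.histogram(sample, bins=50)
bin_centers = bin_borders[:-1] + np.diff(bin_borders) / 2

# Initial guess taken from the data instead of fixed numbers
p0 = [sample.mean(), bin_heights.max(), sample.std()]
popt, pcov = curve_fit(gaussian, bin_centers, bin_heights, p0=p0)

# One-standard-deviation uncertainties: square roots of the covariance diagonal
perr = np.sqrt(np.diag(pcov))

for name, value, err in zip(("mean", "amplitude", "standard_deviation"), popt, perr):
    print(f"{name}: {value:.3f} +/- {err:.3f}")
```

Starting the optimizer near the data (sample mean, tallest bin, sample standard deviation) tends to make convergence less dependent on luck than a fixed starting point, especially when the data are far from zero.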

Note that you can also use NumPy's histogram and Matplotlib's bar plot instead. The difference is that np.histogram doesn't return the "patches" array and that you need the bin widths for Matplotlib's bar plot:

bin_heights, bin_borders = np.histogram(x, bins='auto')
bin_widths = np.diff(bin_borders)
bin_centers = bin_borders[:-1] + bin_widths / 2
popt, _ = curve_fit(gaussian, bin_centers, bin_heights, p0=[1., 0., 1.])

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 10000)

plt.bar(bin_centers, bin_heights, width=bin_widths, label='histogram')
plt.plot(x_interval_for_fit, gaussian(x_interval_for_fit, *popt), label='fit', c='red')
plt.legend()

Of course you can also fit other functions to your histograms. I generally like Astropy's models for fitting, because you don't need to create the functions yourself, and it also supports compound models and different fitters.

For example, to fit a Gaussian curve to the data set using Astropy:

from astropy.modeling import models, fitting

bin_heights, bin_borders = np.histogram(x, bins='auto')
bin_widths = np.diff(bin_borders)
bin_centers = bin_borders[:-1] + bin_widths / 2

t_init = models.Gaussian1D()
fit_t = fitting.LevMarLSQFitter()
t = fit_t(t_init, bin_centers, bin_heights)

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 10000)
plt.figure()
plt.bar(bin_centers, bin_heights, width=bin_widths, label='histogram')
plt.plot(x_interval_for_fit, t(x_interval_for_fit), label='fit', c='red')
plt.legend()

Fitting a different model to the data is then possible just by replacing:

t_init = models.Gaussian1D()

with a different model. For example a Lorentz1D (like a Gaussian but with wider tails):

t_init = models.Lorentz1D()

Not exactly a good model given my sample data, but it's really easy to use if there's already an Astropy model that matches your needs.
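To put a number on "not exactly a good model", one can compare the sum of squared residuals of the two fits. The sketch below swaps in scipy.optimize.curve_fit for Astropy so that it runs without the extra dependency. The two functions are hand-written stand-ins that follow my understanding of the Gaussian1D and Lorentz1D formulas (parameters amplitude, mean, stddev and amplitude, x_0, fwhm); the seed, the 50 bins and the starting values are assumptions made for this illustration:

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amplitude, mean, stddev):
    # Hand-written stand-in for a 1D Gaussian model
    return amplitude * np.exp(-((x - mean) ** 2) / (2 * stddev ** 2))


def lorentzian(x, amplitude, x_0, fwhm):
    # Hand-written stand-in for a 1D Lorentzian model (fwhm = full width at half maximum)
    half_width = fwhm / 2
    return amplitude * half_width ** 2 / ((x - x_0) ** 2 + half_width ** 2)


rng = np.random.default_rng(1)                # fixed seed (assumption)
sample = rng.normal(10, 5, size=10000)        # normally distributed sample data

bin_heights, bin_borders = np.histogram(sample, bins=50)
bin_centers = bin_borders[:-1] + np.diff(bin_borders) / 2

# Data-driven starting values shared by both fits
start = [bin_heights.max(), sample.mean(), sample.std()]
popt_g, _ = curve_fit(gaussian, bin_centers, bin_heights, p0=start)
popt_l, _ = curve_fit(lorentzian, bin_centers, bin_heights,
                      p0=[start[0], start[1], 2 * start[2]])

# Sum of squared residuals: smaller means the curve follows the bin heights more closely
ssr_gauss = np.sum((bin_heights - gaussian(bin_centers, *popt_g)) ** 2)
ssr_lorentz = np.sum((bin_heights - lorentzian(bin_centers, *popt_l)) ** 2)
print(f"Gaussian SSR:   {ssr_gauss:.1f}")
print(f"Lorentzian SSR: {ssr_lorentz:.1f}")
```

For normally distributed data the Gaussian should come out with the clearly smaller residual, because the Lorentzian's wide tails overshoot the outer bins.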
