将曲线拟合到 Python 中的直方图 [英] Fit a curve to a histogram in Python

查看:28
本文介绍了将曲线拟合到 Python 中的直方图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使曲线拟合 matplotlib 生成的直方图中的值:

n, bins, patch = plt.hist(myData)

其中plt"代表 matplotlib.pyplot,而 myData 是一个数组,每个索引的出现次数为 [9,3,3,....]

我希望 bin 是我的 x 数据,n 是我的 y 数据.也就是说,我想提取有关数字 x occors 与数字 x 发生频率的信息.但是,我无法让 bin 和 n 具有相同的大小.

所以基本上,我希望能够将曲线拟合到 n(bins, params).

如何做到这一点?

解决方案

来自

即使它更快,numba 也是一个非常严重的依赖项,您不会轻易添加.然而,玩起来很有趣,而且速度非常快,但在下面我将使用 NumPy 版本,因为它对大多数未来的访问者更有帮助.


至于将函数拟合到直方图的一般任务:您需要定义一个函数来拟合数据,然后您可以使用

请注意,您也可以使用 NumPys

当然,您也可以将其他函数拟合到直方图中.我通常喜欢

只需替换以下内容即可为数据拟合不同的模型:

t_init = models.Gaussian1D()

使用不同的模型.例如

考虑到我的示例数据,这不是一个很好的模型,但如果已经有符合需求的 Astropy 模型,它真的很容易使用.

I am trying to make to fit a curve to the values in a matplotlib generated histogram:

n, bins, patches = plt.hist(myData)

Where "plt" stands for matplotlib.pyplot and myData is an array with number of occurrences every index like [9,3,3,....]

I want bins to be my x-data and n to be my y-data. That is, I want to extract info about how often number x occors vs. number x. However, I cannot get bins and n to be of the same size.

So basically, I would like to be able to fit a curve to n(bins, params).

How would one do this?

解决方案

From the documentation of matplotlib.pyplot.hist:

Returns

n : array or list of arrays

The values of the histogram bins. See normed and weights for a description of the possible semantics. If input x is an array, then this is an array of length nbins. If input is a sequence arrays [data1, data2,..], then this is a list of arrays with the values of the histograms for each of the arrays in the same order.

bins : array

The edges of the bins. Length nbins + 1 (nbins left edges and right edge of last bin). Always a single array even when multiple data sets are passed in.

patches : list or list of lists

Silent list of individual patches used to create the histogram or list of such list if multiple input datasets.

As you can see the second return is actually the edges of the bins, so it contains one more item than there are bins.

The easiest way to get the bin centers is:

import numpy as np
bin_center = bin_borders[:-1] + np.diff(bin_borders) / 2

Which just adds half of the width (with np.diff) between two borders (width of the bins) to the left bin border. Excluding the last bin border because it's the right border of the rightmost bin.

So this will actually return the bin centers - an array with the same length as n.

Note that if you have numba you could speed up the borders-to-centers-calculation:

import numba as nb

@nb.njit
def centers_from_borders_numba(b):
    centers = np.empty(b.size - 1, np.float64)
    for idx in range(b.size - 1):
        centers[idx] = b[idx] + (b[idx+1] - b[idx]) / 2
    return centers

def centers_from_borders(borders):
    return borders[:-1] + np.diff(borders) / 2

It's quite a bit faster:

bins = np.random.random(100000)
bins.sort()

# Make sure they are identical
np.testing.assert_array_equal(centers_from_borders_numba(bins), centers_from_borders(bins))

# Compare the timings
%timeit centers_from_borders_numba(bins)
# 36.9 µs ± 275 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit centers_from_borders(bins)
# 150 µs ± 704 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Even if it's faster numba is quite a heavy dependency that you don't add lightly. However it's fun to play around with and really fast, but in the following I'll use the NumPy version because it's will be more helpful for most future visitors.


As for the general task of fitting a function to the histogram: You need to define a function to fit to the data and then you can use scipy.optimize.curve_fit. For example if you want to fit a Gaussian curve:

import numpy as np
import matplotlib.pyplot as plt

from scipy.optimize import curve_fit

Then define the function to fit and some sample dataset. The sample dataset is just for the purpose of this question, you should use your dataset and define your function you want to fit:

def gaussian(x, mean, amplitude, standard_deviation):
    return amplitude * np.exp( - (x - mean)**2 / (2*standard_deviation ** 2))
    
x = np.random.normal(10, 5, size=10000)

Fitting the curve and plotting it:

bin_heights, bin_borders, _ = plt.hist(x, bins='auto', label='histogram')
bin_centers = bin_borders[:-1] + np.diff(bin_borders) / 2
popt, _ = curve_fit(gaussian, bin_centers, bin_heights, p0=[1., 0., 1.])

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 10000)
plt.plot(x_interval_for_fit, gaussian(x_interval_for_fit, *popt), label='fit')
plt.legend()

Note that you can also use NumPys histogram and Matplotlibs bar-plot instead. The difference is that np.histogram doesn't return the "patches" array and that you need the bin-widths for Matplotlibs bar-plot:

bin_heights, bin_borders = np.histogram(x, bins='auto')
bin_widths = np.diff(bin_borders)
bin_centers = bin_borders[:-1] + bin_widths / 2
popt, _ = curve_fit(gaussian, bin_centers, bin_heights, p0=[1., 0., 1.])

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 10000)

plt.bar(bin_centers, bin_heights, width=bin_widths, label='histogram')
plt.plot(x_interval_for_fit, gaussian(x_interval_for_fit, *popt), label='fit', c='red')
plt.legend()

Of course you can also fit other functions to your histograms. I generally like Astropys models for fitting, because you don't need to create the functions yourself and it also supports compound models and different fitters.

For example to fit a Gaussian curve using Astropy to the data set:

from astropy.modeling import models, fitting

bin_heights, bin_borders = np.histogram(x, bins='auto')
bin_widths = np.diff(bin_borders)
bin_centers = bin_borders[:-1] + bin_widths / 2

t_init = models.Gaussian1D()
fit_t = fitting.LevMarLSQFitter()
t = fit_t(t_init, bin_centers, bin_heights)

x_interval_for_fit = np.linspace(bin_borders[0], bin_borders[-1], 10000)
plt.figure()
plt.bar(bin_centers, bin_heights, width=bin_widths, label='histogram')
plt.plot(x_interval_for_fit, t(x_interval_for_fit), label='fit', c='red')
plt.legend()

Fitting a different model to the data is possible then just by replacing the:

t_init = models.Gaussian1D()

with a different model. For example a Lorentz1D (like a Gaussian but a with wider tails):

t_init = models.Lorentz1D()

Not exactly a good model given my sample data, but it's really easy to use if there's already an Astropy model that matches the needs.

这篇关于将曲线拟合到 Python 中的直方图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆