Solving for the coefficients of a data set using curve_fit from scipy.optimize

Problem description

I have an array A exported from Excel, containing the data values shown below. The 1st column x and the 2nd column y are the independent variables, while the 3rd column z is the dependent variable (the output).

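import numpy as np
from xlrd import open_workbook

# Open the workbook and select the sheet that holds the data
Data = open_workbook("simple.xls")
sheet = Data.sheet_by_name('Sheet1')

A = []

# Read row by row
for rownum in range(sheet.nrows):
    rowValues = sheet.row_values(rownum)
    A.append(rowValues)

# Convert the list of rows into a 2-D array with columns x, y, z
A = np.array(A)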

A=
[[  0.00000000e+00   1.49761692e-05   0.00000000e+00]
 [  8.85000000e+02   1.49761692e-05   6.41362500e-02]
 [  1.48500000e+03   1.49761692e-05   1.19340000e-01]
 [  2.09000000e+03   1.49761692e-05   1.58760000e-01]
 [  3.36000000e+03   1.49761692e-05   2.08080000e-01]
 [  3.87000000e+03   1.49761692e-05   2.16933750e-01]
 [  6.48000000e+03   1.49761692e-05   2.46746250e-01]
 [  8.22000000e+03   1.49761692e-05   2.54700000e-01]
 [  1.05300000e+04   1.49761692e-05   2.59470000e-01]
 [  1.58250000e+04   1.49761692e-05   2.62035000e-01]
 [  2.37600000e+04   1.49761692e-05   2.68751250e-01]
 [  8.18400000e+04   1.49761692e-05   2.92848750e-01]
 [  0.00000000e+00   8.57250668e-06   0.00000000e+00]
 [  6.75000000e+02   8.57250668e-06   4.97436412e-02]
 [  1.27500000e+03   8.57250668e-06   1.27749375e-01]
 [  1.88000000e+03   8.57250668e-06   1.88617039e-01]
 [  3.15000000e+03   8.57250668e-06   2.65089780e-01]
 [  3.66000000e+03   8.57250668e-06   2.90344849e-01]
 [  6.27000000e+03   8.57250668e-06   3.36295316e-01]
 [  8.01000000e+03   8.57250668e-06   3.42702439e-01]
 [  1.03200000e+04   8.57250668e-06   3.65205982e-01]
 [  1.56150000e+04   8.57250668e-06   3.67269626e-01]
 [  2.35500000e+04   8.57250668e-06   3.87296798e-01]
 [  8.16300000e+04   8.57250668e-06   4.43486869e-01]
 [  0.00000000e+00   4.26671486e-06   0.00000000e+00]
 [  4.65000000e+02   4.26671486e-06   2.61407250e-02]
 [  1.06500000e+03   4.26671486e-06   1.22371762e-01]
 [  1.67000000e+03   4.26671486e-06   2.19629475e-01]
 [  2.94000000e+03   4.26671486e-06   3.26680087e-01]
 [  3.45000000e+03   4.26671486e-06   3.34340662e-01]
 [  6.06000000e+03   4.26671486e-06   4.18330575e-01]
 [  7.80000000e+03   4.26671486e-06   4.50631350e-01]
 [  1.01100000e+04   4.26671486e-06   4.55053950e-01]
 [  1.54050000e+04   4.26671486e-06   4.60937587e-01]
 [  2.33400000e+04   4.26671486e-06   5.10770813e-01]
 [  8.14200000e+04   4.26671486e-06   6.12569587e-01]
 [  0.00000000e+00   2.13335743e-06   0.00000000e+00]
 [  8.55000000e+02   2.13335743e-06   1.03773150e-01]
 [  1.46000000e+03   2.13335743e-06   2.21130000e-01]
 [  2.73000000e+03   2.13335743e-06   3.45515625e-01]
 [  3.24000000e+03   2.13335743e-06   3.85634925e-01]
 [  5.85000000e+03   2.13335743e-06   4.76061300e-01]
 [  7.59000000e+03   2.13335743e-06   4.79220300e-01]
 [  1.51950000e+04   2.13335743e-06   5.24709900e-01]
 [  2.31300000e+04   2.13335743e-06   5.64829200e-01]
 [  8.12100000e+04   2.13335743e-06   6.46568325e-01]
 [  0.00000000e+00   1.42359023e-06   0.00000000e+00]
 [  6.45000000e+02   1.42359023e-06   8.03596500e-02]
 [  1.25000000e+03   1.42359023e-06   2.36700000e-01]
 [  2.52000000e+03   1.42359023e-06   4.25941650e-01]
 [  3.03000000e+03   1.42359023e-06   4.61683350e-01]
 [  5.64000000e+03   1.42359023e-06   5.99561100e-01]
 [  7.38000000e+03   1.42359023e-06   6.05952000e-01]
 [  9.69000000e+03   1.42359023e-06   6.16958550e-01]
 [  1.49850000e+04   1.42359023e-06   6.57434250e-01]
 [  2.29200000e+04   1.42359023e-06   6.45954300e-01]
 [  8.10000000e+04   1.42359023e-06   7.79689800e-01]
 [  0.00000000e+00   9.36010573e-07   0.00000000e+00]
 [  4.35000000e+02   9.36010573e-07   3.40200000e-02]
 [  1.04000000e+03   9.36010573e-07   1.91160000e-01]
 [  2.31000000e+03   9.36010573e-07   3.77640000e-01]
 [  2.82000000e+03   9.36010573e-07   4.44240000e-01]
 [  5.43000000e+03   9.36010573e-07   5.50440000e-01]
 [  7.17000000e+03   9.36010573e-07   5.36580000e-01]
 [  9.48000000e+03   9.36010573e-07   5.83740000e-01]
 [  1.47750000e+04   9.36010573e-07   5.87340000e-01]
 [  2.27100000e+04   9.36010573e-07   6.33060000e-01]
 [  8.07900000e+04   9.36010573e-07   7.36200000e-01]]

x= A[:,0]
y= A[:,1]
z= A[:,2]

I have a function that I want to fit to the data in array A in order to solve for the coefficients a and b.

def func(data,a,b):
    return a/(data[:,1]*b)*np.log(1+(data[:,1]*b/a)*(1-np.exp(-a*data[:,0]))) 

The rest of the code shows the initial guess for the coefficients a and b, the call to scipy.optimize.curve_fit(), and the matplotlib.pyplot calls that plot the result.

import scipy.optimize
import matplotlib.pyplot as plt

guess = [3.0e-5, 128]

print(guess, 'initial guessed parameters')

# Fit z against the (x, y) columns, starting from the initial guess
params, pcov = scipy.optimize.curve_fit(func, A[:,:2], A[:,2], guess)

print(params, 'fitted parameters')

# Plot the fitted values (red line) and the data (circles) against x
plt.plot(x,func(A,params[0],params[1]),'-r',x,z,'o')
plt.title('Plot')
plt.legend(['Fit', 'Data'], loc='lower right')
plt.show()

The plot of the result is this: [plot image not available]

and the resulting coefficients are:

[3e-05, 128] initial guessed parameters
[  2.00773153e-04   1.22752179e+02] fitted parameters

Because all the data is inside array A, scipy seems to treat the points in the array as joined from one to the next, so the end of each curve goes back to the origin, which is also the start of the next curve.

How should I code this in Python so that scipy.optimize.curve_fit knows that the data in the array consists of multiple curves, rather than one single conjoined data set? Any advice would be greatly appreciated.

Recommended answer

It seems that your dataset A contains all those curves back to back.

Instead, you could split your dataset at every row where A[:,0] == 0.00000000e+00. After splitting it into 6 datasets, you could fit each one separately.
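
As a minimal sketch of that splitting step, assuming every curve starts with a row where x is exactly 0 (as in the array shown in the question), the rows can be cut with numpy alone. The helper name `split_curves` and the small `A_demo` array are made up for illustration; with the real data you would pass `A` instead.

```python
import numpy as np

def split_curves(A):
    """Split A into sub-arrays, starting a new one at every row where x == 0."""
    starts = np.flatnonzero(A[:, 0] == 0)
    # np.split cuts before each given index; index 0 would only yield an empty piece
    cuts = starts[starts > 0]
    return np.split(A, cuts)

# Small made-up array with two curves back to back (columns: x, y, z)
A_demo = np.array([
    [0.0,    1.5e-05, 0.0],
    [885.0,  1.5e-05, 0.064],
    [1485.0, 1.5e-05, 0.119],
    [0.0,    8.6e-06, 0.0],
    [675.0,  8.6e-06, 0.050],
])

datasets = split_curves(A_demo)
print([d.shape for d in datasets])
```

Applied to the array from the question, this should give six sub-arrays, one per distinct y value.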

But if I understand your problem correctly, you would also like the parameters a and b to be the same for every dataset, correct?

To help you achieve that, I'm going to shamelessly plug my symfit package, which wraps curve_fit to make such problems easier to solve.

In symfit, you would do the following:

from symfit import Fit, variables, parameters, log, exp

datasets = [A_1, A_2, ...] # I'm going to assume this holds the untangled datasets one through six

xs = variables('x_1, x_2, x_3, x_4, x_5, x_6')
ys = variables('y_1, y_2, y_3, y_4, y_5, y_6')
zs = variables('z_1, ...') # same for z
a, b = parameters('a, b')

model_dict = {
    z: a/(y * b) * log(1 + (y * b/a) * (1 - exp(- a * x))) 
        for x, y, z in zip(xs, ys, zs) 
}

This code creates a vector-valued model that lets you fit this system of equations simultaneously, with the same instance of a and b in each equation. To fit, we can now simply do the following:

fit = Fit(model_dict, 
     x_1=datasets[0][:,0], x_2=datasets[1][:,0], ..., 
     y_1=datasets[0][:,1], y_2=datasets[1][:,1], ..., 
     z_1=datasets[0][:,2], z_2=datasets[1][:,2], ...
)

I didn't write everything out in full, but I hope this gives you an idea of how to complete it. More info can be found in the symfit docs.
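
One way to avoid typing the keyword arguments by hand is to build them in a loop. The sketch below only prepares the dictionary and does not import symfit. It assumes, as in the call above, that Fit takes the data as keyword arguments named x_1, y_1, z_1 and so on. The helper name `build_fit_data` and the two tiny datasets are hypothetical.

```python
import numpy as np

def build_fit_data(datasets):
    """Map each dataset's columns to the names x_i, y_i, z_i (i starts at 1)."""
    data = {}
    for i, d in enumerate(datasets, start=1):
        data[f"x_{i}"] = d[:, 0]
        data[f"y_{i}"] = d[:, 1]
        data[f"z_{i}"] = d[:, 2]
    return data

# Two tiny made-up datasets standing in for the six real ones
datasets = [
    np.array([[0.0, 1.5e-05, 0.0], [885.0, 1.5e-05, 0.064]]),
    np.array([[0.0, 8.6e-06, 0.0], [675.0, 8.6e-06, 0.050]]),
]

data = build_fit_data(datasets)
print(sorted(data))
# The dict would then be unpacked as keyword arguments: Fit(model_dict, **data)
```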

As a final remark, note that I have used a symbolic exp and log, not numpy's.
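
For comparison, here is a sketch of the same shared-parameter idea using plain scipy.optimize.curve_fit rather than symfit. Least squares does not depend on row order, so stacking the split datasets into one array gives a single fit with one a and one b. The fitted model can then be evaluated curve by curve, so each curve can be plotted as its own line. The data is synthetic and noise-free, and the "true" values and the x grid are made up for illustration. Only the model and the initial guess come from the question.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(data, a, b):
    """Model from the question; data has columns x and y."""
    x, y = data[:, 0], data[:, 1]
    return a / (y * b) * np.log(1 + (y * b / a) * (1 - np.exp(-a * x)))

# Synthetic stand-in for the split datasets (made-up values, not the real data)
true_a, true_b = 2.0e-4, 120.0
x_grid = np.array([0.0, 500.0, 1500.0, 3000.0, 6000.0, 12000.0, 24000.0, 80000.0])
datasets = []
for y_val in (1.5e-5, 4.0e-6, 1.0e-6):
    xy = np.column_stack([x_grid, np.full_like(x_grid, y_val)])
    datasets.append(np.column_stack([xy, func(xy, true_a, true_b)]))

# Stack all curves: one least-squares problem with a single shared a and b
stacked = np.vstack(datasets)
params, pcov = curve_fit(func, stacked[:, :2], stacked[:, 2], p0=[3.0e-5, 128])

# Evaluate the fitted model separately for each curve
fitted = [func(d[:, :2], *params) for d in datasets]
print(params)
```

Each entry of `fitted` can be passed to its own plt.plot call together with the matching x column, which avoids drawing a line from the end of one curve back to the origin of the next.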
