Solving for the coefficients of a data set using curve_fit from scipy.optimize
Question
I have an array `A` exported from Excel, containing the data values shown below. The 1st column `x` and the 2nd column `y` are the independent variables (the inputs), while the 3rd column `z` is the dependent variable (the output).
```python
import numpy as np
from xlrd import open_workbook

Data = open_workbook("simple.xls")
sheet = Data.sheet_by_name('Sheet1')
A = []
# Read the sheet row by row
for rownum in range(sheet.nrows):
    rowValues = sheet.row_values(rownum)
    A.append(rowValues)
A = np.array(A)
```
A=
[[ 0.00000000e+00 1.49761692e-05 0.00000000e+00]
[ 8.85000000e+02 1.49761692e-05 6.41362500e-02]
[ 1.48500000e+03 1.49761692e-05 1.19340000e-01]
[ 2.09000000e+03 1.49761692e-05 1.58760000e-01]
[ 3.36000000e+03 1.49761692e-05 2.08080000e-01]
[ 3.87000000e+03 1.49761692e-05 2.16933750e-01]
[ 6.48000000e+03 1.49761692e-05 2.46746250e-01]
[ 8.22000000e+03 1.49761692e-05 2.54700000e-01]
[ 1.05300000e+04 1.49761692e-05 2.59470000e-01]
[ 1.58250000e+04 1.49761692e-05 2.62035000e-01]
[ 2.37600000e+04 1.49761692e-05 2.68751250e-01]
[ 8.18400000e+04 1.49761692e-05 2.92848750e-01]
[ 0.00000000e+00 8.57250668e-06 0.00000000e+00]
[ 6.75000000e+02 8.57250668e-06 4.97436412e-02]
[ 1.27500000e+03 8.57250668e-06 1.27749375e-01]
[ 1.88000000e+03 8.57250668e-06 1.88617039e-01]
[ 3.15000000e+03 8.57250668e-06 2.65089780e-01]
[ 3.66000000e+03 8.57250668e-06 2.90344849e-01]
[ 6.27000000e+03 8.57250668e-06 3.36295316e-01]
[ 8.01000000e+03 8.57250668e-06 3.42702439e-01]
[ 1.03200000e+04 8.57250668e-06 3.65205982e-01]
[ 1.56150000e+04 8.57250668e-06 3.67269626e-01]
[ 2.35500000e+04 8.57250668e-06 3.87296798e-01]
[ 8.16300000e+04 8.57250668e-06 4.43486869e-01]
[ 0.00000000e+00 4.26671486e-06 0.00000000e+00]
[ 4.65000000e+02 4.26671486e-06 2.61407250e-02]
[ 1.06500000e+03 4.26671486e-06 1.22371762e-01]
[ 1.67000000e+03 4.26671486e-06 2.19629475e-01]
[ 2.94000000e+03 4.26671486e-06 3.26680087e-01]
[ 3.45000000e+03 4.26671486e-06 3.34340662e-01]
[ 6.06000000e+03 4.26671486e-06 4.18330575e-01]
[ 7.80000000e+03 4.26671486e-06 4.50631350e-01]
[ 1.01100000e+04 4.26671486e-06 4.55053950e-01]
[ 1.54050000e+04 4.26671486e-06 4.60937587e-01]
[ 2.33400000e+04 4.26671486e-06 5.10770813e-01]
[ 8.14200000e+04 4.26671486e-06 6.12569587e-01]
[ 0.00000000e+00 2.13335743e-06 0.00000000e+00]
[ 8.55000000e+02 2.13335743e-06 1.03773150e-01]
[ 1.46000000e+03 2.13335743e-06 2.21130000e-01]
[ 2.73000000e+03 2.13335743e-06 3.45515625e-01]
[ 3.24000000e+03 2.13335743e-06 3.85634925e-01]
[ 5.85000000e+03 2.13335743e-06 4.76061300e-01]
[ 7.59000000e+03 2.13335743e-06 4.79220300e-01]
[ 1.51950000e+04 2.13335743e-06 5.24709900e-01]
[ 2.31300000e+04 2.13335743e-06 5.64829200e-01]
[ 8.12100000e+04 2.13335743e-06 6.46568325e-01]
[ 0.00000000e+00 1.42359023e-06 0.00000000e+00]
[ 6.45000000e+02 1.42359023e-06 8.03596500e-02]
[ 1.25000000e+03 1.42359023e-06 2.36700000e-01]
[ 2.52000000e+03 1.42359023e-06 4.25941650e-01]
[ 3.03000000e+03 1.42359023e-06 4.61683350e-01]
[ 5.64000000e+03 1.42359023e-06 5.99561100e-01]
[ 7.38000000e+03 1.42359023e-06 6.05952000e-01]
[ 9.69000000e+03 1.42359023e-06 6.16958550e-01]
[ 1.49850000e+04 1.42359023e-06 6.57434250e-01]
[ 2.29200000e+04 1.42359023e-06 6.45954300e-01]
[ 8.10000000e+04 1.42359023e-06 7.79689800e-01]
[ 0.00000000e+00 9.36010573e-07 0.00000000e+00]
[ 4.35000000e+02 9.36010573e-07 3.40200000e-02]
[ 1.04000000e+03 9.36010573e-07 1.91160000e-01]
[ 2.31000000e+03 9.36010573e-07 3.77640000e-01]
[ 2.82000000e+03 9.36010573e-07 4.44240000e-01]
[ 5.43000000e+03 9.36010573e-07 5.50440000e-01]
[ 7.17000000e+03 9.36010573e-07 5.36580000e-01]
[ 9.48000000e+03 9.36010573e-07 5.83740000e-01]
[ 1.47750000e+04 9.36010573e-07 5.87340000e-01]
[ 2.27100000e+04 9.36010573e-07 6.33060000e-01]
[ 8.07900000e+04 9.36010573e-07 7.36200000e-01]]
```python
x = A[:, 0]
y = A[:, 1]
z = A[:, 2]
```
I have a function that I want to fit to the data in array `A`, in order to solve for the coefficients `a` and `b`.
```python
def func(data, a, b):
    # data[:, 0] is x, data[:, 1] is y
    return a/(data[:,1]*b)*np.log(1+(data[:,1]*b/a)*(1-np.exp(-a*data[:,0])))
```
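A quick sanity check can make the conventions of `func` explicit: each row of `data` is `(x, y)`. At `x = 0` the factor `1 - exp(-a*x)` vanishes, so `z = 0`, which matches the first row of every curve in `A`; as `x` grows, `z` approaches the plateau `a/(b*y) * log(1 + b*y/a)`. The sketch below plugs in the fitted values printed further down in the question purely for illustration; the helper `plateau` is a hypothetical name, not part of any library.

```python
import numpy as np

def func(data, a, b):
    # Same model as above; data is an (N, 2) array with x in column 0 and y in column 1
    x, y = data[:, 0], data[:, 1]
    return a / (y * b) * np.log(1 + (y * b / a) * (1 - np.exp(-a * x)))

def plateau(y, a, b):
    # Hypothetical helper: the limit of func as x -> infinity, where exp(-a*x) -> 0
    return a / (y * b) * np.log(1 + y * b / a)

a, b = 2.00773153e-04, 1.22752179e+02  # fitted values reported in the question
y = 1.49761692e-05                      # y value of the first curve in A

data = np.array([[0.0, y], [1e3, y], [1e6, y]])
z = func(data, a, b)
print(z[0], z[-1], plateau(y, a, b))    # z is 0 at x = 0 and levels off for large x
```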
The rest of the code sets the initial guess for the coefficients `a` and `b`, calls `scipy.optimize.curve_fit()`, and uses `matplotlib.pyplot` to plot the result.
```python
import scipy.optimize
import matplotlib.pyplot as plt

guess = [3.0e-5, 128]
print(guess, 'initial guessed parameters')
# xdata is the (N, 2) array of (x, y) pairs, ydata is z
params, pcov = scipy.optimize.curve_fit(func, A[:, :2], A[:, 2], guess)
print(params, 'fitted parameters')

plt.plot(x, func(A, params[0], params[1]), '-r', x, z, 'o')
plt.title('Plot')
plt.legend(['Fit', 'Data'], loc='lower right')
plt.show()
```
The resulting plot is this:

[Figure: "Plot", fitted curve (red line) and data (circles)]
The resulting coefficients are:
```
[3e-05, 128] initial guessed parameters
[ 2.00773153e-04 1.22752179e+02] fitted parameters
```
Because all the data is inside array `A`, it seems that `scipy` treats the points in the array as one connected sequence: the end of each curve goes back to the origin, which is also the start of the next curve.
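The lines going back to the origin are drawn by `plt.plot`: with a format such as `'-r'` it draws one line through the points in the order they are given, so the last point of each curve is joined to the first point (`x = 0`) of the next. A minimal sketch that plots each curve on its own, assuming (as in the data shown) that all rows of one curve share exactly the same `y` value; `plot_per_curve` is a hypothetical helper name:

```python
import numpy as np
import matplotlib.pyplot as plt

def func(data, a, b):
    # Same model as in the question; data columns are (x, y)
    x, y = data[:, 0], data[:, 1]
    return a / (y * b) * np.log(1 + (y * b / a) * (1 - np.exp(-a * x)))

def plot_per_curve(A, params, ax=None):
    """Plot data and fit separately for each curve, grouping rows by their y value."""
    if ax is None:
        ax = plt.gca()
    for y_value in np.unique(A[:, 1]):
        rows = A[A[:, 1] == y_value]
        rows = rows[np.argsort(rows[:, 0])]          # sort by x within the curve
        ax.plot(rows[:, 0], rows[:, 2], 'o')         # measured z
        # Evaluate the fitted model on a smooth x grid for this curve only
        x_fine = np.linspace(rows[0, 0], rows[-1, 0], 200)
        grid = np.column_stack([x_fine, np.full_like(x_fine, y_value)])
        ax.plot(x_fine, func(grid, *params), '-r')   # fitted curve
    return ax
```

Usage with the question's variables would be `plot_per_curve(A, params)` followed by `plt.show()`; each curve then gets its own marker series and its own red line, with nothing joining one curve to the next.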
How should I write this in Python so that `scipy.optimize.curve_fit` knows that the data in the array consists of multiple curves, rather than one single conjoined data set? Any advice would be greatly appreciated.
Answer
It seems that your dataset `A` contains all those curves back to back.
Instead, you could split your dataset every time `A[:,0] == 0.00000000e+00`. After splitting it into 6 datasets, you could fit to each one separately.
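A minimal sketch of that split-and-fit approach with numpy and scipy, assuming the first row of `A` has `x == 0` (as in the data shown). `split_curves` and `fit_each` are hypothetical helper names; each curve gets its own independent `(a, b)` here.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(data, a, b):
    # Same model as in the question; data columns are (x, y)
    x, y = data[:, 0], data[:, 1]
    return a / (y * b) * np.log(1 + (y * b / a) * (1 - np.exp(-a * x)))

def split_curves(A):
    """Split the stacked array into one array per curve, starting a new one at each x == 0 row."""
    starts = np.flatnonzero(A[:, 0] == 0)
    # starts[0] is row 0 (assumed), so only the later starts are split points
    return np.split(A, starts[1:])

def fit_each(datasets, guess):
    """Fit func to every curve separately and return one (a, b) row per curve."""
    results = []
    for d in datasets:
        p, _ = curve_fit(func, d[:, :2], d[:, 2], p0=guess)
        results.append(p)
    return np.array(results)
```

With the question's data, `datasets = split_curves(A)` should yield six arrays, and `fit_each(datasets, guess)` one parameter pair per curve, which makes it easy to see whether `a` and `b` really agree across curves.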
But if I understand your problem correctly, you would also like the parameters `a` and `b` to be the same for every dataset, correct?
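For comparison, a plain-scipy sketch of a shared-parameter fit: one `curve_fit` call on the datasets stacked back to back (which is what the question's code already does) fits a single `a` and `b` to every row. `curve_fit` minimizes the sum of squared residuals over all rows, and a sum does not depend on row order, so shuffling the rows should give the same parameters. The data here is synthetic: `a_true`, `b_true`, the `x` grid and the initial guess are made up for illustration, and only the six `y` values are copied from `A`.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(data, a, b):
    # Same model as in the question; data columns are (x, y)
    x, y = data[:, 0], data[:, 1]
    return a / (y * b) * np.log(1 + (y * b / a) * (1 - np.exp(-a * x)))

# Synthetic stand-in for the six untangled datasets (hypothetical parameters and x grid)
a_true, b_true = 2.0e-4, 120.0
y_values = [1.49761692e-05, 8.57250668e-06, 4.26671486e-06,
            2.13335743e-06, 1.42359023e-06, 9.36010573e-07]
x_grid = np.array([0, 500, 1000, 2000, 3500, 6000, 8000,
                   10000, 15000, 23000, 81000], dtype=float)
datasets = [np.column_stack([x_grid, np.full_like(x_grid, y), np.zeros_like(x_grid)])
            for y in y_values]
for d in datasets:
    d[:, 2] = func(d[:, :2], a_true, b_true)   # fill in z from the model

# Stack the curves back to back, as in A, and fit one (a, b) to all rows
stacked = np.vstack(datasets)
guess = [1e-4, 100]
params, pcov = curve_fit(func, stacked[:, :2], stacked[:, 2], p0=guess)

# Reorder the rows at random and fit again: the objective is unchanged
rng = np.random.default_rng(0)
shuffled = stacked[rng.permutation(len(stacked))]
params_shuffled, _ = curve_fit(func, shuffled[:, :2], shuffled[:, 2], p0=guess)

print(params, params_shuffled)
```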
To help you achieve that, I'm going to shamelessly plug my `symfit` package, which wraps `curve_fit` to make such problems easier to solve.
In `symfit`, you would do the following:
```python
from symfit import Fit, variables, parameters, log, exp

datasets = [A_1, A_2, ...] # I'm going to assume this holds the untangled datasets one through six
xs = variables('x_1, x_2, x_3, x_4, x_5, x_6')
ys = variables('y_1, y_2, y_3, y_4, y_5, y_6')
zs = variables('z_1, ...') # same for z
a, b = parameters('a, b')

# One component per curve, all sharing the same parameters a and b
model_dict = {
    z: a/(y * b) * log(1 + (y * b/a) * (1 - exp(- a * x)))
    for x, y, z in zip(xs, ys, zs)
}
```
This code creates a vector-valued model that lets you fit this system of equations simultaneously (with the same instance of `a` and `b` in each!). To fit, we can now simply do the following:
```python
fit = Fit(model_dict,
    x_1=datasets[0][:,0], x_2=datasets[1][:,0], ...,
    y_1=datasets[0][:,1], y_2=datasets[1][:,1], ...,
    z_1=datasets[0][:,2], z_2=datasets[1][:,2], ...
)
fit_result = fit.execute()  # run the fit
print(fit_result)
```
I didn't write everything out in full, but I hope this gives you an idea of how to complete it. More info can be found in the docs: symfit docs.
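The elided `...` parts can be generated in a loop rather than typed out. A sketch of two hypothetical helpers (`build_fit_kwargs` and `variable_names` are not part of symfit), assuming, as the answer's own `Fit` call does, that `Fit` accepts the data as keyword arguments named after the variables:

```python
import numpy as np

def build_fit_kwargs(datasets):
    """Map each curve's columns to keyword names x_i, y_i, z_i, with i starting at 1."""
    kwargs = {}
    for i, d in enumerate(datasets, start=1):
        kwargs[f'x_{i}'] = d[:, 0]
        kwargs[f'y_{i}'] = d[:, 1]
        kwargs[f'z_{i}'] = d[:, 2]
    return kwargs

def variable_names(prefix, n):
    """Comma-separated names such as 'x_1, x_2, x_3', in the string form variables() takes."""
    return ', '.join(f'{prefix}_{i}' for i in range(1, n + 1))
```

With these, the elided lines could read `xs = variables(variable_names('x', len(datasets)))` (likewise for `ys` and `zs`) and `fit = Fit(model_dict, **build_fit_kwargs(datasets))`.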
As a final remark, note that I have used the symbolic `exp` and `log` imported from `symfit`, not numpy's.