Pass Pandas DataFrame to Scipy.optimize.curve_fit

Published: 2020/5/6 11:54:49 Tags: python pandas scipy mathematical-optimization model-fitting


Question

I'd like to know the best way to use Scipy to fit Pandas DataFrame columns. I have a data table (a Pandas DataFrame) with columns A, B, C, D and Z_real, where Z depends on A, B, C and D. I want to fit a function that takes each DataFrame row (a Series) and returns a prediction for Z (Z_pred).

The signature of each function to fit is

func(series, param_1, param_2...)

where series is the Pandas Series corresponding to each row of the DataFrame. I use a Pandas Series so that different functions can use different combinations of columns.

I've tried using

curve_fit(func, table, table.loc[:, 'Z_real'])

but for some reason each call to func is passed the whole data table as its first argument rather than the Series for each row. I've also tried converting the DataFrame to a list of Series objects, but then my function is passed a Numpy array (I think because Scipy converts the list of Series to a Numpy array, which doesn't preserve the Pandas Series objects).

Recommended answer

Your call to curve_fit is incorrect. From the documentation:

xdata : An M-length sequence or a (k, M)-shaped array for functions with k predictors.

The independent variable where the data is measured.

ydata : M-length sequence

The dependent data, nominally f(xdata, ...)

In this case your independent variables xdata are the columns A to D, i.e. table[['A', 'B', 'C', 'D']], and your dependent variable ydata is table['Z_real'].

Also note that xdata should be a (k, M) array, where k is the number of predictor variables (i.e. columns) and M is the number of observations (i.e. rows). You should therefore transpose your input dataframe so that it is (4, M) rather than (M, 4), i.e. table[['A', 'B', 'C', 'D']].T.
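As a quick check of the shapes involved, here is a minimal sketch. The small table is made-up data standing in for the asker's, and converting to plain arrays with to_numpy() is my own choice rather than something the call requires:

import numpy as np
import pandas as pd

# Hypothetical table with M = 5 rows, using the column names from the question.
table = pd.DataFrame(np.arange(25, dtype=float).reshape(5, 5),
                     columns=['A', 'B', 'C', 'D', 'Z_real'])

predictors = table[['A', 'B', 'C', 'D']]
print(predictors.shape)    # (M, 4): one row per observation
print(predictors.T.shape)  # (4, M): one row per predictor

# Plain arrays in the layout described above.
xdata = predictors.T.to_numpy()
ydata = table['Z_real'].to_numpy()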

The whole call to curve_fit might look something like this:

curve_fit(func, table[['A', 'B', 'C', 'D']].T, table['Z_real'])

Here's a complete example showing multiple linear regression:

import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

X = np.random.randn(100, 4)     # independent variables
m = np.random.randn(4)          # known coefficients
y = X.dot(m)                    # dependent variable

df = pd.DataFrame(np.hstack((X, y[:, None])),
                  columns=['A', 'B', 'C', 'D', 'Z_real'])

def func(X, *params):
    # Linear model: dot the length-4 coefficient vector with X of shape (4, M)
    return np.hstack(params).dot(X)

popt, pcov = curve_fit(func, df[['A', 'B', 'C', 'D']].T, df['Z_real'],
                       p0=np.random.randn(4))

print(np.allclose(popt, m))
# True
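
Since each row of the (k, M) input holds one predictor, a model that only needs some of the columns can select them before the call and unpack the rows by name. The sketch below is an illustration under stated assumptions, not the asker's actual model: the formula Z = p0 * A * B + p1 * C**2, the name model, and the synthetic data are all hypothetical.

import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=['A', 'B', 'C'])
true_params = np.array([1.5, -0.7])   # made-up coefficients

def model(X, p0, p1):
    # Rows of X arrive in the order the columns were selected.
    a, b, c = X
    return p0 * a * b + p1 * c ** 2

# Select the columns this model uses and transpose to (k, M).
xdata = df[['A', 'B', 'C']].to_numpy().T
df['Z_real'] = model(xdata, *true_params)   # noiseless synthetic target

popt, pcov = curve_fit(model, xdata, df['Z_real'].to_numpy(), p0=[1.0, 1.0])
print(popt)

A second model using a different set of columns would follow the same pattern with its own column list, which keeps the per-model column choice outside curve_fit itself.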
