Python curve fitting on pandas dataframe then add coef to new columns
Problem description
I have a dataframe in which each row needs its own curve fit (a second-order polynomial).
There are four columns, and each column name denotes an x value.
Each row contains 4 y values corresponding to the x values in the column names.
For example, based on the code below, the fit for the first row will use x = [2, 5, 8, 12] and y = [5.91, 28.06, 67.07, 145.20].
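As a quick check of that example, the first row can be fitted on its own with np.polyfit. This minimal sketch hard-codes the x and y values quoted above; np.polyfit returns the coefficients highest degree first, so the three values correspond to m2, m1 and c for row (1, 'A') in the desired output further down.

```python
import numpy as np

# x values come from the column names; y values are the first row (id=1, id2='A')
x = np.array([2, 5, 8, 12])
y = np.array([5.91, 28.06, 67.07, 145.20])

# np.polyfit returns coefficients from the highest degree down: [m2, m1, c]
m2, m1, c = np.polyfit(x, y, 2)
print(m2, m1, c)
```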
import numpy as np
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
'id2': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
'x': [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12],
'y': [5.91, 4.43, 5.22, 1.31, 4.42, 3.65, 4.45, 1.70, 3.94, 3.29, 28.06, 19.51, 23.30, 4.20, 18.61, 17.60, 18.27, 16.18, 16.81, 16.37, 67.07, 46.00, 54.95, 43.66, 42.70, 41.32, 12.69, 36.75, 41.36, 38.66, 145.20, 118.34, 16.74, 94.10, 93.45, 86.60, 26.17, 77.12, 91.42, 83.11]})
pivot_df = df.pivot_table(index=['id','id2'],columns=['x'])
[output]
>>> pivot_df
            y
x           2      5      8      12
id id2
1  A     5.91  28.06  67.07  145.20
   B     3.65  17.60  41.32   86.60
2  A     4.43  19.51  46.00  118.34
   B     4.45  18.27  12.69   26.17
3  A     5.22  23.30  54.95   16.74
   B     1.70  16.18  36.75   77.12
4  A     1.31   4.20  43.66   94.10
   B     3.94  16.81  41.36   91.42
5  A     4.42  16.37  42.70   93.45
   B     3.29  18.61  38.66   83.11
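If the outer 'y' level on the columns isn't needed, naming the value column in pivot_table (values='y') gives flat columns that hold only the x values. A minimal sketch on a subset of the data (only id 1, chosen here for brevity); flat columns also avoid mixing column levels if coefficient columns are joined on later.

```python
import pandas as pd

# Same long-format layout as the question's df, restricted to id 1 for brevity
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 1, 1],
    'id2': ['A', 'B'] * 4,
    'x': [2, 2, 5, 5, 8, 8, 12, 12],
    'y': [5.91, 3.65, 28.06, 17.60, 67.07, 41.32, 145.20, 86.60],
})

# Passing a single value column keeps the columns one level deep: just the x values
flat = df.pivot_table(index=['id', 'id2'], columns='x', values='y')
print(flat)
print(flat.columns.tolist())
```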
I want to perform the curve fitting without explicitly iterating over the rows, so that I can take advantage of the fast, vectorized operations pandas (and NumPy underneath it) provide. I am not sure how to do so.
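Fitting all rows at once is possible because a polynomial least-squares fit is linear in its coefficients: every row shares the same design (Vandermonde) matrix built from x, so one np.linalg.lstsq call with a 2-D right-hand side solves for every row together. The sketch below illustrates the idea on three rows of pivot_df typed in by hand; it is one way to vectorize the fit, not a claim about how np.polyfit is implemented internally.

```python
import numpy as np

x = np.array([2, 5, 8, 12])
# Each row is one data set: rows (1, 'A'), (1, 'B'), (2, 'A') of pivot_df
Y = np.array([
    [5.91, 28.06, 67.07, 145.20],
    [3.65, 17.60, 41.32, 86.60],
    [4.43, 19.51, 46.00, 118.34],
])

# Design matrix with columns x**2, x, 1 (same order as np.polyfit's output)
V = np.vander(x, 3)

# lstsq accepts a 2-D right-hand side with one column per data set, so pass Y.T
coefs, *_ = np.linalg.lstsq(V, Y.T, rcond=None)
coefs = coefs.T  # shape (n_rows, 3): columns m2, m1, c

# Compare with fitting each row separately
per_row = np.array([np.polyfit(x, row, 2) for row in Y])
print(np.allclose(coefs, per_row))
```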
I wrote a loop over the rows anyway to show the desired output. Although the code below works and produces the desired output, I need help making it more concise and efficient.
my_coef_array = np.zeros(3)  # placeholder first row, dropped after the loop
# get the x values from the column names
x = pivot_df.columns.get_level_values(pivot_df.columns.names.index('x')).values
for index in pivot_df.index:
    # np.polyfit returns coefficients highest degree first: m2, m1, c
    my_coef_array = np.vstack((my_coef_array, np.polyfit(x, pivot_df.loc[index].values, 2)))
my_coef_array = my_coef_array[1:, :]  # drop the placeholder row
pivot_df['m2'] = my_coef_array[:, 0]
pivot_df['m1'] = my_coef_array[:, 1]
pivot_df['c'] = my_coef_array[:, 2]
[output]
>>> pivot_df
            y                               m2         m1          c
x           2      5      8      12
id id2
1  A     5.91  28.06  67.07  145.20   0.934379   0.848422   0.471170
   B     3.65  17.60  41.32   86.60   0.510664   1.156009  -0.767408
2  A     4.43  19.51  46.00  118.34   1.034594  -3.221912   7.518221
   B     4.45  18.27  12.69   26.17  -0.015300   2.045216   2.496306
3  A     5.22  23.30  54.95   16.74  -1.356997  20.827407 -35.130416
   B     1.70  16.18  36.75   77.12   0.410485   1.772052  -3.345097
4  A     1.31   4.20  43.66   94.10   0.803630  -1.577705  -1.148066
   B     3.94  16.81  41.36   91.42   0.631377  -0.085651   1.551586
5  A     4.42  16.37  42.70   93.45   0.659044  -0.278738   2.068114
   B     3.29  18.61  38.66   83.11   0.478171   1.218486  -0.638888
Recommended answer
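I found numpy.polynomial.polynomial.polyfit, an alternative to np.polyfit that accepts a 2-D array for y, so all rows can be fitted in one call. (np.polyfit also accepts a 2-D y with one data set per column; the practical difference is that numpy.polynomial.polynomial.polyfit returns the coefficients in increasing order of degree, c, m1, m2, whereas np.polyfit returns them highest degree first.)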
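The main thing to watch when switching between the two functions is the coefficient order. A small sketch, with two rows of the data stacked as columns of y (an arbitrary choice for illustration), shows that both fit every column in one call and differ only in ordering:

```python
import numpy as np
from numpy.polynomial import polynomial as P

x = np.array([2, 5, 8, 12])
# Two data sets stacked as columns: shape (len(x), 2)
y = np.column_stack([
    [5.91, 28.06, 67.07, 145.20],
    [3.65, 17.60, 41.32, 86.60],
])

# Both functions fit every column of y in one call and return shape (3, 2)
high_first = np.polyfit(x, y, 2)  # rows: m2, m1, c
low_first = P.polyfit(x, y, 2)    # rows: c, m1, m2

# Same coefficients, only the row order is reversed
print(np.allclose(high_first, low_first[::-1]))
```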
Continuing your code from the line that defines x, I get the following:
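# polyfit from numpy.polynomial takes a 2-D y (one column per row of pivot_df)
# and returns coefficients lowest degree first: c, m1, m2
my_coef_array = pd.DataFrame(np.polynomial.polynomial.polyfit(x, pivot_df.T.values, 2)).T
my_coef_array.index = pivot_df.index
# Give the coefficients two column levels like pivot_df; newer pandas versions
# raise an error when joining frames with different numbers of column levels
my_coef_array.columns = pd.MultiIndex.from_tuples([('c', ''), ('m1', ''), ('m2', '')])
pivot_df = pivot_df.join(my_coef_array)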
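For a self-contained check, the sketch below builds a two-row stand-in for pivot_df with the same MultiIndex layout as the question, fits all rows in one np.polyfit call, and assigns the coefficients column by column in the question's m2, m1, c order. Assigning directly like this, instead of joining, sidesteps any column-level mismatch between a separate coefficient frame and pivot_df.

```python
import numpy as np
import pandas as pd

# Two-row stand-in for pivot_df with the question's MultiIndex rows and columns
index = pd.MultiIndex.from_tuples([(1, 'A'), (1, 'B')], names=['id', 'id2'])
columns = pd.MultiIndex.from_product([['y'], [2, 5, 8, 12]], names=[None, 'x'])
pivot_df = pd.DataFrame(
    [[5.91, 28.06, 67.07, 145.20],
     [3.65, 17.60, 41.32, 86.60]],
    index=index, columns=columns,
)

x = pivot_df.columns.get_level_values('x').values
# One vectorized fit: each column of the transposed values is one row of pivot_df
coefs = np.polyfit(x, pivot_df.values.T, 2)  # shape (3, n_rows): m2, m1, c

# Assign column by column, as in the question's loop version
pivot_df['m2'] = coefs[0]
pivot_df['m1'] = coefs[1]
pivot_df['c'] = coefs[2]
print(pivot_df)
```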