L1 norm instead of L2 norm for cost function in regression model
Question
I was wondering if there's a function in Python that would do the same job as scipy.linalg.lstsq
but uses "least absolute deviations" regression instead of "least squares" regression (OLS). I want to use the L1
norm, instead of the L2
norm.
In fact, I have 3D points for which I want the best-fit plane. The common approach is the least-squares method, as in this Github link. But it's known that this doesn't always give the best fit, especially when there are outliers in the data set; in that case it's better to minimize the least absolute deviations. The difference between the two methods is explained further here.
It can't be solved by functions such as MAD, since it's an Ax = b matrix equation and minimizing the residuals requires iteration. Does anyone know of a relevant function in Python - probably in a linear algebra package - that computes a "least absolute deviations" regression?
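One option that avoids a general-purpose iterative optimizer altogether is to cast the LAD problem as a linear program and solve it with scipy.optimize.linprog: introduce one slack variable per residual and minimize their sum. A sketch (the helper name lad_fit is mine, not from any library):

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(A, b):
    """Least-absolute-deviations fit of A @ x ~ b via linear programming."""
    m, n = A.shape
    # variables: [params x (n), residual bounds t (m)]; objective: sum(t)
    c = np.concatenate([np.zeros(n), np.ones(m)])
    # A @ x - t <= b  and  -A @ x - t <= -b  together encode |A @ x - b| <= t
    A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)] * m  # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n]
```

The LP formulation is exact for the L1 objective, whereas a smooth optimizer only approximates the non-differentiable cost; the trade-off is the 2m inequality constraints, which can get large for big data sets.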
This is not so difficult to roll yourself, using scipy.optimize.minimize and a custom cost_function.
Let us first import the necessities,
from scipy.optimize import minimize
import numpy as np
And define a custom cost function (and a convenience wrapper for obtaining the fitted values),
def fit(X, params):
    # predicted values for design matrix X and parameter vector params
    return X.dot(params)

def cost_function(params, X, y):
    # sum of absolute residuals (the L1 cost)
    return np.sum(np.abs(y - fit(X, params)))
Then, if you have some X (design matrix) and y (observations), we can do the following,
output = minimize(cost_function, x0, args=(X, y))
y_hat = fit(X, output.x)
where x0 is some suitable initial guess for the optimal parameters (you could take @JamesPhillips' advice here, and use the fitted parameters from an OLS approach).
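For instance, a warm start from ordinary least squares can be obtained with numpy.linalg.lstsq (a sketch, reusing the X and y from the example below):

```python
import numpy as np

# example design matrix (intercept column plus a linear term) and noisy observations
X = np.asarray([np.ones((100,)), np.arange(0, 100)]).T
y = 10 + 5 * np.arange(0, 100) + 25 * np.random.random((100,))

# OLS solution, used as the initial guess x0 for the L1 minimization
x0, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Starting near the OLS solution matters here because the L1 cost is not differentiable at zero residuals, so a gradient-based minimizer benefits from a good starting point.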
In any case, when test-running with a somewhat contrived example,
X = np.asarray([np.ones((100,)), np.arange(0, 100)]).T
y = 10 + 5 * np.arange(0, 100) + 25 * np.random.random((100,))
I find,
fun: 629.4950595335436
hess_inv: array([[ 9.35213468e-03, -1.66803210e-04],
[ -1.66803210e-04, 1.24831279e-05]])
jac: array([ 0.00000000e+00, -1.52587891e-05])
message: 'Optimization terminated successfully.'
nfev: 144
nit: 11
njev: 36
status: 0
success: True
x: array([ 19.71326758, 5.07035192])
And,
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes()
ax.plot(y, 'o', color='black')
ax.plot(y_hat, 'o', color='blue')
plt.show()
With the fitted values in blue, and the data in black.
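The same recipe extends to the plane-fitting case from the question: stack the x and y coordinates (plus a constant column) into the design matrix and treat the z coordinates as the observations. A sketch with synthetic data (the plane coefficients here are invented for illustration):

```python
import numpy as np
from scipy.optimize import minimize

def fit(X, params):
    return X.dot(params)

def cost_function(params, X, y):
    return np.sum(np.abs(y - fit(X, params)))

# synthetic 3D points scattered around the plane z = 2*x + 3*y + 1
rng = np.random.default_rng(0)
pts = rng.random((200, 2))
z = 2 * pts[:, 0] + 3 * pts[:, 1] + 1 + 0.01 * rng.standard_normal(200)

# design matrix [x, y, 1] so that z ~ a*x + b*y + c
X = np.column_stack([pts, np.ones(len(pts))])
x0, *_ = np.linalg.lstsq(X, z, rcond=None)  # OLS warm start
res = minimize(cost_function, x0, args=(X, z))
a, b, c = res.x
```

The recovered (a, b, c) should sit close to the coefficients used to generate the data; with real measurements, the L1 cost keeps a few outlying points from dragging the plane away from the bulk of the data.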