Linear Regression vs Closed-form Ordinary Least Squares in Python


Problem description


I am trying to apply linear regression to a dataset of 9 samples with around 50 features using Python. I have tried different methodologies for linear regression, i.e. closed-form OLS (Ordinary Least Squares), LR (Linear Regression), HR (Huber Regression), and NNLS (Non-Negative Least Squares), and each of them gives different weights.


I can see the intuition for why HR and NNLS have different solutions, but LR and closed-form OLS have the same objective function: minimizing the sum of the squares of the differences between the values observed in the given sample and those predicted by a linear function of a set of features. Since the training set is singular, I had to use the pseudoinverse to perform closed-form OLS.

import numpy as np

# Closed-form OLS via the normal equations; pinv is needed because
# train_features.T @ train_features is singular here
w = np.dot(train_features.T, train_features)
w1 = np.dot(np.linalg.pinv(w), np.dot(train_features.T, train_target))
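As a sanity check, the two-step normal-equations form above is algebraically equivalent to applying the pseudoinverse of the design matrix directly, since (XᵀX)⁺Xᵀ = X⁺. A minimal sketch with synthetic arrays standing in for the question's `train_features`/`train_target` (the 9×50 shape mirrors the described dataset and is otherwise an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((9, 50))   # 9 samples, 50 features: underdetermined
y = rng.standard_normal(9)

# Normal-equations form with a pseudoinverse, as in the question
w1 = np.dot(np.linalg.pinv(np.dot(X.T, X)), np.dot(X.T, y))

# Direct pseudoinverse of the design matrix
w2 = np.dot(np.linalg.pinv(X), y)

print(np.allclose(w1, w2))  # both give the minimum-norm solution
```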


For LR I have used scikit-learn's Linear Regression, which uses the LAPACK library from www.netlib.org to solve the least-squares problem:

       linear_model.LinearRegression()


A system of linear equations (or of polynomial equations) is called underdetermined if the number of equations available is less than the number of unknown parameters. Each unknown parameter counts as an available degree of freedom, and each equation acts as a constraint that removes one degree of freedom. As a result, an underdetermined system can have infinitely many solutions or no solution at all. Since in our case study the system is underdetermined and also singular, there exist many solutions.
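A tiny concrete example (my own, not from the question): the single equation x + y = 2 in two unknowns has infinitely many solutions, and a least-squares solver picks the one with the smallest norm:

```python
import numpy as np

# One equation, two unknowns: x + y = 2 has infinitely many solutions
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

# np.linalg.lstsq returns the minimum-norm solution
sol, *_ = np.linalg.lstsq(A, b, rcond=None)
print(sol)  # [1. 1.] — the solution closest to the origin
```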


Now, both the pseudoinverse and the LAPACK library try to find the minimum-norm solution of an underdetermined system when the number of samples is less than the number of features. Then why do the closed form and LR give completely different solutions for the same system of linear equations? Am I missing something here that can explain the behavior of both approaches? For example, if the pseudoinverse is computed in different ways, such as SVD or QR/LQ factorization, can they produce different solutions for the same set of equations?
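For what it's worth, the minimum-norm property can be checked empirically. On a random underdetermined system (shapes below are an assumption mirroring the question's 9×50 setup), NumPy's SVD-based `pinv` and LAPACK's least-squares solver behind `np.linalg.lstsq` agree:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((9, 50))  # more unknowns than equations
b = rng.standard_normal(9)

# SVD-based pseudoinverse solution
x_pinv = np.dot(np.linalg.pinv(A), b)

# LAPACK least-squares solution via lstsq
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_pinv, x_lstsq))  # both return the minimum-norm solution
```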

Answer

Have a look at the documentation of LinearRegression: by default (as in your case), it also fits an intercept term!

Demo:

import numpy as np
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression

# note: load_boston was removed in scikit-learn 1.2; an older version
# is needed to reproduce this exact demo
X, y = load_boston(return_X_y=True)

""" OLS custom """
w = np.dot(np.linalg.pinv(X), y)
print('custom')
print(w)

""" sklearn's LinearRegression (default) """
clf = LinearRegression()
print('sklearn default')
print(clf.fit(X, y).coef_)


""" sklearn's LinearRegression (no intercept-fitting) """
print('sklearn fit_intercept=False')
clf = LinearRegression(fit_intercept=False)
print(clf.fit(X, y).coef_)

Output:

custom
[ -9.16297843e-02   4.86751203e-02  -3.77930006e-03   2.85636751e+00
  -2.88077933e+00   5.92521432e+00  -7.22447929e-03  -9.67995240e-01
   1.70443393e-01  -9.38925373e-03  -3.92425680e-01   1.49832102e-02
  -4.16972624e-01]
sklearn default
[ -1.07170557e-01   4.63952195e-02   2.08602395e-02   2.68856140e+00
  -1.77957587e+01   3.80475246e+00   7.51061703e-04  -1.47575880e+00
   3.05655038e-01  -1.23293463e-02  -9.53463555e-01   9.39251272e-03
  -5.25466633e-01]
sklearn fit_intercept=False
[ -9.16297843e-02   4.86751203e-02  -3.77930006e-03   2.85636751e+00
  -2.88077933e+00   5.92521432e+00  -7.22447929e-03  -9.67995240e-01
   1.70443393e-01  -9.38925373e-03  -3.92425680e-01   1.49832102e-02
  -4.16972624e-01]
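The same check works without the (now removed) Boston dataset: on synthetic data shaped like the question's 9×50 problem (an assumption for illustration), disabling intercept fitting makes sklearn reproduce the pseudoinverse's minimum-norm solution exactly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((9, 50))  # underdetermined, as in the question
y = rng.standard_normal(9)

# Closed-form minimum-norm OLS via the pseudoinverse (no intercept term)
w = np.dot(np.linalg.pinv(X), y)

# sklearn with intercept fitting disabled should match the closed form
clf = LinearRegression(fit_intercept=False).fit(X, y)
print(np.allclose(w, clf.coef_))
```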

