Multiple Linear Regression with specific constraints on each coefficient in Python


Problem Description

I am currently running a multiple linear regression on a dataset. At first, I didn't realize I needed to put constraints on my weights; as a matter of fact, I need some specific weights to be positive and others negative.

To be more precise, I am building a scoring system, and this is why some of my variables should have a positive or negative impact on the score. Yet, when running my model, the results do not fit what I am expecting: some of my 'positive' variables get negative coefficients and vice versa.

As an example, let's suppose my model is:

     y = W0*x0 + W1*x1 + W2*x2 

Since x2 is a 'positive' variable, I would like to constrain W2 to be positive!

I have been looking around a lot for this issue, but I haven't found anything about constraints on specific weights/coefficients; all I've found is about setting all coefficients positive or making them sum to one.

I am working in Python with the scikit-learn package. This is how I get my best model:

from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV  # replaces the old sklearn.grid_search module

def ridge(Xtrain, Xtest, Ytrain, Ytest, position):
    # cross-validated search over the regularization strength alpha
    param_grid = {'alpha': [0.01, 0.1, 1, 10, 50, 100, 1000]}
    gs = GridSearchCV(Ridge(), param_grid=param_grid, n_jobs=-1, cv=3)
    gs.fit(Xtrain, Ytrain)
    hatytrain = gs.predict(Xtrain)
    hatytest = gs.predict(Xtest)
    return hatytrain, hatytest

Any idea how I could assign a constraint to the coefficient of a specific variable? It will probably be burdensome to define each constraint, but I have no idea how to do it otherwise.

Thanks!

NB: I am still a beginner at coding :)

Recommended Answer

Scikit-learn does not allow such constraints on the coefficients.
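(The closest built-in option is a global one: a minimal sketch, assuming scikit-learn >= 0.24, where LinearRegression accepts a positive=True flag. It forces every coefficient to be non-negative at once, so it cannot target a single weight such as W2.)

from sklearn.linear_model import LinearRegression

# positive=True (scikit-learn >= 0.24) constrains *all* coefficients
# to be >= 0; it cannot bound just one coefficient such as W2
model = LinearRegression(positive=True)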

But you can impose any constraints on the coefficients and optimize the loss with coordinate descent if you implement your own estimator. In the unconstrained case, coordinate descent produces the same result as OLS in a reasonable number of iterations.

I've written a class that imposes upper and lower bounds on LinearRegression coefficients. You can extend it to use a Ridge or even Lasso penalty if you want:

# note: newer scikit-learn versions move this module to sklearn.linear_model._base
from sklearn.linear_model.base import LinearModel
from sklearn.base import RegressorMixin
from sklearn.utils import check_X_y
import numpy as np

class ConstrainedLinearRegression(LinearModel, RegressorMixin):

    def __init__(self, fit_intercept=True, normalize=False, copy_X=True, nonnegative=False, tol=1e-15):
        self.fit_intercept = fit_intercept
        self.normalize = normalize
        self.copy_X = copy_X
        self.nonnegative = nonnegative
        self.tol = tol

    def fit(self, X, y, min_coef=None, max_coef=None):
        X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'], y_numeric=True, multi_output=False)
        X, y, X_offset, y_offset, X_scale = self._preprocess_data(
            X, y, fit_intercept=self.fit_intercept, normalize=self.normalize, copy=self.copy_X)
        # default to unbounded coefficients where no bounds are given
        self.min_coef_ = min_coef if min_coef is not None else np.repeat(-np.inf, X.shape[1])
        self.max_coef_ = max_coef if max_coef is not None else np.repeat(np.inf, X.shape[1])
        if self.nonnegative:
            self.min_coef_ = np.clip(self.min_coef_, 0, None)

        # coordinate descent: update one coefficient at a time,
        # clipping each update into its [min, max] interval
        beta = np.zeros(X.shape[1]).astype(float)
        prev_beta = beta + 1
        hessian = np.dot(X.transpose(), X)
        while not (np.abs(prev_beta - beta) < self.tol).all():
            prev_beta = beta.copy()
            for i in range(len(beta)):
                # gradient of the squared-error loss at the current beta
                grad = np.dot(np.dot(X, beta) - y, X)
                # exact minimization along coordinate i, then projection onto the bounds
                beta[i] = np.minimum(self.max_coef_[i],
                                     np.maximum(self.min_coef_[i],
                                                beta[i] - grad[i] / hessian[i, i]))

        self.coef_ = beta
        self._set_intercept(X_offset, y_offset, X_scale)
        return self

You can use this class, for example, to make all coefficients non-negative:

# load_boston was removed in scikit-learn 1.2; substitute any regression dataset there
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression

X, y = load_boston(return_X_y=True)
model = ConstrainedLinearRegression(nonnegative=True)
model.fit(X, y)
print(model.intercept_)
print(model.coef_)

This produces output like

-36.99292986145538
[0.         0.05286515 0.         4.12512386 0.         8.04017956
 0.         0.         0.         0.         0.         0.02273805
 0.        ]

You can see that most coefficients are zero. An ordinary LinearRegression would have made them negative:

model = LinearRegression()
model.fit(X, y)
print(model.intercept_)
print(model.coef_)

which returns

36.49110328036191
[-1.07170557e-01  4.63952195e-02  2.08602395e-02  2.68856140e+00
 -1.77957587e+01  3.80475246e+00  7.51061703e-04 -1.47575880e+00
  3.05655038e-01 -1.23293463e-02 -9.53463555e-01  9.39251272e-03
 -5.25466633e-01]

You can also impose arbitrary bounds on any coefficients you choose - that's what you asked for. For example, with this setup

model = ConstrainedLinearRegression()
min_coef = np.repeat(-np.inf, X.shape[1])
min_coef[0] = 0    # force the first coefficient to be non-negative
min_coef[4] = -1   # the fifth coefficient may not go below -1
max_coef = np.repeat(4, X.shape[1])
max_coef[3] = 2    # cap the fourth coefficient at 2
model.fit(X, y, max_coef=max_coef, min_coef=min_coef)
print(model.intercept_)
print(model.coef_)

and you will get the output

24.060175576410515
[ 0.          0.04504673 -0.0354073   2.         -1.          4.
 -0.01343263 -1.17231216  0.2183103  -0.01375266 -0.7747823   0.01122374
 -0.56678676]

Update. This solution can be adapted to handle constraints on linear combinations of the coefficients (e.g. their sum); in that case, the individual bounds for each coefficient are recalculated on each step. This GitHub gist provides an example; a minimal sketch of the idea follows.
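As an illustration of that adaptation (a hypothetical sketch of the same idea, not the gist's code): coordinate descent on the squared-error loss with the extra constraint that the coefficients sum to at most total_max, where each coefficient's upper bound is recomputed from the current values of the others on every step.

import numpy as np

def sum_constrained_least_squares(X, y, total_max, tol=1e-15, max_iter=10000):
    """Hypothetical sketch: least squares by coordinate descent with the
    extra (non-separable) constraint sum(beta) <= total_max."""
    beta = np.zeros(X.shape[1])
    hessian = np.dot(X.T, X)
    for _ in range(max_iter):
        prev_beta = beta.copy()
        for i in range(len(beta)):
            # gradient of the squared-error loss at the current beta
            grad = np.dot(np.dot(X, beta) - y, X)
            # room left for beta[i], given the other coefficients' current values
            bound_i = total_max - (beta.sum() - beta[i])
            beta[i] = min(bound_i, beta[i] - grad[i] / hessian[i, i])
        if (np.abs(prev_beta - beta) < tol).all():
            break
    return beta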
