Linear regression with constraints on the coefficients

Question

I am trying to perform linear regression for a model like this:

Y = aX1 + bX2 + c

So Y ~ X1 + X2

Suppose I have the following response vector:

set.seed(1)
Y <- runif(100, -1.0, 1.0)

And the following matrix of predictors:

X1 <- runif(100, 0.4, 1.0)
X2 <- sample(rep(0:1,each=50))
X <- cbind(X1, X2)

I want to use the following constraints on the coefficients:

a + c >= 0  
c >= 0

So no constraint on b.

I know that the glmc package can be used to apply constraints, but I was not able to work out how to apply it to my constraints. I also know that contr.sum can be used so that, for example, all coefficients sum to 0, but that is not what I want to do. solve.QP() seems like another possibility, where setting meq=0 can be used so that all coefficients are >= 0 (again, not my goal here).

Note: the solution must be able to handle NA values in the response vector Y, for example:

Y <- runif(100, -1.0, 1.0)
Y[c(2,5,17,56,37,56,34,78)] <- NA

Answer

solve.QP accepts arbitrary linear equality and inequality constraints, so it can certainly be used to model your constraints a + c >= 0 and c >= 0.
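
For reference, here is the problem written in the form that quadprog's solve.QP expects: minimize -d'b + (1/2) b'Db subject to t(A) %*% b >= b0. Here X denotes the design matrix with a column of 1s appended for the intercept (the matrix the code calls X2), and β = (a, b, c). Expanding the squared error shows that D = X'X and d = X'Y reproduce the least-squares objective up to a constant, and the two constraints become the two columns of A:

```latex
\begin{aligned}
\tfrac12 \lVert Y - X\beta \rVert^2
  &= \tfrac12 Y^\top Y - (X^\top Y)^\top \beta + \tfrac12\,\beta^\top X^\top X \beta
   = \text{const} - d^\top \beta + \tfrac12\,\beta^\top D \beta,
   \qquad D = X^\top X,\; d = X^\top Y, \\
A^\top \beta &\ge b_0,
  \qquad A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 1 & 1 \end{pmatrix},\;
  b_0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
  \iff a + c \ge 0,\; c \ge 0 .
\end{aligned}
```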

First, we can add a column of 1's to X to capture the intercept term, and then we can replicate standard linear regression with solve.QP:

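library(quadprog)
# Append a column of 1s for the intercept. Columns are (X1, X2, 1), so the
# coefficients are (a, b, c). Note that this reuses the name X2 for the design matrix.
X2 <- cbind(X, 1)
# Unconstrained least squares: Dmat = X2'X2, dvec = X2'Y, and no constraints
solve.QP(t(X2) %*% X2, t(Y) %*% X2, matrix(0, 3, 0), c())$solution
# [1]  0.08614041  0.21433372 -0.13267403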
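As a sanity check, the following sketch regenerates the question's data and compares the unconstrained solve.QP fit with lm(). It uses crossprod(X) in place of t(X) %*% X, which computes the same matrix. The exact coefficients may differ from the output above on newer R versions, because the default sample() algorithm changed in R 3.6.0; the agreement between the two fits should not depend on that.

```r
# Sketch: solve.QP with Dmat = X'X, dvec = X'Y and no constraints should match lm().
library(quadprog)

set.seed(1)
Y  <- runif(100, -1.0, 1.0)
X1 <- runif(100, 0.4, 1.0)
X2 <- sample(rep(0:1, each = 50))
Xd <- cbind(X1, X2, 1)                      # intercept column last: (a, b, c)

qp_fit <- solve.QP(Dmat = crossprod(Xd),
                   dvec = drop(crossprod(Xd, Y)),
                   Amat = matrix(0, 3, 0),  # zero constraints
                   bvec = numeric(0))$solution

lm_fit <- coef(lm(Y ~ X1 + X2))             # order: (Intercept), X1, X2
lm_fit <- lm_fit[c("X1", "X2", "(Intercept)")]

print(cbind(qp = qp_fit, lm = lm_fit))
```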

With the sample data from the question, the standard linear regression fit satisfies neither constraint: a + c ≈ -0.047 and c ≈ -0.133.

By supplying the Amat and bvec parameters (solve.QP enforces t(Amat) %*% b >= bvec, with one column of Amat per constraint), we can add our two constraints:

# Column 1 of Amat encodes a + c >= 0, column 2 encodes c >= 0; bvec = c(0, 0)
solve.QP(t(X2) %*% X2, t(Y) %*% X2, cbind(c(1, 0, 1), c(0, 0, 1)), c(0, 0))$solution
# [1] 0.0000000 0.1422207 0.0000000

Subject to these constraints, the squared residuals are minimized by setting the a and c coefficients to both equal 0.
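A sketch that checks the constrained fit on the question's data: it confirms that both constraints hold at the solution and that the residual sum of squares is never lower than in the unconstrained fit (imposing constraints can only keep it equal or raise it). The rss() helper is introduced here purely for illustration.

```r
# Sketch: constrained fit (a + c >= 0, c >= 0), feasibility check, and RSS comparison.
library(quadprog)

set.seed(1)
Y  <- runif(100, -1.0, 1.0)
X1 <- runif(100, 0.4, 1.0)
X2 <- sample(rep(0:1, each = 50))
Xd <- cbind(X1, X2, 1)                  # coefficients are (a, b, c)

Dmat <- crossprod(Xd)
dvec <- drop(crossprod(Xd, Y))

# One column per constraint: t(Amat) %*% beta >= bvec
Amat <- cbind(c(1, 0, 1),               # a + c >= 0
              c(0, 0, 1))               # c >= 0
bvec <- c(0, 0)

beta_ols <- solve.QP(Dmat, dvec, matrix(0, 3, 0), numeric(0))$solution
beta_con <- solve.QP(Dmat, dvec, Amat, bvec)$solution

# Hypothetical helper: residual sum of squares for a coefficient vector
rss <- function(beta) sum((Y - Xd %*% beta)^2)

print(drop(t(Amat) %*% beta_con))       # both values should be >= 0 (up to rounding)
print(c(unconstrained = rss(beta_ols), constrained = rss(beta_con)))
```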

You can handle missing values in Y or X2 just as the lm function does by default, by removing the offending observations. You might do something like the following as a pre-processing step, before calling solve.QP:

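# Drop every row with an NA in Y or in the design matrix X2 (as lm does with na.omit)
has.missing <- rowSums(is.na(cbind(Y, X2))) > 0
Y <- Y[!has.missing]
X2 <- X2[!has.missing, , drop = FALSE]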
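Putting the pieces together, here is a sketch of a small wrapper that drops incomplete rows and then solves the constrained problem. The function name constrained_fit is hypothetical. It assumes exactly two predictor columns (X1, X2) so that the hard-coded Amat matches a + c >= 0 and c >= 0, and it uses complete.cases() in place of the rowSums(is.na(...)) test; both select the same rows.

```r
# Hypothetical helper (not part of quadprog): least squares with a + c >= 0 and
# c >= 0, dropping incomplete rows the way lm() does by default.
library(quadprog)

constrained_fit <- function(Y, X) {
  stopifnot(ncol(X) == 2)                       # Amat below assumes (X1, X2)
  Xd <- cbind(X, 1)                             # last column = intercept c
  keep <- complete.cases(Y, Xd)                 # rows with no NA anywhere
  Y  <- Y[keep]
  Xd <- Xd[keep, , drop = FALSE]
  Amat <- cbind(c(1, 0, 1), c(0, 0, 1))         # a + c >= 0, c >= 0
  sol <- solve.QP(Dmat = crossprod(Xd),
                  dvec = drop(crossprod(Xd, Y)),
                  Amat = Amat,
                  bvec = c(0, 0))$solution
  setNames(sol, c("a", "b", "c"))
}

set.seed(1)
Y  <- runif(100, -1.0, 1.0)
X1 <- runif(100, 0.4, 1.0)
X2 <- sample(rep(0:1, each = 50))
Y[c(2, 5, 17, 56, 37, 56, 34, 78)] <- NA       # missing responses, as in the question

beta <- constrained_fit(Y, cbind(X1, X2))
print(beta)
```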
