Linear regression with constraints on the coefficients
Question
I am trying to perform linear regression, for a model like this:
Y = aX1 + bX2 + c
So Y ~ X1 + X2
Suppose I have the following response vector:
set.seed(1)
Y <- runif(100, -1.0, 1.0)
And the following matrix of predictors:
X1 <- runif(100, 0.4, 1.0)
X2 <- sample(rep(0:1,each=50))
X <- cbind(X1, X2)
I want to use the following constraints on the coefficients:
a + c >= 0
c >= 0
So no constraint on b.
I know that the glmc package can be used to apply constraints, but I was not able to determine how to apply it for my constraints. I also know that contr.sum can be used so that all coefficients sum to 0, for example, but that is not what I want to do. solve.QP() seems like another possibility, where setting meq=0 can be used so that all coefficients are >= 0 (again, not my goal here).
Note: the solution must be able to handle NA values in the response vector Y, for example:
Y <- runif(100, -1.0, 1.0)
Y[c(2,5,17,56,37,56,34,78)] <- NA
Answer
solve.QP can be passed arbitrary linear constraints, so it can certainly be used to model your constraints a+c >= 0 and c >= 0.
First, we can add a column of 1's to X to capture the intercept term, and then we can replicate standard linear regression with solve.QP:
X2 <- cbind(X, 1)
library(quadprog)
solve.QP(t(X2) %*% X2, t(Y) %*% X2, matrix(0, 3, 0), c())$solution
# [1] 0.08614041 0.21433372 -0.13267403
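As a quick sanity check (my addition, not part of the original answer), the unconstrained solve.QP solution should match the ordinary least squares coefficients from lm. The name Xi below is mine, used for the design matrix so the predictor X2 is not overwritten:

```r
library(quadprog)

# Rebuild the question's simulated data.
set.seed(1)
Y  <- runif(100, -1.0, 1.0)
X1 <- runif(100, 0.4, 1.0)
X2 <- sample(rep(0:1, each = 50))
Xi <- cbind(X1, X2, 1)  # design matrix with an intercept column

# Unconstrained QP: a zero-column Amat means no constraints at all.
qp  <- solve.QP(t(Xi) %*% Xi, t(Y) %*% Xi, matrix(0, 3, 0), c())$solution
ols <- unname(coef(lm(Y ~ X1 + X2)))  # order: intercept, a, b

# solve.QP returns (a, b, intercept); reorder lm's coefficients to compare.
all.equal(qp, ols[c(2, 3, 1)], tolerance = 1e-6)  # should be TRUE
```

Both calls solve the same normal equations, so agreement here confirms the QP setup is correct before any constraints are added.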
With the sample data from the question, neither constraint is met by standard linear regression.
By modifying both the Amat and bvec parameters, we can add our two constraints:
solve.QP(t(X2) %*% X2, t(Y) %*% X2, cbind(c(1, 0, 1), c(0, 0, 1)), c(0, 0))$solution
# [1] 0.0000000 0.1422207 0.0000000
Subject to these constraints, the squared residuals are minimized by setting the a and c coefficients both equal to 0.
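To make the constraint encoding explicit (a sketch of mine, not from the original answer; Xc is my name for the design matrix): each column of Amat encodes one constraint, read as t(Amat) %*% beta >= bvec, so we can verify that the returned solution satisfies both inequalities:

```r
library(quadprog)

# Rebuild the question's simulated data.
set.seed(1)
Y  <- runif(100, -1.0, 1.0)
X1 <- runif(100, 0.4, 1.0)
X2 <- sample(rep(0:1, each = 50))
Xc <- cbind(X1, X2, 1)  # design matrix with an intercept column

# Column 1 encodes a + c >= 0; column 2 encodes c >= 0.
Amat <- cbind(c(1, 0, 1), c(0, 0, 1))
beta <- solve.QP(t(Xc) %*% Xc, t(Y) %*% Xc, Amat, c(0, 0))$solution

t(Amat) %*% beta  # both entries should be >= 0
```

To add further constraints, append one column to Amat and one entry to bvec per constraint; setting meq > 0 would turn the first meq of them into equalities.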
You can handle missing values in Y or X2 just as the lm function does, by removing the offending observations. You might do something like the following as a pre-processing step:
has.missing <- rowSums(is.na(cbind(Y, X2))) > 0
Y <- Y[!has.missing]
X2 <- X2[!has.missing,]
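Putting the pieces together, here is a minimal sketch (the helper name constrained_fit is mine, not from the original answer) that drops incomplete observations and then solves the constrained fit:

```r
library(quadprog)

# Hypothetical helper: drop rows with any NA, then solve the constrained QP.
constrained_fit <- function(Y, X, Amat, bvec) {
  keep <- rowSums(is.na(cbind(Y, X))) == 0
  Y <- Y[keep]
  X <- X[keep, , drop = FALSE]
  solve.QP(t(X) %*% X, t(Y) %*% X, Amat, bvec)$solution
}

# Rebuild the question's data, including the NA values in Y.
set.seed(1)
Y  <- runif(100, -1.0, 1.0)
Y[c(2, 5, 17, 56, 37, 56, 34, 78)] <- NA
X1 <- runif(100, 0.4, 1.0)
X2 <- sample(rep(0:1, each = 50))

fit <- constrained_fit(Y, cbind(X1, X2, 1),
                       cbind(c(1, 0, 1), c(0, 0, 1)), c(0, 0))
fit  # three coefficients (a, b, c) satisfying a + c >= 0 and c >= 0
```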