How to configure lasso regression to not penalize certain variables?

Problem description

I'm trying to use lasso regression in Python. I'm currently using the Lasso function from the scikit-learn library.

I want my model not to penalize certain variables while training (i.e., penalize only the rest of the variables).

Below is my current code for training:

rg_mdt = linear_model.LassoCV(alphas=np.array(10**np.linspace(0, -4, 100)),
                              fit_intercept=True, normalize=True, cv=10)
rg_mdt.fit(df_mdt_rgmt.loc[df_mdt_rgmt.CLUSTER_ID == k].drop(['RESPONSE', 'CLUSTER_ID'], axis=1),
           df_mdt_rgmt.loc[df_mdt_rgmt.CLUSTER_ID == k, 'RESPONSE'])

df_mdt_rgmt is the data mart, and I'm trying to keep the coefficients for certain columns non-zero.

glmnet in R provides a 'penalty.factor' parameter that lets me do this, but how can I do that in Python with scikit-learn?

Below is the code I have in R:

get.Lassomodel <- function(TB.EXP, TB.RSP){
  # penalty factor of 1 for every column; 0 (no penalty) for the two columns below
  VT.PEN <- rep(1, ncol(TB.EXP))
  VT.PEN[which(colnames(TB.EXP) == "DC_RATE")] <- 0
  VT.PEN[which(colnames(TB.EXP) == "FR_PRICE_PW_REP")] <- 0

  # lambda grid from 10^0 down to 10^-4
  VT.GRID <- 10^seq(0, -4, length=100)

  REG.MOD <- cv.glmnet(as.matrix(TB.EXP), as.matrix(TB.RSP), alpha=1,
                       lambda=VT.GRID, penalty.factor=VT.PEN, nfolds=10, intercept=TRUE)

  return(REG.MOD)
}

Answer

I'm afraid you can't. Of course it's not a theoretical issue, but just a design decision.

My reasoning is based on the available API, and while there are sometimes undocumented functions, this time I don't think there is what you need, because the user guide already presents the problem in the single-penalty-for-all-coefficients form alpha * ||w||_1.
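
For reference, the Lasso objective in the scikit-learn user guide is

    (1 / (2 * n_samples)) * ||X w - y||_2^2 + alpha * ||w||_1

and what glmnet's penalty.factor asks for is the per-feature weighted variant

    (1 / (2 * n_samples)) * ||X w - y||_2^2 + alpha * sum_j p_j * |w_j|

where p_j = 0 leaves coefficient j completely unpenalized (glmnet internally rescales the factors, but the form is the same). scikit-learn only exposes the single scalar alpha.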

Depending on your setting you might modify sklearn's code (a bit scary regarding the coordinate-descent tuning) or even implement a customized objective using scipy.optimize (although the latter might be a bit slower).

Here is an example showing the scipy.optimize approach. I simplified the problem by removing the intercept.

""" data """
import numpy as np
from sklearn import datasets
diabetes = datasets.load_diabetes()
A = diabetes.data[:150]
y = diabetes.target[:150]
alpha = 0.1
weights = np.ones(A.shape[1])  # one penalty factor per feature; all 1 = standard lasso

""" sklearn """
from sklearn import linear_model
clf = linear_model.Lasso(alpha=alpha, fit_intercept=False)
clf.fit(A, y)

""" scipy """
from scipy.optimize import minimize
def lasso(x):  # following sklearn's definition from the user guide!
    # (1 / (2 * n_samples)) * ||A x - y||_2^2 + alpha * ||weights * x||_1
    return ((1. / (2 * A.shape[0])) * np.square(np.linalg.norm(A.dot(x) - y, 2))
            + alpha * np.linalg.norm(weights * x, 1))

""" Test with weights = 1 """
x0 = np.zeros(A.shape[1])
res = minimize(lasso, x0, method='L-BFGS-B', options={'disp': False})
print('Equal weights')
print(lasso(clf.coef_), clf.coef_[:5])
print(lasso(res.x), res.x[:5])

""" Test scipy-based with special weights """
weights[[0, 3, 5]] = 0.0  # remove the penalty from features 0, 3 and 5
res = minimize(lasso, x0, method='L-BFGS-B', options={'disp': False})
print('Specific weights')
print(lasso(res.x), res.x[:5])

Output:

Equal weights
12467.4614224 [-524.03922009  -75.41111354  820.0330707    40.08184085 -307.86020107]
12467.6514697 [-526.7102518   -67.42487561  825.70158417   40.04699607 -271.02909258]
Specific weights
12362.6078842 [ -6.12843589e+02  -1.51628334e+01   8.47561732e+02   9.54387812e+01
  -1.02957112e-05]
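
As expected, with the penalty removed from features 0, 3 and 5, those coefficients are free to grow (compare res.x[:5] across the two runs), while a still-penalized coefficient such as feature 4 is shrunk to essentially zero.

If you also want the cross-validated alpha selection that LassoCV and cv.glmnet perform, the same objective can be wrapped in a small grid search. The following is a minimal sketch under the same assumptions as the example above (no intercept, L-BFGS-B applied to the nonsmooth objective); fit_weighted_lasso and select_alpha are illustrative names, not a library API:

import numpy as np
from scipy.optimize import minimize
from sklearn.model_selection import KFold

def fit_weighted_lasso(A, y, alpha, weights):
    # minimize (1 / (2n)) * ||A x - y||_2^2 + alpha * ||weights * x||_1
    n = A.shape[0]
    def objective(x):
        resid = A.dot(x) - y
        return (1. / (2 * n)) * resid.dot(resid) + alpha * np.abs(weights * x).sum()
    return minimize(objective, np.zeros(A.shape[1]), method='L-BFGS-B').x

def select_alpha(A, y, weights, alphas, n_splits=10):
    # pick the alpha with the lowest mean out-of-fold squared error
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    cv_errors = []
    for alpha in alphas:
        fold_errors = []
        for train, test in kf.split(A):
            coef = fit_weighted_lasso(A[train], y[train], alpha, weights)
            fold_errors.append(np.mean((A[test].dot(coef) - y[test]) ** 2))
        cv_errors.append(np.mean(fold_errors))
    return alphas[int(np.argmin(cv_errors))]

Used, for instance, with a grid like the one from the question: best_alpha = select_alpha(A, y, weights, 10**np.linspace(0, -4, 10)), followed by coef = fit_weighted_lasso(A, y, best_alpha, weights).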
