How to build a predict function with Lasso and RobustScaler?


Question

I'm trying to figure out how to predict values with LASSO regression without using the .predict function that Sklearn provides. This is basically just to broaden my understanding of how LASSO works internally. I asked a question on Cross Validated about how LASSO regression works, and one of the comments mentioned that the predict function works the same way as in linear regression. Because of this, I wanted to try to write my own function to do this.

I was able to successfully recreate the predict function in simpler examples, but when I try to use it in conjunction with RobustScaler, I keep getting different outputs. With this example, I'm getting the prediction as 4.33 with Sklearn, and 6.18 with my own function. What am I missing here? Am I not inverse transforming the prediction correctly at the end?
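For reference, scikit-learn's Lasso differs from ordinary linear regression only in how the weights are fitted (an L1 penalty on the coefficients); once fitted, prediction is the same linear map. Writing b for intercept_, w for coef_ and α for the alpha parameter, and following the objective stated in the scikit-learn Lasso documentation (the intercept is not penalized):

```latex
\min_{w,\,b}\; \frac{1}{2n}\,\lVert y - Xw - b \rVert_2^2 + \alpha \lVert w \rVert_1,
\qquad
\hat{y}(x) = b + \sum_j w_j x_j
```

Note that x here is whatever the model was trained on; if the features were scaled before fitting, x must be scaled the same way before applying this formula.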

import pandas as pd
from sklearn.preprocessing import RobustScaler
from sklearn.linear_model import Lasso
import numpy as np

df = pd.DataFrame({'Y':[5, -10, 10, .5, 2.5, 15], 'X1':[1., -2.,  2., .1, .5, 3], 'X2':[1, 1, 2, 1, 1, 1], 
              'X3':[6, 6, 6, 5, 6, 4], 'X4':[6, 5, 4, 3, 2, 1]})

X = df[['X1','X2','X3','X4']]
y = df[['Y']]

#Scaling 
transformer_x = RobustScaler().fit(X)
transformer_y = RobustScaler().fit(y) 
X_scal = transformer_x.transform(X)
y_scal = transformer_y.transform(y)

#LASSO
lasso = Lasso()
lasso = lasso.fit(X_scal, y_scal)

#LASSO info
print('Score: ', lasso.score(X_scal,y_scal))
print('Raw Intercept: ', lasso.intercept_.round(2)[0]) 
intercept = transformer_y.inverse_transform([lasso.intercept_])[0][0]
print('Unscaled Intercept: ', intercept) 
print('\nCoefficients Used: ')
coeff_array = lasso.coef_
inverse_coeff_array = transformer_x.inverse_transform(lasso.coef_.reshape(1,-1))[0]
for i,j,k in zip(X.columns, coeff_array, inverse_coeff_array):
    if j != 0:
        print(i, j.round(2), k.round(2))

#Predictions
example = [[3,1,1,1]]
pred = lasso.predict(example)
pred_scal = transformer_y.inverse_transform(pred.reshape(-1, 1))
print('\nRaw Prediction where X1 = 3: ', pred[0])
print('Unscaled Prediction where X1 = 3: ', pred_scal[0][0])

#Predictions without using the .predict function 
def lasso_predict_value_(X1,X2,X3,X4): 
    print('intercept: ', intercept)
    print('coef: ', inverse_coeff_array[0])
    print('X1: ', X1)
    preds = intercept + inverse_coeff_array[0]*X1
    print('Your predicted value is: ', preds)

lasso_predict_value_(3,1,1,1)

Answer

The trained Lasso model has no information about whether a given data point was scaled. Hence your manual predict method should not apply any scaling to the model's coefficients or intercept.

If I remove your processing of the model coefficients, we get the same result as the sklearn model:


example = [[3,1,1,1]]
lasso.predict(example)

# array([0.07533937])


#Predictions without using the .predict function 
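def lasso_predict_value_(X1,X2,X3,X4): 
    # Raw feature vector; no scaling is applied to the inputs or the weights
    x_test = np.array([X1,X2, X3, X4])
    # Same rule as linear regression: intercept + sum(coef * x)
    preds = lasso.intercept_ + sum(x_test*lasso.coef_)
    print('Your predicted value is: ', preds)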


lasso_predict_value_(3,1,1,1)

# Your predicted value is:  [0.07533937]
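To see why calling inverse_transform on the coefficients (as the question's code does) goes wrong, it helps to look at what RobustScaler actually computes. The sketch below assumes the default RobustScaler settings and uses its documented center_ (per-column median) and scale_ (per-column interquartile range) attributes. It checks that transform computes (x - center_) / scale_ and that inverse_transform computes x * scale_ + center_ for any row you pass in. Applied to a coefficient vector, that treats the weights as if they were a data point, which is not how coefficients change when features are rescaled. The coefficient values used here are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

# Same feature matrix as in the question
X = pd.DataFrame({'X1': [1., -2., 2., .1, .5, 3], 'X2': [1, 1, 2, 1, 1, 1],
                  'X3': [6, 6, 6, 5, 6, 4], 'X4': [6, 5, 4, 3, 2, 1]})

scaler = RobustScaler().fit(X)

# transform: subtract the per-column median (center_), divide by the IQR (scale_)
manual_scaled = (X.to_numpy() - scaler.center_) / scaler.scale_
assert np.allclose(manual_scaled, scaler.transform(X))

# inverse_transform just undoes that affine map for whatever row it receives,
# so feeding it a coefficient vector shifts and stretches the weights like data
coef = np.array([[0.5, 0.0, 0.0, -0.1]])  # hypothetical coefficient vector
undone = scaler.inverse_transform(coef)
assert np.allclose(undone, coef * scaler.scale_ + scaler.center_)
print(undone)
```

Note that X2 has an interquartile range of zero, so RobustScaler substitutes a scale of 1 for that column; the zero coefficients above still come back as the column medians, which is exactly the distortion seen in the question.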

Update 2:

The asker added: "Once I use LASSO, I then need to see what my predictions were in their original units. My dependent variable is in dollar amounts, and if I don't inverse transform it back, I'm unable to see how many dollars I need for the prediction."

This is a very valid scenario. Scale the input with transformer_x.transform, compute the prediction, and then apply transformer_y.inverse_transform to get the unscaled dollar amount. There is no need to modify the model weights.
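If you do want a single intercept and set of coefficients in the original units (for example, dollars per unit of X1), the scalers have to be folded into the weights algebraically rather than passed through inverse_transform. Writing c and s for the center_ and scale_ of each RobustScaler, and b and w_j for the intercept_ and coef_ fitted in scaled space, a short derivation:

```latex
\begin{aligned}
\frac{y - c_y}{s_y} &= b + \sum_j w_j \,\frac{x_j - c_j}{s_j} \\
y &= \underbrace{c_y + s_y\Big(b - \sum_j \frac{w_j\, c_j}{s_j}\Big)}_{\text{intercept in original units}}
   \;+\; \sum_j \underbrace{\frac{s_y\, w_j}{s_j}}_{\text{coefficient of } x_j} x_j
\end{aligned}
```

Since every s is positive, a coefficient that is zero in scaled space stays zero in original units, so the set of variables Lasso selected does not change.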

Updated example

example = [[3,1,1,1]]
scaled_pred = lasso.predict(transformer_x.transform(example))
transformer_y.inverse_transform([scaled_pred])
# array([[4.07460407]])

#Predictions without using the .predict function 
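def lasso_predict_value_(X1,X2,X3,X4): 
    # Scale the raw inputs with the same RobustScaler used for training
    x_test = transformer_x.transform(np.array([X1,X2, X3, X4]).reshape(1,-1))[0]
    # Linear rule in scaled space: intercept + sum(coef * scaled x)
    preds = lasso.intercept_ + sum(x_test*lasso.coef_)
    print('Your predicted value is: ', preds)
    # Map this function's own scaled prediction back to the units of Y
    print('Your unscaled predicted value is: ', 
          transformer_y.inverse_transform(preds.reshape(-1, 1)))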


lasso_predict_value_(3,1,1,1)
# Your predicted value is:  [0.0418844]    
# Your unscaled predicted value is:  [[4.07460407]]
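As an end-to-end check, the sketch below refits the question's data with default Lasso settings and compares three ways of getting a prediction in original units: sklearn's predict plus inverse_transform, the manual scaled-space formula from the answer, and the folded original-unit coefficients from the derivation above. The helper predict_original_units and the names coef_orig and intercept_orig are hypothetical, introduced only for illustration; it relies on RobustScaler's documented center_ and scale_ attributes. np.ravel is used so the code does not depend on whether coef_ and intercept_ come back 1-D or 2-D for a single-column target.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import RobustScaler

df = pd.DataFrame({'Y': [5, -10, 10, .5, 2.5, 15], 'X1': [1., -2., 2., .1, .5, 3],
                   'X2': [1, 1, 2, 1, 1, 1], 'X3': [6, 6, 6, 5, 6, 4],
                   'X4': [6, 5, 4, 3, 2, 1]})
X = df[['X1', 'X2', 'X3', 'X4']].to_numpy(dtype=float)
y = df[['Y']].to_numpy(dtype=float)

# Fit the scalers and the model as in the question (default alpha=1.0)
tx = RobustScaler().fit(X)
ty = RobustScaler().fit(y)
lasso = Lasso().fit(tx.transform(X), ty.transform(y))

b = np.ravel(lasso.intercept_)[0]   # intercept in scaled space
w = np.ravel(lasso.coef_)           # coefficients in scaled space
c_x, s_x = tx.center_, tx.scale_
c_y, s_y = ty.center_[0], ty.scale_[0]

# Fold both scalers into an intercept and coefficients in original units
coef_orig = s_y * w / s_x
intercept_orig = c_y + s_y * (b - np.sum(w * c_x / s_x))

def predict_original_units(x_row):
    """Hypothetical helper: predict Y in original units from one raw feature row."""
    return intercept_orig + np.dot(coef_orig, x_row)

example = np.array([[3., 1., 1., 1.]])

# 1) Reference: scale input, sklearn predict, inverse-transform the output
ref = ty.inverse_transform(lasso.predict(tx.transform(example)).reshape(-1, 1))[0, 0]

# 2) Manual prediction in scaled space, then map back (the answer's approach)
scaled_pred = b + np.dot(tx.transform(example)[0], w)
manual_scaled_space = ty.inverse_transform([[scaled_pred]])[0, 0]

# 3) Folded coefficients applied directly to the raw input
manual = predict_original_units(example[0])

print(ref, manual_scaled_space, manual)
```

All three numbers should agree up to floating-point error. The design choice is mostly about convenience: keeping the scalers separate (approach 2) is simpler, while folding them in (approach 3) gives coefficients you can read directly in dollars.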
