Getting different result each time I run a linear regression using scikit


Problem description

Hi, I have a linear regression model that I am trying to optimise. I am optimising the span of an exponential moving average and the number of lagged variables that I use in the regression.

However, I keep finding that the results and the calculated MSE come out different on every run. I have no idea why; can anyone help?

Process after starting the loop:

1. Create a new dataframe with three variables
2. Remove nil values
3. Create EWMAs for each variable
4. Create lags for each variable
5. Drop NAs
6. Create X, y
7. Regress, and save the EMA span and lag number if the MSE is better
8. Start the loop again with the next values

I know that this could be a question for Cross Validated, but since it could be programmatic I have posted it here:

import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

bestema = 0
bestlag = 0
mse = 1000000

for e in range(2, 30):
    for lags in range(1, 20):
        df2 = df[['diffbn','diffbl','diffbz']]
        df2 = df2[(df2 != 0).all(1)]        
        # pd.ewma was removed in pandas 0.23; use the .ewm() accessor instead
        df2['emabn'] = df2.diffbn.ewm(span=e).mean()
        df2['emabl'] = df2.diffbl.ewm(span=e).mean()
        df2['emabz'] = df2.diffbz.ewm(span=e).mean()
        for i in range(0,lags):
            df2["lagbn%s" % str(i+1)] = df2["emabn"].shift(i+1)
            df2["lagbz%s" % str(i+1)] = df2["emabz"].shift(i+1)
            df2["lagbl%s" % str(i+1)] = df2["emabl"].shift(i+1)
        df2 = df2.dropna()
        # start from all column names; the raw and ema columns are removed below
        b = list(df2)
        b.remove('diffbl')
        b.remove('emabn')
        b.remove('emabz')
        b.remove('emabl')
        b.remove('diffbn')
        b.remove('diffbz')
        X = df2[b]
        y = df2["diffbl"]
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
        #print(X_train.shape)
        regr = linear_model.LinearRegression()
        regr.fit(X_train, y_train)
        if mean_squared_error(y_test, regr.predict(X_test)) < mse:
            mse = mean_squared_error(y_test, regr.predict(X_test))
            bestema = e
            bestlag = lags
            print(regr.coef_)
            print(bestema)
            print(bestlag)
            print(mse)

Answer

The train_test_split function from sklearn (see docs: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html) is random, so it is logical that you get different results each time.
You can pass a value to the random_state keyword argument to get the same split, and therefore the same result, every time.
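To illustrate the point, here is a minimal, self-contained sketch on made-up data (the arrays and coefficients are placeholders, not the asker's data): with a fixed random_state, train_test_split produces the same split on every call, so the fitted model and its MSE are reproducible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: y is a linear function of X plus small noise.
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.rand(100)

mses = []
for _ in range(2):
    # Fixing random_state makes the split identical across runs.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)
    regr = LinearRegression().fit(X_train, y_train)
    mses.append(mean_squared_error(y_test, regr.predict(X_test)))

# Both runs see the same split, so the MSEs are identical.
print(mses[0] == mses[1])
```

Without random_state (or with random_state=None), a fresh shuffle is drawn each call, which is exactly why the loop in the question lands on different "best" spans and lags from run to run.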
