Getting different result each time I run a linear regression using scikit


Problem description

Hi, I have a linear regression model that I am trying to optimise. I am optimising the span of an exponential moving average and the number of lagged variables that I use in the regression.

However, I keep finding that the results and the calculated MSE come out different on every run. I have no idea why; can anyone help?

Process after starting the loop:

1. Create a new dataframe with three variables
2. Remove nil values
3. Create EWMAs for each variable
4. Create lags for each variable
5. Drop NAs
6. Create X, y
7. Regress, and save the EMA span and lag number if the MSE is better
8. Start the loop again with the next values

I know that this could be a question for Cross Validated, but since it could be programmatic I have posted it here:

import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

bestema = 0
bestlag = 0
mse = 1000000

for e in range(2, 30):
    for lags in range(1, 20):
        df2 = df[['diffbn','diffbl','diffbz']]
        df2 = df2[(df2 != 0).all(1)]        
        # pd.ewma was removed in pandas 0.23; use the .ewm() accessor instead
        df2['emabn'] = df2.diffbn.ewm(span=e).mean()
        df2['emabl'] = df2.diffbl.ewm(span=e).mean()
        df2['emabz'] = df2.diffbz.ewm(span=e).mean()
        for i in range(0,lags):
            df2["lagbn%s" % str(i+1)] = df2["emabn"].shift(i+1)
            df2["lagbz%s" % str(i+1)] = df2["emabz"].shift(i+1)
            df2["lagbl%s" % str(i+1)] = df2["emabl"].shift(i+1)
        df2 = df2.dropna()
        # start from all column names; the raw and ema columns are removed below
        b = list(df2)
        b.remove('diffbl')
        b.remove('emabn')
        b.remove('emabz')
        b.remove('emabl')
        b.remove('diffbn')
        b.remove('diffbz')
        X = df2[b]
        y = df2["diffbl"]
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
        #print(X_train.shape)
        regr = linear_model.LinearRegression()
        regr.fit(X_train, y_train)
        if mean_squared_error(y_test, regr.predict(X_test)) < mse:
            mse = mean_squared_error(y_test, regr.predict(X_test))
            bestema = e
            bestlag = lags
            print(regr.coef_)
            print(bestema)
            print(bestlag)
            print(mse)

Answer

The train_test_split function from sklearn (see docs: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html) is random, so it is logical that you get different results each time.
You can pass a value to the random_state keyword argument to get the same split, and therefore the same result, every time.
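To illustrate the point, here is a minimal, self-contained sketch on made-up data (the arrays and coefficients are placeholders, not the asker's data): with a fixed random_state, train_test_split produces the same split on every call, so the fitted model and its MSE are reproducible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: y is a linear function of X plus small noise.
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.rand(100)

mses = []
for _ in range(2):
    # Fixing random_state makes the split identical across runs.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)
    regr = LinearRegression().fit(X_train, y_train)
    mses.append(mean_squared_error(y_test, regr.predict(X_test)))

# Both runs see the same split, so the MSEs are identical.
print(mses[0] == mses[1])
```

Without random_state (or with random_state=None), a fresh shuffle is drawn each call, which is exactly why the loop in the question lands on different "best" spans and lags from run to run.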
