Getting different results each time I run a linear regression using scikit
Question
Hi, I have a linear regression model that I am trying to optimise. I am optimising the span of an exponential moving average and the number of lagged variables that I use in the regression.
However, the results and the calculated MSE keep coming out different on every run. I have no idea why — can anyone help?
Process after starting the loop:
1. Create a new dataframe with the three variables
2. Remove zero values
3. Create an EWMA for each variable
4. Create lags for each variable
5. Drop NAs
6. Create X, y
7. Regress, and save the EMA span and lag number if the MSE is better
8. Start the loop again with the next values
I know this could be a question for Cross Validated, but since it could be programmatic I have posted it here:
import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

bestema = 0
bestlag = 0
mse = 1000000  # large sentinel; any real MSE below this counts as better
for e in range(2, 30):
    for lags in range(1, 20):
        # 1-2. New dataframe with the three variables, rows containing zeros removed
        df2 = df[['diffbn', 'diffbl', 'diffbz']]
        df2 = df2[(df2 != 0).all(1)]
        # 3. EWMA of each variable (pd.ewma in older pandas; .ewm().mean() in current pandas)
        df2['emabn'] = df2['diffbn'].ewm(span=e).mean()
        df2['emabl'] = df2['diffbl'].ewm(span=e).mean()
        df2['emabz'] = df2['diffbz'].ewm(span=e).mean()
        # 4. Lagged copies of each EWMA
        for i in range(0, lags):
            df2["lagbn%s" % str(i + 1)] = df2["emabn"].shift(i + 1)
            df2["lagbz%s" % str(i + 1)] = df2["emabz"].shift(i + 1)
            df2["lagbl%s" % str(i + 1)] = df2["emabl"].shift(i + 1)
        # 5. Drop rows with NAs introduced by the shifts
        df2 = df2.dropna()
        # 6. Keep only the lag columns as features; diffbl is the target
        b = list(df2)
        b.remove('diffbl')
        b.remove('emabn')
        b.remove('emabz')
        b.remove('emabl')
        b.remove('diffbn')
        b.remove('diffbz')
        X = df2[b]
        y = df2["diffbl"]
        # 7. Split, fit, and keep the parameters if the test MSE improves
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
        regr = linear_model.LinearRegression()
        regr.fit(X_train, y_train)
        test_mse = mean_squared_error(y_test, regr.predict(X_test))
        if test_mse < mse:
            mse = test_mse  # the original squared the predictions here by mistake
            bestema = e
            bestlag = lags
            print(regr.coef_)
print(bestema)
print(bestlag)
print(mse)
Answer

The train_test_split function from sklearn (see docs: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html) shuffles the data randomly before splitting, so it is expected that you get different results each time. You can pass a value to the random_state keyword argument to get the same split, and therefore the same results, on every run.
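A minimal sketch of the fix, using toy arrays in place of the question's X and y (the hypothetical seed value 42 is arbitrary; any fixed integer works). Note that in current scikit-learn, train_test_split lives in sklearn.model_selection rather than the sklearn.cross_validation module shown in the linked docs:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the question's feature matrix and target.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# With a fixed random_state, every call returns the identical split,
# so the fitted model and its MSE are reproducible run to run.
X_tr1, X_te1, y_tr1, y_te1 = train_test_split(X, y, test_size=0.3, random_state=42)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.3, random_state=42)

print(np.array_equal(X_te1, X_te2))  # prints True: same test set both times
```

Without random_state (or with random_state=None), a fresh shuffle is drawn on each call, which is exactly the nondeterminism the question describes.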