How to handle a shift in forecasted values


Question

I implemented a forecasting model using an LSTM in Keras. The dataset is sampled at 15-minute intervals, and I am forecasting 12 future steps.

The model performs well on this problem, but there is a small issue with the forecasts: they show a small shift effect. For a clearer picture, see the figure attached below.

How can I handle this problem? How must the data be transformed to deal with this kind of issue?

The model I used is as follows:

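from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.initializers import RandomUniform
from keras.optimizers import Adam

# Uniform initializers for the LSTM and Dense kernels
init_lstm = RandomUniform(minval=-.05, maxval=.05)
init_dense_1 = RandomUniform(minval=-.03, maxval=.06)

# Single LSTM layer followed by a linear output unit
model = Sequential()
model.add(LSTM(15, input_shape=(X.shape[1], X.shape[2]), kernel_initializer=init_lstm, recurrent_dropout=0.33))
model.add(Dense(1, kernel_initializer=init_dense_1, activation='linear'))

# `learning_rate` replaces the deprecated `lr` argument in recent Keras versions
model.compile(loss='mae', optimizer=Adam(learning_rate=1e-4))

# shuffle=False keeps the samples in chronological order
history = model.fit(X, y, epochs=1000, batch_size=16, validation_data=(X_valid, y_valid), verbose=1, shuffle=False)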

I make the forecasts like this:

my_forecasts = model.predict(X_valid, batch_size=16)

The time series data is transformed into a supervised learning format to feed the LSTM using this function:

from pandas import DataFrame, concat

# convert time series into supervised learning problem
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

super_data = series_to_supervised(data, 12, 1)

My time series is multivariate. var2 is the variable I need to forecast. I dropped the future var1 like this:

del super_data['var1(t)']

I separated the training and validation sets like this:

features = super_data[feat_names]
values = super_data[val_name]

n_test = 3444

train_feats, test_feats = features[0:-n_test], features[-n_test:]
train_vals, test_vals = values[0:-n_test], values[-n_test:]

X, y = train_feats.values, train_vals.values
X = X.reshape(X.shape[0], 1, X.shape[1])

X_valid, y_valid = test_feats.values, test_vals.values
X_valid = X_valid.reshape(X_valid.shape[0], 1, X_valid.shape[1])

I haven't made the data stationary for this forecast. I also tried differencing the data to make the series as stationary as I can, but the issue remains the same.
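For reference, first-order differencing of the target can be sketched as below. This is a minimal illustration, not the code from the original post: it assumes the target is a 1-D NumPy array, and the helper names `to_differences` and `from_differences` are hypothetical.

```python
import numpy as np

def to_differences(series):
    """Return first differences d[t] = series[t] - series[t-1]."""
    series = np.asarray(series, dtype=float)
    return series[1:] - series[:-1]

def from_differences(last_known, predicted_diffs):
    """Rebuild one-step-ahead level forecasts from predicted differences.

    last_known[i] is the observed value just before the step being forecast.
    """
    last_known = np.asarray(last_known, dtype=float)
    return last_known + np.asarray(predicted_diffs, dtype=float)

series = np.array([10.0, 12.0, 11.0, 15.0])
diffs = to_differences(series)            # targets a model would be trained on
# With perfect difference predictions the original levels are recovered.
rebuilt = from_differences(series[:-1], diffs)
```

With a differenced target, a model that predicts zero everywhere is exactly the copy-the-last-value forecast, which makes it easier to see whether the network has learned anything beyond that.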

I have also tried different scaling ranges for the min-max scaler, hoping it might help the model, but the forecasts only got worse.

Other Things I have tried

=> Tried other optimizers
=> Tried mse loss and custom log-mae loss functions
=> Tried varying batch_size
=> Tried adding more past timesteps
=> Tried training with sliding window and TimeSeriesSplit

I understand that the model is replicating the last known value, thereby minimizing the loss as well as it can.
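One way to check this suspicion numerically is to find the lag at which the forecast best matches the actual series. The sketch below is an illustration under stated assumptions: `estimate_lag` is a hypothetical helper, both inputs are equal-length arrays, and the demonstration uses synthetic data rather than the dataset from the post.

```python
import numpy as np

def estimate_lag(actual, forecast, max_lag=12):
    """Return the lag k (in steps) at which forecast[t] best matches actual[t - k].

    A best lag of 0 means no shift; k > 0 means the forecast trails the
    actual series by k steps.
    """
    actual = np.asarray(actual, dtype=float).ravel()
    forecast = np.asarray(forecast, dtype=float).ravel()
    errors = []
    for k in range(max_lag + 1):
        # Compare forecast[t] with actual[t - k] over the overlapping range.
        n = len(actual) - k
        errors.append(np.mean(np.abs(forecast[k:] - actual[:n])))
    return int(np.argmin(errors)), errors

# Synthetic check: a "forecast" that simply repeats the previous actual value.
rng = np.random.default_rng(0)
actual = np.cumsum(rng.normal(size=500))
copied = np.concatenate(([actual[0]], actual[:-1]))   # forecast[t] = actual[t-1]
best_lag, lag_errors = estimate_lag(actual, copied)
```

On the real data the call would be along the lines of `estimate_lag(y_valid, my_forecasts)`. A best lag of 1 or more is consistent with the model copying recent values.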

The validation and training losses remain low throughout the training process. This makes me wonder whether I need to come up with a new loss function for this purpose.

Is that necessary? If so, what loss function should I use?

I have tried every method I stumbled upon, and I can't find any resource that addresses this kind of issue. Is this a problem with the data? Or is it because the problem is very hard for an LSTM to learn?

Recommended answer

You asked for my help at:

I hope this is not too late. What you can try is to divert the numerical explicitness of your features. Let me explain:

As in my answer to the previous topic: the regression algorithm will use the values from the time window you supply in whatever way minimizes the error. Suppose you are trying to predict the closing price of BTC at time t. One of your features consists of previous closing prices, and you supply a time-series window of the last 20 inputs, from t-20 to t-1. A regressor will probably learn to pick the closing value at time step t-1 or t-2, or something close to it, which is cheating. Think of it this way: if the closing price was $6340 at t-1, predicting $6340 or something close to it at t minimizes the error about as well as anything can. But the algorithm has not actually learned any pattern; it just replicates, so it does nothing beyond fulfilling its optimization objective.
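The argument can be illustrated on synthetic data. The sketch below assumes a random walk as a stand-in for a price series (it is not BTC data) and compares the copy-the-last-value forecast with the mean of the 20-step window.

```python
import numpy as np

rng = np.random.default_rng(42)
# A random walk: each value is the previous one plus zero-mean noise.
prices = 6000.0 + np.cumsum(rng.normal(scale=10.0, size=2000))

target = prices[1:]         # value at time t
last_value = prices[:-1]    # value at time t-1 (the "copy" forecast)
# Mean of up to 20 previous values, as a simple alternative forecast.
window_mean = np.array(
    [prices[max(0, t - 20):t].mean() for t in range(1, len(prices))]
)

mae_copy = np.mean(np.abs(target - last_value))
mae_mean = np.mean(np.abs(target - window_mean))
print(f"MAE copy-last-value: {mae_copy:.2f}, MAE window mean: {mae_mean:.2f}")
```

On a series like this the copy forecast has the lower error, so a regressor trained to minimize MAE has a strong incentive to converge to it. Comparing a trained model against this baseline shows whether it learned anything beyond copying.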

Apply the same thinking to your case. By diverting the explicitness, I mean: do not give the closing prices directly; either scale them or do not use explicit prices at all. Do not use any feature that explicitly shows the closing price to the algorithm, and do not use open, high, low, etc. for every time step. You will need to be creative here and engineer features that replace the explicit ones. You can give squared close differences (in my experience a regressor can still steal from the past with linear differences) or their ratio to volume. Or you can make the features categorical by digitizing them in a way that makes sense for your data. The point is: do not give the algorithm a direct hint about what it should predict; only provide patterns for it to work on.
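A minimal sketch of such features, assuming `close` and `volume` are equal-length arrays with nonzero values. The function name `engineer_features`, the quantile-based bin edges, and the reading of "ratio to volume" as squared difference divided by volume are all assumptions made for illustration, not part of the answer.

```python
import numpy as np

def engineer_features(close, volume, n_bins=3, edges=None):
    """Build features that do not expose the raw closing price."""
    close = np.asarray(close, dtype=float)
    volume = np.asarray(volume, dtype=float)

    diff = np.diff(close)                      # close[t] - close[t-1]
    sq_diff = diff ** 2                        # squared close differences
    sq_diff_per_volume = sq_diff / volume[1:]  # ratio to volume at time t
    pct_change = diff / close[:-1]             # relative change

    if edges is None:
        # Interior quantiles as bin edges; compute them on training data only
        # and pass them in for validation data to avoid leakage.
        edges = np.quantile(pct_change, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(pct_change, edges)      # categorical codes 0..n_bins-1

    return np.column_stack([sq_diff, sq_diff_per_volume, bins]), edges

close = [100.0, 102.0, 101.0, 105.0, 104.0, 110.0]
volume = [10.0, 20.0, 10.0, 40.0, 10.0, 30.0]
feats, edges = engineer_features(close, volume)
```

Each row of `feats` describes one step from t-1 to t, so the result has one row fewer than the input series.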

A faster approach may be possible, depending on your task. If predicting the percentage change of your labels is enough for you, you can do multi-class classification; just be careful about class imbalance. If even the up/down fluctuations are enough, you can go directly for binary classification. Provided you are not leaking data from the training set into the test set, replication or shifting problems are seen only in regression tasks. If possible, move away from regression for time-series windowed applications.
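Turning a regression target into classification labels could look like the sketch below. The helper names and the 1% thresholds are hypothetical choices for illustration, and the series is assumed to contain no zeros so that the percentage change is defined.

```python
import numpy as np

def direction_labels(values):
    """Binary labels: 1 if the next value is higher than the current one, else 0."""
    values = np.asarray(values, dtype=float)
    return (values[1:] > values[:-1]).astype(int)

def change_classes(values, edges=(-0.01, 0.01)):
    """Multi-class labels from the percentage change between consecutive values.

    With the default edges: 0 = fell by more than 1%, 1 = roughly flat,
    2 = rose by 1% or more.
    """
    values = np.asarray(values, dtype=float)
    pct_change = (values[1:] - values[:-1]) / values[:-1]
    return np.digitize(pct_change, edges)

values = np.array([100.0, 103.0, 102.5, 99.0, 99.5])
binary = direction_labels(values)
classes = change_classes(values)
# Class counts make any imbalance visible before training.
counts = np.bincount(classes, minlength=3)
```

If `counts` is heavily skewed toward one class, the thresholds can be adjusted or class weights applied during training.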

If I have misunderstood or missed anything, I will be around. I hope this helps. Good luck.
