LSTM to forecast numerical data by having categorical data as input
Question
I have a DataFrame like this:
import pandas as pd

df = pd.DataFrame([
{'date':'2021-01-15', 'value':145, 'label':'negative'},
{'date':'2021-01-16', 'value':144, 'label':'positive'},
{'date':'2021-01-17', 'value':147, 'label':'positive'},
{'date':'2021-01-18', 'value':146, 'label':'negative'},
{'date':'2021-01-19', 'value':155, 'label':'negative'},
{'date':'2021-01-20', 'value':157, 'label':'positive'},
{'date':'2021-01-21', 'value':158, 'label':'positive'},
{'date':'2021-01-22', 'value':157, 'label':'negative'},
{'date':'2021-01-23', 'value':157, 'label':'positive'},
{'date':'2021-01-24', 'value':152, 'label':'positive'},
{'date':'2021-01-25', 'value':159, 'label':'negative'},
{'date':'2021-01-26', 'value':162, 'label':'positive'},
{'date':'2021-01-27', 'value':160, 'label':'positive'},
{'date':'2021-01-28', 'value':153, 'label':'negative'},
{'date':'2021-01-29', 'value':149, 'label':'negative'},
{'date':'2021-01-30', 'value':156, 'label':'positive'},
{'date':'2021-01-31', 'value':168, 'label':'positive'},
{'date':'2021-02-01', 'value':179, 'label':'negative'},
{'date':'2021-02-02', 'value':184, 'label':'positive'},
{'date':'2021-02-03', 'value':189, 'label':'positive'},
{'date':'2021-02-04', 'value':196, 'label':'positive'}])
I have already converted the date column strings to datetime format and set that column as the index with the set_index method.
Once n and m are fixed, I would like to use a recurrent neural network (LSTM) to predict the last n values of the value column, taking into account only the categories of the label column.
I have just encoded the label column with the following:
from sklearn.preprocessing import OneHotEncoder
# sparse_output replaces the older sparse argument (use sparse=False on scikit-learn < 1.2)
hot = OneHotEncoder(sparse_output=False).fit_transform(df.label.to_numpy().reshape(-1, 1))
and scaled the data:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
# MinMaxScaler expects a 2-D array, so reshape the single column to (n_samples, 1)
scaled = scaler.fit_transform(df.value.values.reshape(-1, 1))
but I cannot work out how to take the m and n conditions into account when building the train and test sets.
Any suggestions?
Recommended answer
First of all, you have to transform the dataset into a time-series form that an LSTM supports. Build a model that predicts the next day only, then roll the testing process forward, one prediction at a time, for as many predictions as you want.
You can get the complete version from here
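As a rough illustration of the windowing step described in the answer, here is a minimal sketch using NumPy only. It assumes, since the question does not define it, that m is the number of consecutive past label rows used as one input sequence and that n is the number of final values held out for testing. The helper names make_windows and split_last_n are hypothetical, and the toy arrays are the first eight rows of the question's DataFrame.

```python
import numpy as np

def make_windows(features, targets, m):
    """Pair each run of m consecutive feature rows with the target that follows it.

    features: array of shape (T, n_features), e.g. the one-hot encoded labels
    targets:  array of shape (T,), e.g. the scaled values
    Returns X of shape (T - m, m, n_features) and y of shape (T - m,).
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    X = np.stack([features[i:i + m] for i in range(len(features) - m)])
    y = targets[m:]
    return X, y

def split_last_n(X, y, n):
    """Keep the last n windows for testing and everything before them for training."""
    return X[:-n], y[:-n], X[-n:], y[-n:]

# Toy data: the first eight rows of the question's DataFrame
labels = np.array(['negative', 'positive', 'positive', 'negative',
                   'negative', 'positive', 'positive', 'negative'])
values = np.array([145, 144, 147, 146, 155, 157, 158, 157], dtype=float)

# One-hot encode the labels: column 0 = negative, column 1 = positive
categories = np.array(['negative', 'positive'])
hot = (labels[:, None] == categories[None, :]).astype(float)

# Min-max scale the values into [0, 1]
scaled = (values - values.min()) / (values.max() - values.min())

m, n = 3, 2  # assumed values, chosen only for this illustration
X, y = make_windows(hot, scaled, m)
X_train, y_train, X_test, y_test = split_last_n(X, y, n)
print(X_train.shape, X_test.shape)  # (3, 3, 2) (2, 3, 2)
```

The arrays come out in the (samples, timesteps, features) layout that LSTM layers in libraries such as Keras take as input. Under the assumption that only the label categories are used as inputs, the n test windows are known in advance, so a trained one-step model could be applied to them one after another to produce the n forecasts. In practice the scaler would normally be fitted on the training portion only.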