How to input classification time series data into an LSTM
Question
I want to feed my data into an LSTM network, but I can't find any similar question or tutorial. My dataset looks something like this:
person 1:
t1 f1 f2 f3
t2 f1 f2 f3
...
tn f1 f2 f3
.
.
.
person K:
t1 f1 f2 f3
t2 f1 f2 f3
...
tn f1 f2 f3
So I have K persons, and for each person I have a matrix-like input. The first column of each row is an increasing time stamp (like a timeline, so t1 < t2), and the other columns are the person's features at that time.
In mathematical terms: I have a (number of examples, number of time stamps, number of features) array such as (52, 20, 4), where 52 is the number of persons, 20 is the number of time stamps per person, and 4 is the number of columns (1 column is the time stamp and 3 are features).
Each person has a class name. I want to classify these persons into two classes using an LSTM neural network. My question is: how do I feed this type of data into an LSTM in a high-level library such as Keras?
Edit:
My first attempt was to use this as input_shape in Keras, but I get 50% accuracy in binary classification! Is the problem in my dataset, or is input_shape wrong?
LSTM(5,input_shape=(20,4))
Recommended answer
You need to represent each person's data with a feature vector and pass this vector into a classifier (e.g. an MLP classifier). I guess your question might be how to get the feature vector out of the raw data. There are many ways to extract features from time-series data. In your case, an LSTM would be an option.
An LSTM needs a 3D tensor as input, with shape [batch_size x time x feature]. As you mentioned in the question, you can feed data into the model with:
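from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(5, input_shape=(20, 4)))  # 20 time steps, 4 columns per step
model.add(Dense(2, activation='sigmoid'))  # one output unit per class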
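The arrays passed to such a model have to match that [batch_size x time x feature] layout. Here is a minimal sketch of how the per-person tables could be stacked, assuming every person has the same number of time steps. The random data and the names `per_person`, `X`, `y_int` and `y` are placeholders for illustration, not part of the original question.

```python
import numpy as np

n_persons, n_steps, n_features = 52, 20, 4  # sizes taken from the question
rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data: one (20, 4) matrix per person.
# Column 0 is an increasing time stamp, columns 1-3 are the features.
per_person = []
for _ in range(n_persons):
    t = np.cumsum(rng.uniform(0.1, 1.0, size=n_steps))  # t1 < t2 < ... < tn
    f = rng.normal(size=(n_steps, n_features - 1))
    per_person.append(np.column_stack([t, f]))

# Stack along a new leading axis: (persons, time steps, features).
X = np.stack(per_person, axis=0).astype("float32")

# Hypothetical integer class ids (0 or 1), one per person, then one-hot
# encoded so that each row has one entry per output unit of Dense(2).
y_int = rng.integers(0, 2, size=n_persons)
y = np.eye(2, dtype="float32")[y_int]

print(X.shape, y.shape)  # (52, 20, 4) (52, 2)
```

With arrays shaped like this, `input_shape=(20, 4)` describes a single example, and Keras supplies the batch dimension itself during training.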
1) I guess the t and f values vary widely and are not normalized. As a result, the LSTM's predictions are poor: 50% accuracy on a balanced two-class problem is no better than chance.
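One common way to normalize is to standardize each column to zero mean and unit variance. The sketch below assumes the data is already a (52, 20, 4) array; the placeholder values and the names `X` and `X_norm` are made up for illustration. In a real experiment the mean and standard deviation would normally be computed on the training split only and then reused for the test split.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder data with the question's shape; the four columns are
# deliberately put on very different scales.
X = rng.normal(size=(52, 20, 4)) * np.array([1000.0, 1.0, 50.0, 0.01])

# Mean and standard deviation per column, pooled over persons and time steps.
mean = X.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, 4)
std = X.std(axis=(0, 1), keepdims=True)    # shape (1, 1, 4)

# The small epsilon guards against division by zero for a constant column.
X_norm = (X - mean) / (std + 1e-8)

# After scaling, every column has mean close to 0 and std close to 1.
print(X_norm.mean(axis=(0, 1)).round(3))
print(X_norm.std(axis=(0, 1)).round(3))
```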
2) Your dataset is relatively small. To track down the issue, try to overfit the model on a small subset of the training data. If you reach 100% accuracy on that subset, the LSTM is able to learn a useful representation of the sequences. Otherwise, the model is probably poorly designed or the data is not being fed in correctly.
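As a rough sketch of the first step of that check, the helper below picks a handful of examples from each class. The function `small_balanced_subset` and the placeholder data are hypothetical, and the integer labels are an assumption about how the classes are encoded.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder data and labels with the question's shapes.
X = rng.normal(size=(52, 20, 4)).astype("float32")
y_int = np.array([0, 1] * 26)  # 26 persons per class

def small_balanced_subset(X, y_int, per_class, rng):
    """Pick `per_class` random examples from each class."""
    picked = []
    for label in np.unique(y_int):
        idx = np.flatnonzero(y_int == label)
        picked.append(rng.choice(idx, size=per_class, replace=False))
    picked = np.concatenate(picked)
    return X[picked], y_int[picked]

X_small, y_small = small_balanced_subset(X, y_int, per_class=4, rng=rng)
print(X_small.shape, y_small)  # (8, 20, 4) [0 0 0 0 1 1 1 1]
```

Training the model on `X_small` and `y_small` for many epochs should push training accuracy toward 100% if the model and the input pipeline are wired correctly.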