How to structure an LSTM neural network for classification

Question

I have data with various conversations between two people. Each sentence has some type of classification. I am attempting to use an NLP net to classify each sentence of the conversation. I tried a convolutional net and got decent results (not groundbreaking, though). I figured that since this is a back-and-forth conversation, an LSTM net may produce better results, because what was previously said may have a large impact on what follows.

If I follow the structure above, I would assume that I am doing many-to-many classification. My data looks like this:

X_train = [[sentence 1],  
           [sentence 2],
           [sentence 3]]
Y_train = [[0],
           [1],
           [0]]

The data has been processed using word2vec. I then design my network as follows:

model = Sequential()      
model.add(Embedding(len(vocabulary),embedding_dim,
          input_length=X_train.shape[1]))
model.add(LSTM(88))
model.add(Dense(1,activation='sigmoid'))
model.compile(optimizer='rmsprop',loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(X_train,Y_train,verbose=2,nb_epoch=3,batch_size=15)
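
For concreteness, here is a minimal sketch of one way X_train and Y_train could be built for the model above. The Embedding layer consumes padded integer index sequences, so this shows a plain tokenize-and-pad step; the toy sentences, the maxlen of 20, and the use of Keras' Tokenizer are illustrative assumptions only (the question states that word2vec was used for the actual data).

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
import numpy as np

# Toy stand-in data; the real corpus and labels come from the conversations.
sentences = ["how are you", "i am fine thanks", "what did you do today"]
labels = [0, 1, 0]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)
vocabulary = tokenizer.word_index                    # word -> integer index

sequences = tokenizer.texts_to_sequences(sentences)  # lists of word indices
X_train = pad_sequences(sequences, maxlen=20)        # shape: (num_sentences, 20)
Y_train = np.array(labels).reshape(-1, 1)            # shape: (num_sentences, 1)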

I assume that this setup will feed in one batch of sentences at a time. However, if shuffle is not set to false in model.fit, it receives shuffled batches, so why is an LSTM net even useful in this case? From research on the subject, to achieve a many-to-many structure one would also need to change the LSTM layer to

model.add(LSTM(88,return_sequences=True))

and the output layer would need to be...

model.add(TimeDistributed(Dense(1,activation='sigmoid')))

When switching to this structure I get an error on the input size. I'm unsure of how to reformat the data to meet this requirement, and also how to edit the embedding layer to receive the new data format.

Any input would be greatly appreciated. Or if you have any suggestions on a better method, I am more than happy to hear them!

Answer

Your first attempt was good. The shuffling takes place between sentences: only the training samples are shuffled relative to one another, so that they don't always come in the same order. The words inside the sentences are not shuffled.

Or maybe I didn't understand the question correctly?

EDIT:

After getting a better understanding of the question, here is my proposal.

Data preparation: you slice your corpus into blocks of n sentences (they can overlap). You should then have a shape like (number_blocks_of_sentences, n, number_of_words_per_sentence), so basically a list of 2D arrays, each containing a block of n sentences. n shouldn't be too big, because an LSTM can't handle a huge number of elements in a sequence when training (vanishing gradient). Your targets should be an array of shape (number_blocks_of_sentences, n, 1), so also a list of 2D arrays containing the class of each sentence in your block of sentences.
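
A minimal sketch of this slicing step, assuming the sentences are already padded integer sequences of shape (num_sentences, n_words); the helper name make_blocks, the block length n and the step size are illustrative assumptions, not part of the answer:

import numpy as np

def make_blocks(sentences, labels, n, step=1):
    # Slice a corpus into (possibly overlapping) blocks of n consecutive sentences.
    # sentences: (num_sentences, n_words) array of word indices
    # labels:    (num_sentences,) array/list with one class per sentence
    X_blocks, Y_blocks = [], []
    for start in range(0, len(sentences) - n + 1, step):
        X_blocks.append(sentences[start:start + n])
        Y_blocks.append(np.asarray(labels[start:start + n]).reshape(n, 1))
    # Shapes: (number_blocks_of_sentences, n, n_words) and (number_blocks_of_sentences, n, 1)
    return np.array(X_blocks), np.array(Y_blocks)

# padded_sentences and sentence_labels are hypothetical arrays built beforehand.
X_train, Y_train = make_blocks(padded_sentences, sentence_labels, n=5, step=2)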

Model:

n_sentences = X_train.shape[1]  # number of sentences in a sample (n)
n_words = X_train.shape[2]      # number of words in a sentence

model = Sequential()
# Reshape the input because Embedding only accepts shape (batch_size, input_length) so we just transform list of sentences in huge list of words
model.add(Reshape((n_sentences * n_words,),input_shape = (n_sentences, n_words)))
# Embedding layer - output shape will be (batch_size, n_sentences * n_words, embedding_dim) so each sample in the batch is a big 2D array of words embedded 
model.add(Embedding(len(vocabulary), embedding_dim, input_length = n_sentences * n_words ))
# Recreate the sentence shaped array
model.add(Reshape((n_sentences, n_words, embedding_dim))) 
# Encode each sentence - output shape is (batch_size, n_sentences, 88)
model.add(TimeDistributed(LSTM(88)))
# Go over lines and output hidden layer which contains info about previous sentences - output shape is (batch_size, n_sentences, hidden_dim)
model.add(LSTM(hidden_dim, return_sequences=True))
# Predict output binary class - output shape is (batch_size, n_sentences, 1)
model.add(TimeDistributed(Dense(1,activation='sigmoid')))
...
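
The "..." above is where compilation and training would go. A minimal sketch, carrying over the optimizer and loss from the question; the epoch count and batch size are placeholders, not values from the answer:

model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              metrics=['accuracy'])
# X_train: (number_blocks_of_sentences, n_sentences, n_words)
# Y_train: (number_blocks_of_sentences, n_sentences, 1) -- one binary label per sentence
model.fit(X_train, Y_train, epochs=3, batch_size=15, verbose=2)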

This should be a good start.

Hope this helps.
