Seq2Seq model learns to only output EOS token (<\s>) after a few iterations

Problem Description

I am creating a chatbot trained on the Cornell Movie Dialogs Corpus using NMT.

My code is partially based on https://github.com/bshao001/ChatLearner and https://github.com/chiphuyen/stanford-tensorflow-tutorials/tree/master/assignments/chatbot

During training, I print a random output answer fed to the decoder from the batch and the corresponding answer that my model predicts, to observe the learning progress.

My issue: After only about 4 iterations of training, the model learns to output the EOS token (<\s>) for every timestep. It always outputs that as its response (determined using the argmax of the logits), even as training continues. Once in a while, rarely, the model outputs a series of periods as its answer.

I also print the top 10 logit values during training (not just the argmax) to see if maybe the correct word is somewhere in there, but it seems to be predicting the most common words in the vocabulary (e.g. i, you, ?, .). Even these top 10 words don't change much during training.
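For context, inspecting the top-10 candidates at each timestep usually amounts to a single top-k over the vocabulary axis. A minimal sketch in TensorFlow 1.x, with illustrative names (decoder_logits, vocab_size), not the asker's actual code:

import tensorflow as tf  # TF 1.x, matching the tf.contrib API used later

vocab_size = 10000  # illustrative
decoder_logits = tf.placeholder(tf.float32, [None, None, vocab_size])  # [batch, time, vocab]

# tf.nn.top_k works over the last axis, so this yields the 10 highest-scoring
# vocabulary ids (and their logit values) at every decoder timestep.
top_logits, top_ids = tf.nn.top_k(decoder_logits, k=10)  # both [batch, time, 10]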

I have made sure to correctly count the input sequence lengths for the encoder and decoder, and added SOS (<s>) and EOS (also used for padding) tokens accordingly. I also perform masking in the loss calculation.

Here are sample outputs:

Training iteration 1:

Decoder Input: <s> sure . sure . <\s> <\s> <\s> <\s> <\s> <\s> <\s> <\s> <\s>
Predicted Answer: wildlife bakery mentality mentality administration administration winston winston winston magazines magazines magazines magazines

...

Training iteration 4:

Decoder Input: <s> i guess i had it coming . let us call it settled . <\s> <\s> <\s> <\s> <\s>
Predicted Answer: <\s> <\s> <\s> <\s> <\s> <\s> <\s> <\s> <\s> <\s> <\s> <\s> <\s> <\s> <\s> <\s> <\s> <\s>


After a few more iterations, it settles on only predicting EOS (and rarely some periods).

I am not sure what could be causing this issue and have been stuck on it for a while. Any help would be greatly appreciated!

Update: I let it train for over a hundred thousand iterations and it still only outputs EOS (and occasional periods). The training loss also does not decrease after a few iterations (it remains at around 47 from the beginning).

Answer

I recently worked on a seq2seq model as well. I ran into your problem before; in my case, I solved it by changing the loss function.

You said you use a mask, so I guess you use tf.contrib.seq2seq.sequence_loss as I did.

I changed to tf.nn.softmax_cross_entropy_with_logits, and it works normally (at a higher computation cost).
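A minimal sketch of that swap, assuming batch-major tensors and hypothetical names (the answer does not include its actual code). Because tf.nn.softmax_cross_entropy_with_logits expects a distribution over classes rather than integer ids, the targets are one-hot encoded first, and the padding is masked out manually:

import tensorflow as tf  # TF 1.x

vocab_size = 10000  # illustrative
decoder_logits = tf.placeholder(tf.float32, [None, None, vocab_size])  # [batch, time, vocab]
decoder_targets = tf.placeholder(tf.int32, [None, None])               # [batch, time], EOS-padded
target_lengths = tf.placeholder(tf.int32, [None])                      # true lengths incl. the EOS

labels = tf.one_hot(decoder_targets, depth=vocab_size, dtype=tf.float32)
crossent = tf.nn.softmax_cross_entropy_with_logits(labels=labels,
                                                   logits=decoder_logits)  # [batch, time]

# Zero out padded positions and average over the real tokens only.
mask = tf.sequence_mask(target_lengths,
                        maxlen=tf.shape(decoder_targets)[1],
                        dtype=tf.float32)
loss = tf.reduce_sum(crossent * mask) / tf.reduce_sum(mask)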

(Edit 05/10/2018: Pardon me, I need to edit since I found an egregious mistake in my code.)

tf.contrib.seq2seq.sequence_loss can work really well if the shapes of logits, targets, and weights (the mask) are right, as defined in the official documentation for tf.contrib.seq2seq.sequence_loss:

loss=tf.contrib.seq2seq.sequence_loss(logits=decoder_logits,
                                      targets=decoder_targets,
                                      weights=masks) 

#logits:  [batch_size, sequence_length, num_decoder_symbols]  
#targets: [batch_size, sequence_length] 
#weights: [batch_size, sequence_length] 

Well, it can still run even if the shapes don't match, but the result could be weird (lots of #EOS, #PAD, ... etc.).

Since decoder_outputs and decoder_targets might not have the required shape (in my case, my decoder_targets had the shape [sequence_length, batch_size]), try using tf.transpose to help you reshape the tensors.
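A minimal sketch of that fix, reusing the variable names from the snippet above and assuming the decoder tensors come out time-major:

# Transpose time-major tensors to the batch-major layout sequence_loss expects.
decoder_logits = tf.transpose(decoder_logits, [1, 0, 2])   # [time, batch, vocab] -> [batch, time, vocab]
decoder_targets = tf.transpose(decoder_targets, [1, 0])    # [time, batch] -> [batch, time]
# (If the mask/weights tensor was also built time-major, transpose it the same way.)

loss = tf.contrib.seq2seq.sequence_loss(logits=decoder_logits,
                                        targets=decoder_targets,
                                        weights=masks)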
