Word-level Seq2Seq with Keras
Problem description
I was following the Keras Seq2Seq tutorial, and it works fine. However, this is a character-level model, and I would like to adapt it to a word-level model. The authors even include a paragraph with the required changes, but all my current attempts result in an error regarding wrong dimensions.
If you follow the character-level model, the input data has 3 dims: (#sequences, #max_seq_len, #num_chars), since each character is one-hot encoded. When I plot the summary for the model as used in the tutorial, I get:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, None, 71) 0
__________________________________________________________________________________________________
input_2 (InputLayer) (None, None, 94) 0
__________________________________________________________________________________________________
lstm_1 (LSTM) [(None, 256), (None, 335872 input_1[0][0]
__________________________________________________________________________________________________
lstm_2 (LSTM) [(None, None, 256), 359424 input_2[0][0]
lstm_1[0][1]
lstm_1[0][2]
__________________________________________________________________________________________________
dense_1 (Dense) (None, None, 94) 24158 lstm_2[0][0]
==================================================================================================
This compiles and trains just fine.
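For reference, the character-level inputs are built roughly like the following sketch (self-contained toy data; the variable names are illustrative, not the tutorial's exact code):

import numpy as np

# Toy character-level data (illustrative only, not the tutorial's dataset)
input_texts = ['hi', 'go on']
chars = sorted(set(''.join(input_texts)))
char_index = {c: i for i, c in enumerate(chars)}
max_seq_len = max(len(t) for t in input_texts)

# 3-dim one-hot input: (#sequences, #max_seq_len, #num_chars)
encoder_input_data = np.zeros((len(input_texts), max_seq_len, len(chars)), dtype='float32')
for i, text in enumerate(input_texts):
    for t, ch in enumerate(text):
        encoder_input_data[i, t, char_index[ch]] = 1.0

print(encoder_input_data.shape)  # (2, 5, 6) -- 3 dims, as described above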
Now this tutorial has a section "What if I want to use a word-level model with integer sequences?", and I've tried to follow those changes. Firstly, I encode all sequences using a word index. As such, the input and target data is now 2 dims: (#sequences, #max_seq_len), since I no longer one-hot encode but now use Embedding layers.
encoder_input_data_train.shape => (90000, 9)
decoder_input_data_train.shape => (90000, 16)
decoder_target_data_train.shape => (90000, 16)
For example, a sequence might look like this:
[ 826. 288. 2961. 3127. 1260. 2108. 0. 0. 0.]
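Such padded integer sequences can be produced, for instance, with Keras' Tokenizer and pad_sequences (a minimal sketch with illustrative data; the exact preprocessing is an assumption, not taken from the question):

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# Illustrative raw data; in the question this would be the 90000 training texts
input_texts = ['how are you', 'see you soon']

tokenizer = Tokenizer()
tokenizer.fit_on_texts(input_texts)
num_encoder_tokens = len(tokenizer.word_index) + 1  # +1 because index 0 is reserved for padding

# 2-dim integer data: (#sequences, #max_seq_len)
encoder_input_data = pad_sequences(
    tokenizer.texts_to_sequences(input_texts),
    maxlen=9,            # max_input_seq_len in the question
    padding='post')      # zeros at the end, as in the example sequence above

print(encoder_input_data.shape, num_encoder_tokens)  # (2, 9) and the vocabulary size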
When I use the listed code:
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

# encoder: embed the integer sequences and keep only the final LSTM states
encoder_inputs = Input(shape=(None,))
x = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
x, state_h, state_c = LSTM(latent_dim, return_state=True)(x)
encoder_states = [state_h, state_c]

# decoder: embed the target sequences and initialise the LSTM with the encoder states
decoder_inputs = Input(shape=(None,))
x = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
x = LSTM(latent_dim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(x)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
the model compiles and looks like this:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_35 (InputLayer) (None, None) 0
__________________________________________________________________________________________________
input_36 (InputLayer) (None, None) 0
__________________________________________________________________________________________________
embedding_32 (Embedding) (None, None, 256) 914432 input_35[0][0]
__________________________________________________________________________________________________
embedding_33 (Embedding) (None, None, 256) 914432 input_36[0][0]
__________________________________________________________________________________________________
lstm_32 (LSTM) [(None, 256), (None, 525312 embedding_32[0][0]
__________________________________________________________________________________________________
lstm_33 (LSTM) (None, None, 256) 525312 embedding_33[0][0]
lstm_32[0][1]
lstm_32[0][2]
__________________________________________________________________________________________________
dense_21 (Dense) (None, None, 3572) 918004 lstm_33[0][0]
While this compiles, training with
model.fit([encoder_input_data, decoder_input_data], decoder_target_data, batch_size=32, epochs=1, validation_split=0.2)
fails with the following error: ValueError: Error when checking target: expected dense_21 to have 3 dimensions, but got array with shape (90000, 16), with the latter being the shape of the decoder input/target. Why does the Dense layer get an array of the shape of the decoder input data?
Things I've tried:
- I find it a bit strange that the decoder LSTM has return_sequences=True, since I thought I cannot feed sequences to a Dense layer (and the decoder of the original character-level model does not state this). However, simply removing it or setting return_sequences=False did not help. Of course, the Dense layer then has an output shape of (None, 3572).
- I don't quite get the need for the Input layers. I've set them to shape=(max_input_seq_len, ) and shape=(max_target_seq_len, ) respectively, so that the summary doesn't show (None, None) but the respective values, e.g., (None, 16). No change.
- In the Keras docs I've read that an Embedding layer should be used with input_length, otherwise a Dense layer upstream cannot compute its outputs. But again, it still errors when I set input_length accordingly.
I'm a bit at a deadlock here. Am I even on the right track, or am I missing something more fundamental? Is the shape of my data wrong? Why does the last Dense layer get an array with shape (90000, 16)? That seems rather off.
UPDATE: I figured out that the problem seems to be decoder_target_data, which currently has the shape (#samples, max_seq_len), e.g., (90000, 16). But I assume I need to one-hot encode the target output with respect to the vocabulary: (#samples, max_seq_len, vocab_size), e.g., (90000, 16, 3572).
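A minimal sketch of that full one-hot encoding (the vocabulary size is taken from the model summary above; the stand-in data and variable names are illustrative):

import numpy as np

num_decoder_tokens = 3572      # target vocabulary size, from the model summary above
max_target_seq_len = 16

# Small random stand-in for decoder_target_data_train (the real one has 90000 rows)
decoder_target_data_train = np.random.randint(
    0, num_decoder_tokens, size=(1000, max_target_seq_len))

# Full one-hot encoding of the targets: (#samples, max_seq_len, vocab_size)
decoder_target_onehot = np.zeros(
    (len(decoder_target_data_train), max_target_seq_len, num_decoder_tokens), dtype='float32')
for i, seq in enumerate(decoder_target_data_train):
    for t, word_index in enumerate(seq):
        decoder_target_onehot[i, t, word_index] = 1.0

# For all 90000 samples this array would need roughly
# 90000 * 16 * 3572 * 4 bytes ≈ 20 GB as float32, which explains the MemoryError below.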
Unfortunately, this throws a MemoryError. However, when I assume a vocabulary size of 10 for debugging purposes, i.e.:
decoder_target_data = np.zeros((len(input_sequences), max_target_seq_len, 10), dtype='float32')
and later in the decoder model:
x = Dense(10, activation='softmax')(x)
then the model trains without error. In case that's indeed my issue, I would have to train the model with manually generated batches so that I can keep the vocabulary size but reduce #samples, e.g., to 90 batches each of shape (1000, 16, 3572). Am I on the right track here?
Answer
Recently I was also facing this problem. There is no other solution than creating small batches, say batch_size=64, in a generator, and then using model.fit_generator instead of model.fit. I have attached my generate_batch code below:
import numpy as np

def generate_batch(X, y, batch_size=64):
    '''Generate one batch of ([encoder_input, decoder_input], decoder_target) at a time.'''
    while True:
        for j in range(0, len(X), batch_size):
            encoder_input_data = np.zeros((batch_size, max_encoder_seq_length), dtype='float32')
            decoder_input_data = np.zeros((batch_size, max_decoder_seq_length + 2), dtype='float32')
            # only the targets are one-hot encoded, and only batch_size rows at a time
            decoder_target_data = np.zeros((batch_size, max_decoder_seq_length + 2, num_decoder_tokens), dtype='float32')
            for i, (input_text_seq, target_text_seq) in enumerate(zip(X[j:j + batch_size], y[j:j + batch_size])):
                for t, word_index in enumerate(input_text_seq):
                    encoder_input_data[i, t] = word_index  # encoder input seq
                for t, word_index in enumerate(target_text_seq):
                    decoder_input_data[i, t] = word_index
                    if (t > 0) and (word_index <= num_decoder_tokens):
                        # decoder target is the decoder input shifted by one time step
                        decoder_target_data[i, t - 1, word_index - 1] = 1.
            yield ([encoder_input_data, decoder_input_data], decoder_target_data)
and then train like this:
import math

batch_size = 64
epochs = 2

# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(
    generator=generate_batch(X=X_train_sequences, y=y_train_sequences, batch_size=batch_size),
    steps_per_epoch=math.ceil(len(X_train_sequences) / batch_size),
    epochs=epochs,
    verbose=1,
    validation_data=generate_batch(X=X_val_sequences, y=y_val_sequences, batch_size=batch_size),
    validation_steps=math.ceil(len(X_val_sequences) / batch_size),
    workers=1,
)
X_train_sequences is a list of lists like [[23, 34, 56], [2, 33544, 6, 10]], and similarly for the other inputs.
I also took help from this blog: word-level-english-to-marathi-nmt