How to code a sequence to sequence RNN in Keras?

Question

I am trying to write a sequence-to-sequence RNN in Keras. I coded this program using what I understood from the web. I first tokenized the text, then converted it into sequences and padded them to form the feature variable X. The target variable Y was obtained by first shifting x to the left and then padding it. Finally, I fed the feature and target variables to my LSTM model.

This is the code I wrote in Keras for that purpose:

from keras.preprocessing.text import Tokenizer,base_filter
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense, Activation,Dropout,Embedding
from keras.layers import LSTM


def shift(seq, n):
    n = n % len(seq)
    return seq[n:] + seq[:n]

txt="abcdefghijklmn"*100

tk = Tokenizer(nb_words=2000, filters=base_filter(), lower=True, split=" ")
tk.fit_on_texts(txt)
x = tk.texts_to_sequences(txt)
# shifting to the left
y = shift(x,1)

#padding sequence
max_len = 100
max_features=len(tk.word_counts)
X = pad_sequences(x, maxlen=max_len)
Y = pad_sequences(y, maxlen=max_len)

#lstm model
model = Sequential()
model.add(Embedding(max_features, 128, input_length=max_len, dropout=0.2))
model.add(LSTM(128, dropout_W=0.2, dropout_U=0.2))
model.add(Dense(max_len))
model.add(Activation('softmax'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop')

model.fit(X, Y, batch_size=200, nb_epoch=10)
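The shift-and-pad step in the code above can be sketched in plain Python, without Keras, to see which integer values end up in X and Y. This is a simplified stand-in, not the Keras implementation: `build_char_index` and `pad_pre` are hypothetical helpers, indices are assigned in order of first appearance (the real Tokenizer orders by frequency), and `pad_pre` imitates the default `padding='pre'`, `truncating='pre'` behaviour of `pad_sequences`. Because a single string is passed to `fit_on_texts`, each character is treated as its own text, so the vocabulary is the 14 letters a to n.

```python
def build_char_index(text):
    """Hypothetical helper: map each distinct character to an index >= 1.

    Index 0 is deliberately left unused so it can act as the padding value,
    as the Keras Tokenizer does. Order here is first appearance (a
    simplification; the real Tokenizer sorts by frequency).
    """
    index = {}
    for ch in text:
        if ch not in index:
            index[ch] = len(index) + 1
    return index


def shift(seq, n):
    # Same helper as in the question: rotate the list left by n positions.
    n = n % len(seq)
    return seq[n:] + seq[:n]


def pad_pre(seq, maxlen, value=0):
    # Hypothetical helper: keep the last maxlen items, then left-pad with
    # `value`, like pad_sequences(padding='pre', truncating='pre').
    seq = list(seq)[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq


text = "abcdefghijklmn" * 100
char_index = build_char_index(text)
x = [char_index[ch] for ch in text]  # one index per character
y = shift(x, 1)                      # target = next character

print(len(char_index), min(x), max(x))  # 14 1 14
print(x[:3], y[:3])                      # [1, 2, 3] [2, 3, 4]
print(pad_pre(x[:3], 5))                 # [0, 0, 1, 2, 3]
```

The first printed line is the one that matters for the error: real tokens run from 1 to 14 and 0 appears only as padding, so a lookup table indexed by these values needs 15 rows (0 to 14).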

The problem is that it shows an error:

Epoch 1/10
IndexError: index 14 is out of bounds for size 14
Apply node that caused the error: AdvancedSubtensor1(if{inplace}.0, Reshape{1}.0)
Toposort index: 80
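The same kind of failure can be reproduced without Keras or Theano by treating the embedding as a plain table with one row per valid index. This is an illustrative sketch: `make_table` and `lookup` are hypothetical helpers, and the error message is written by hand to mirror the traceback above (plain Python list indexing would only say "list index out of range").

```python
def make_table(input_dim, output_dim=4):
    # Hypothetical stand-in for an embedding matrix: rows 0 .. input_dim - 1.
    return [[0.0] * output_dim for _ in range(input_dim)]


def lookup(table, indices):
    # Select one row per index, rejecting indices outside 0 .. len(table) - 1.
    rows = []
    for i in indices:
        if not 0 <= i < len(table):
            raise IndexError(f"index {i} is out of bounds for size {len(table)}")
        rows.append(table[i])
    return rows


max_features = 14                        # len(tk.word_counts): the letters a..n
table = make_table(max_features)         # like Embedding(max_features, ...)
indices = list(range(max_features + 1))  # 0 = padding, 1..14 = letters

try:
    lookup(table, indices)
except IndexError as err:
    print(err)  # index 14 is out of bounds for size 14
```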

Recommended answer

The problem is in this line:

model.add(Embedding(max_features, 128, input_length=max_len, dropout=0.2))

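As the Embedding documentation explains, the first argument (input_dim) must be set to the size of the vocabulary + 1, i.e. the largest integer index + 1. This is because index 0 is reserved for the null (padding) word: the Tokenizer numbers real words from 1 upward, and there must always be a row for index 0. Here the 14 letters get indices 1 to 14, while Embedding(14, ...) only accepts indices 0 to 13, which is exactly the "index 14 is out of bounds for size 14" error. Because of that, you need to change this line to: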

model.add(Embedding(max_features + 1, 128, input_length=max_len, dropout=0.2))
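The sizing rule can be written down as a tiny check. `embedding_input_dim` below is a hypothetical helper, not part of Keras: it takes a word-to-index mapping such as `tk.word_index` and returns the largest index + 1. With the Tokenizer's contiguous indices starting at 1, that equals `len(word_index) + 1`, which is the `max_features + 1` used in the fix.

```python
def embedding_input_dim(word_index):
    # Hypothetical helper: every index from 0 (padding) up to the largest
    # word index needs its own row, so the table size is max index + 1.
    return max(word_index.values()) + 1


# Letters a..n numbered from 1, the way the Tokenizer numbers words.
word_index = {ch: i + 1 for i, ch in enumerate("abcdefghijklmn")}
input_dim = embedding_input_dim(word_index)

print(input_dim)                         # 15
print(input_dim == len(word_index) + 1)  # True
# Every index the model can see, padding included, is now in range.
print(all(0 <= i < input_dim for i in [0, *word_index.values()]))  # True
```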
