Siamese Model with LSTM network fails to train using tensorflow


Problem description

Dataset description

The dataset contains a set of question pairs and a label which tells whether the questions are the same. For example:

"How do I read and find my YouTube comments?" , "How can I see all my Youtube comments?" , "1"

The goal of the model is to identify whether a given question pair is the same or different.
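The model below consumes fixed-length, integer-encoded questions (see the question1_inputs and question2_inputs placeholders). Purely as an illustration of that input format, here is a toy sketch; the vocabulary, seq_len and encode helper are made up and not part of the original code:

import numpy as np

# Toy illustration only: turning a question pair into the padded integer
# sequences the placeholders below expect. Vocabulary and seq_len are made up.
seq_len = 5
vocab = {'how': 1, 'do': 2, 'i': 3, 'read': 4, 'my': 5, 'youtube': 6, 'comments': 7}

def encode(text, vocab, seq_len):
    ids = [vocab.get(word, 0) for word in text.lower().split()][:seq_len]
    return ids + [0] * (seq_len - len(ids))      # pad with zeros up to seq_len

q1 = encode("How do I read my YouTube comments", vocab, seq_len)
q2 = encode("How can I see my YouTube comments", vocab, seq_len)
label = 1                                        # 1 = duplicate pair, 0 = different
print(q1, q2, label)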

Approach

I have created a Siamese network to identify whether two questions are the same. The model is as follows:

graph = tf.Graph()

with graph.as_default():
    embedding_placeholder = tf.placeholder(tf.float32, shape=embedding_matrix.shape, name='embedding_placeholder')
    with tf.variable_scope('siamese_network') as scope:
        labels = tf.placeholder(tf.int32, [batch_size, None], name='labels')
        keep_prob = tf.placeholder(tf.float32, name='question1_keep_prob')

        with tf.name_scope('question1') as question1_scope:
            question1_inputs = tf.placeholder(tf.int32, [batch_size, seq_len], name='question1_inputs')

            # Embedding variable initialized from the pre-trained matrix fed through the placeholder; kept frozen
            question1_embedding = tf.get_variable(name='embedding', initializer=embedding_placeholder, trainable=False)
            question1_embed = tf.nn.embedding_lookup(question1_embedding, question1_inputs)

            question1_lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
            question1_drop = tf.contrib.rnn.DropoutWrapper(question1_lstm, output_keep_prob=keep_prob)
            question1_multi_lstm = tf.contrib.rnn.MultiRNNCell([question1_drop] * lstm_layers)

            q1_initial_state = question1_multi_lstm.zero_state(batch_size, tf.float32)

            question1_outputs, question1_final_state = tf.nn.dynamic_rnn(question1_multi_lstm, question1_embed, initial_state=q1_initial_state)

        # Reuse the variables so the second question tower shares weights with the first
        scope.reuse_variables()

        with tf.name_scope('question2') as question2_scope:
            question2_inputs = tf.placeholder(tf.int32, [batch_size, seq_len], name='question2_inputs')

            question2_embedding = question1_embedding
            question2_embed = tf.nn.embedding_lookup(question2_embedding, question2_inputs)

            question2_lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
            question2_drop = tf.contrib.rnn.DropoutWrapper(question2_lstm, output_keep_prob=keep_prob)
            question2_multi_lstm = tf.contrib.rnn.MultiRNNCell([question2_drop] * lstm_layers)

            q2_initial_state = question2_multi_lstm.zero_state(batch_size, tf.float32)

            question2_outputs, question2_final_state = tf.nn.dynamic_rnn(question2_multi_lstm, question2_embed, initial_state=q2_initial_state)

Compute the distance between the two RNN outputs (the Euclidean distance between the last time-step outputs) and build a contrastive loss from it:

with graph.as_default():
    diff = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(question1_outputs[:, -1, :], question2_outputs[:, -1, :])), reduction_indices=1))

    margin = tf.constant(1.) 
    labels = tf.to_float(labels)
    match_loss = tf.expand_dims(tf.square(diff, 'match_term'), 0)
    mismatch_loss = tf.expand_dims(tf.maximum(0., tf.subtract(margin, tf.square(diff)), 'mismatch_term'), 0)

    loss = tf.add(tf.matmul(labels, match_loss), tf.matmul((1 - labels), mismatch_loss), 'loss_add')
    distance = tf.reduce_mean(loss)

    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(distance)
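For intuition, here is a small NumPy sketch (not part of the original code) of the per-example contrastive loss this block is aiming for, y * d^2 + (1 - y) * max(0, margin - d^2), with made-up distances and labels:

import numpy as np

margin = 1.0
d = np.array([0.2, 1.4, 0.7])     # distances between the two towers' final outputs
y = np.array([1.0, 0.0, 1.0])     # 1 = same question pair, 0 = different

match_loss = np.square(d)                                # pulls matching pairs together
mismatch_loss = np.maximum(0.0, margin - np.square(d))   # pushes non-matching pairs apart, up to the margin
loss = y * match_loss + (1.0 - y) * mismatch_loss
print(loss.mean())                                       # mean contrastive loss over the batch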

Following is the code to train the model (the get_batches helper it uses is not shown; a sketch of it follows the code):

with graph.as_default():
    saver = tf.train.Saver()

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer(), feed_dict={embedding_placeholder: embedding_matrix})

    iteration = 1
    for e in range(epochs):
        summary_writer = tf.summary.FileWriter('/Users/mithun/projects/kaggle/quora_question_pairs/logs', sess.graph)
        summary_writer.add_graph(sess.graph)

        for ii, (x1, x2, y) in enumerate(get_batches(question1_train, question2_train, label_train, batch_size), 1):
            feed = {question1_inputs: x1,
                    question2_inputs: x2,
                    labels: y[:, None],
                    keep_prob: 0.9
                   }
            # Run the optimizer together with the loss; without the training op in
            # this call the weights are never updated
            loss1, _ = sess.run([distance, optimizer], feed_dict=feed)

            if iteration%5==0:
                print("Epoch: {}/{}".format(e, epochs),
                      "Iteration: {}".format(iteration),
                      "Train loss: {:.3f}".format(loss1))

            if iteration%50==0:
                val_acc = []
                for x1, x2, y in get_batches(question1_val, question2_val, label_val, batch_size):
                    feed = {question1_inputs: x1,
                            question2_inputs: x2,
                            labels: y[:, None],
                            keep_prob: 1
                           }
                    # accuracy is assumed to be defined elsewhere in the graph (not shown in the snippet above)
                    batch_acc = sess.run([accuracy], feed_dict=feed)
                    val_acc.append(batch_acc)
                print("Val acc: {:.3f}".format(np.mean(val_acc)))
            iteration +=1

    saver.save(sess, "checkpoints/quora_pairs.ckpt")
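The get_batches helper used above is not shown in the question. A minimal sketch of what such a generator might look like, assuming the questions are already padded integer arrays and that incomplete trailing batches are dropped (the graph uses a fixed batch_size):

import numpy as np

def get_batches(q1, q2, labels, batch_size):
    # Yield successive (question1, question2, label) batches of exactly batch_size
    # examples; any leftover examples that do not fill a batch are dropped.
    n_batches = len(labels) // batch_size
    for start in range(0, n_batches * batch_size, batch_size):
        end = start + batch_size
        yield np.asarray(q1[start:end]), np.asarray(q2[start:end]), np.asarray(labels[start:end])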

I have trained the above model with about 10,000 labeled examples. But the accuracy is stagnant at around 0.630 and, strangely, the validation accuracy is the same across all iterations. The hyperparameters are:

lstm_size = 64
lstm_layers = 1
batch_size = 128
learning_rate = 0.001

Is there anything wrong with the way I have created the model?

Answer

This is a common problem with imbalanced datasets like the recently released Quora dataset you are using. Since the Quora dataset is imbalanced (~63% negative and ~37% positive examples), you need proper initialization of the weights. Without it, your solution gets stuck in a local minimum and learns to predict only the negative class. Hence the 63% accuracy, because that is the percentage of 'not similar' questions in your validation data. If you check the predictions on your validation set, you will notice that they are all zeros. A truncated normal distribution, as proposed in He et al. (http://arxiv.org/abs/1502.01852), is a good choice for initializing the weights.
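The answer describes the fix in general terms rather than as code. Purely as an illustration, here is one way a truncated-normal (He-style) initializer could be wired into a TF 1.x graph like the one in the question; the embedding dimensionality and the fan-in used for the standard deviation are assumptions, not values taken from the question or the answer:

import numpy as np
import tensorflow as tf

lstm_size = 64
embedding_dim = 300            # assumed size of the pre-trained word vectors

# He et al. (2015) suggest a zero-mean (truncated) normal with stddev = sqrt(2 / fan_in);
# the fan-in of the LSTM kernel is roughly the input size plus the hidden size.
stddev = np.sqrt(2.0 / (embedding_dim + lstm_size))
he_init = tf.truncated_normal_initializer(stddev=stddev)

# Option 1: make it the default initializer for every variable created in the scope
# (tf.nn.dynamic_rnn builds its kernels with tf.get_variable, so they pick this up).
with tf.variable_scope('siamese_network', initializer=he_init):
    lstm_cell = tf.contrib.rnn.LSTMCell(lstm_size)

# Option 2: pass the initializer to the cell directly; tf.contrib.rnn.LSTMCell
# (unlike BasicLSTMCell) accepts an initializer argument.
lstm_cell_alt = tf.contrib.rnn.LSTMCell(lstm_size, initializer=he_init)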
