Sentence similarity using keras


Problem description


I'm trying to implement sentence similarity architecture based on this work using the STS dataset. Labels are normalized similarity scores from 0 to 1 so it is assumed to be a regression model.


My problem is that the loss goes directly to NaN starting from the first epoch. What am I doing wrong?


I have already tried updating to the latest Keras and Theano versions.

The code for my model is:

from keras.models import Sequential, Model
from keras.layers import Input, Embedding, LSTM, Dropout, Reshape, Lambda, merge
from keras.optimizers import RMSprop

def create_lstm_nn(input_dim):
    seq = Sequential()
    # embed using a pretrained 300d embedding
    seq.add(Embedding(vocab_size, emb_dim, mask_zero=True, weights=[embedding_weights]))
    # encode via LSTM
    seq.add(LSTM(128))
    seq.add(Dropout(0.3))
    return seq

lstm_nn = create_lstm_nn(input_dim)

input_a = Input(shape=(input_dim,))
input_b = Input(shape=(input_dim,))

processed_a = lstm_nn(input_a)
processed_b = lstm_nn(input_b)

cos_distance = merge([processed_a, processed_b], mode='cos', dot_axes=1)
cos_distance = Reshape((1,))(cos_distance)
distance = Lambda(lambda x: 1-x)(cos_distance)

model = Model(input=[input_a, input_b], output=distance)

# train
rms = RMSprop()
model.compile(loss='mse', optimizer=rms)
model.fit([X1, X2], y, validation_split=0.3, batch_size=128, nb_epoch=20)


I also tried using a simple Lambda instead of the Merge layer, but it has the same result.

def cosine_distance(vests):
    x, y = vests
    x = K.l2_normalize(x, axis=-1)
    y = K.l2_normalize(y, axis=-1)
    return -K.mean(x * y, axis=-1, keepdims=True)

def cos_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0],1)

distance = Lambda(cosine_distance, output_shape=cos_dist_output_shape)([processed_a, processed_b])

Recommended answer


NaN loss is a common issue in deep learning regression. Because you are using a Siamese network, you can try the following:

  1. Check your data: does it need to be normalized?
  2. Try adding a Dense layer as the last layer of your network, but be careful with the choice of activation function, e.g. relu
  3. Try a different loss function, e.g. contrastive loss (a sketch follows after this list)
  4. Lower the learning rate, e.g. 0.0001
  5. The 'cos' merge mode does not handle division by zero carefully, which may be the cause of the NaN (also covered in the sketch below)
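
Below is a rough sketch of points 3-5, reusing input_a, input_b, processed_a and processed_b from the question's code and assuming the same Keras 1.x / Theano API. The margin value and the exact wiring are illustrative assumptions on my part, not part of the original answer, and contrastive loss expects binary similar/dissimilar labels, so the 0-1 STS scores would need to be thresholded before using it:

from keras import backend as K
from keras.layers import Lambda
from keras.models import Model
from keras.optimizers import RMSprop

def safe_cosine_distance(vests):
    # point 5: add an epsilon to each norm so a zero vector cannot cause division by zero
    x, y = vests
    x_norm = K.sqrt(K.sum(K.square(x), axis=-1, keepdims=True)) + K.epsilon()
    y_norm = K.sqrt(K.sum(K.square(y), axis=-1, keepdims=True)) + K.epsilon()
    return 1.0 - K.sum((x / x_norm) * (y / y_norm), axis=-1, keepdims=True)

def contrastive_loss(y_true, y_pred, margin=1.0):
    # point 3: contrastive loss; margin=1.0 is only an illustrative value
    return K.mean(y_true * K.square(y_pred) +
                  (1.0 - y_true) * K.square(K.maximum(margin - y_pred, 0.0)))

distance = Lambda(safe_cosine_distance,
                  output_shape=lambda shapes: (shapes[0][0], 1))([processed_a, processed_b])

model = Model(input=[input_a, input_b], output=distance)
# point 4: lower the learning rate
model.compile(loss=contrastive_loss, optimizer=RMSprop(lr=0.0001))

For point 2, an alternative is to keep the mse loss, stack a Dense(1, activation='sigmoid') layer on top of the distance, and train against the 0-1 scores directly.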


It is not easy to make deep learning work perfectly.
