Improve Accuracy for a Siamese Network


Problem Description

I wrote this little model using Keras Functional API to find similarity of a dialogue between two individuals. I am using Gensim's Doc2Vec embeddings for transforming text-data into vectors (vocab size: 4117). My data is equally divided up into 56 positive cases and 64 negative cases. (yes I know the dataset is small - but that's all I have for the time being).

import keras.backend as K
from keras.models import Model
from keras.layers import (Input, Embedding, Conv2D, LSTM, TimeDistributed,
                          Activation, Subtract, Multiply, Lambda, Concatenate,
                          Bidirectional, GlobalMaxPooling1D, Dense)

def euclidean_distance(vects):
    # Euclidean distance between the two branch outputs, kept away from zero for a stable sqrt
    x, y = vects
    sum_square = K.sum(K.square(x - y), axis=1, keepdims=True)
    return K.sqrt(K.maximum(sum_square, K.epsilon()))

# two dialogue inputs: 38 sentences, each a 200-dim Doc2Vec vector
ch_inp = Input(shape=(38, 200))
csr_inp = Input(shape=(38, 200))

inp = Input(shape=(38, 200))
net = Embedding(int(vocab_size), 16)(inp)
net = Conv2D(16, 1, activation='relu')(net)
net = TimeDistributed(LSTM(8, return_sequences=True))(net)
out = Activation('relu')(net)

sia = Model(inp, out)

x = sia(csr_inp)
y = sia(ch_inp)

sub = Subtract()([x, y])
mul = Multiply()([sub, sub])

mul_x = Multiply()([x, x])
mul_y = Multiply()([y, y])
sub_xy = Subtract()([x, y])

euc = Lambda(euclidean_distance)([x, y])
z = Concatenate(axis=-1)([euc, sub_xy, mul])
z = TimeDistributed(Bidirectional(LSTM(4)))(z)
z = Activation('relu')(z)
z = GlobalMaxPooling1D()(z)
z = Dense(2, activation='relu')(z)
out = Dense(1, activation = 'sigmoid')(z)

model = Model([ch_inp, csr_inp], out)
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
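
For context, the question doesn't show the Doc2Vec step itself. A minimal Gensim sketch of turning each dialogue into the (38, 200) matrices expected by the inputs above might look like the following; the dialogues variable, the padding scheme and the hyperparameters are assumptions, not taken from the original post:

import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# dialogues: list of dialogues, each a list of tokenised sentences (assumed structure)
tagged = [TaggedDocument(words=[w for sent in d for w in sent], tags=[i])
          for i, d in enumerate(dialogues)]
d2v = Doc2Vec(vector_size=200, min_count=1, epochs=40)
d2v.build_vocab(tagged)
d2v.train(tagged, total_examples=d2v.corpus_count, epochs=d2v.epochs)

def dialogue_to_matrix(dialogue, max_sents=38, dim=200):
    # embed each sentence with infer_vector and zero-pad to a (max_sents, dim) matrix
    mat = np.zeros((max_sents, dim), dtype='float32')
    for j, sent in enumerate(dialogue[:max_sents]):
        mat[j] = d2v.infer_vector(sent)
    return mat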

The problem is: my accuracy won't improve from 60.87% - I ran 10 epochs and the accuracy remains constant. Is there something I've done here in my code that's causing that? Or perhaps it's an issue with my data?

I also did K-Fold Validation for some Sklearn models and got these results from the dataset:

Additionally, an overview of my dataset is attached below:

I'm definitely struggling with this one - so literally any help here would be appreciated. Thanks!

UPDATE: I increased my data size to 1875 train samples. Its accuracy improved to 70.28%, but it's still constant over all iterations.

Recommended Answer

I see two things that may be important there.

  • You're using 'relu' after the LSTM. An LSTM in Keras already has 'tanh' as its default activation. So, although you're not locking your model, you're making it harder for it to learn: one activation constrains the results to a small range, and another then cuts off the negative values (see the sketch after this list).

  • You're using 'relu' with very few units! Relu with few units, bad initialization, big learning rates and bad luck will get stuck in the zero region without any gradients.
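
As a sketch of the first point, the extra activation on the shared branch can simply be dropped, so the branch ends on the LSTM's own 'tanh' output. This continues the question's code above (reusing its inp, vocab_size and imports) and is only an illustration, not a guaranteed fix:

net = Embedding(int(vocab_size), 16)(inp)
net = Conv2D(16, 1, activation='relu')(net)
out = TimeDistributed(LSTM(8, return_sequences=True))(net)  # 'tanh' is the LSTM's default output activation

sia = Model(inp, out)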

If your loss completely freezes, it's most probably due to the second point above. And even if it doesn't freeze, it may be using just one of the 2 Dense units, for instance, making the layer very poor.

You should do one of the following:

  • Your model is small, so quit using 'relu' and use 'tanh' instead. This will give your model the expected power it should have.
  • Otherwise, you should definitely increase the number of units, both for the LSTM and for the Dense, so 'relu' doesn't get easily stuck.
  • You can add a BatchNormalization layer after Dense and before 'relu'; this way you guarantee that a good number of units will always be above zero (see the sketch after this list).
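
For the BatchNormalization point, a hypothetical version of the classifier head, with BatchNormalization sandwiched between a wider Dense layer and its 'relu', could look like this (the layer sizes are illustrative, not prescribed by the answer):

from keras.layers import BatchNormalization

z = Dense(16)(z)                 # wider than the original 2 units
z = BatchNormalization()(z)      # keeps a good share of the units above zero before the cutoff
z = Activation('relu')(z)
out = Dense(1, activation='sigmoid')(z)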

In any case, don't use 'relu' after the LSTM.

The other approach would be making the model more powerful.

For example:

z = TimeDistributed(Bidirectional(LSTM(4)))(z)
z = Conv1D(10, 3, activation='tanh')(z)   # or 'relu' maybe
z = MaxPooling1D()(z)
z = Conv1D(15, 3, activation='tanh')(z)   # or 'relu' maybe
z = Flatten()(z)   # unless the length is variable, then GlobalAveragePooling1D()(z)
z = Dense(10, activation='relu')(z)
out = Dense(1, activation='sigmoid')(z)
