How to avoid overfitting on a simple feed forward network


Problem description

Using the Pima Indians diabetes dataset, I'm trying to build an accurate model with Keras. I've written the following code:

# Visualize training history
from keras import callbacks
from keras.models import Sequential
from keras.layers import Dense, Dropout
import matplotlib.pyplot as plt
import numpy

tb = callbacks.TensorBoard(log_dir='/.logs', histogram_freq=10, batch_size=32,
                           write_graph=True, write_grads=True, write_images=False,
                           embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None)

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:, 0:8]
Y = dataset[:, 8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu', name='first_input'))
model.add(Dense(500, activation='tanh', name='first_hidden'))
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(8, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))

# Compile model
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Fit the model
history = model.fit(X, Y, validation_split=0.33, epochs=1000, batch_size=10, verbose=0, callbacks=[tb])
# list all data in history
print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

After several tries, I added dropout layers in order to avoid overfitting, but with no luck. The following graph shows that the validation loss and the training loss separate at one point.

What else could I do to optimize this network?

UPDATE: based on the comments I got, I've tweaked the code like so:

from keras import regularizers  # needed for the kernel/activity regularizers below

model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01), activation='relu',
                name='first_input'))  # added regularizers
model.add(Dense(8, activation='relu', name='first_hidden'))  # reduced to 8 neurons
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(5, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))

Here is the graph for 500 epochs:

Answer

The first example gave a validation accuracy > 75% and the second one gave an accuracy < 65%. If you compare the losses for epochs below 100, the first one is below 0.5 while the second one is above 0.6. So how is the second case better?

The second one, to me, is a case of under-fitting: the model doesn't have enough capacity to learn. The first case has a problem of over-fitting, because its training was not stopped when overfitting started (early stopping). If training had been stopped at, say, 100 epochs, it would be a far better model than either of the two.
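A quick way to see where that point is, given the `history` object returned by `model.fit` in the question's code, is to look for the epoch with the lowest validation loss (a minimal sketch; the 'val_loss'/'val_acc' key names match the older Keras version used in the question):

import numpy as np

# history comes from model.fit(..., validation_split=0.33, ...) in the question's code
val_loss = np.array(history.history['val_loss'])
best_epoch = int(np.argmin(val_loss))  # epoch index with the lowest validation loss
print("best epoch:", best_epoch)
print("val_loss at best epoch:", val_loss[best_epoch])
print("val_acc  at best epoch:", history.history['val_acc'][best_epoch])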

The goal should be to obtain a small prediction error on unseen data, and for that you increase the capacity of the network up to the point beyond which overfitting starts to happen.
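As a rough illustration of that capacity search (a sketch only, not part of the original answer; `best_val_loss_for_width` is a hypothetical helper, and X and Y are the arrays loaded in the question's code), you could train the same architecture with a few hidden-layer widths and compare the best validation loss each one reaches:

from keras.models import Sequential
from keras.layers import Dense

# Hypothetical helper: train the same architecture with different hidden widths
# and report the lowest validation loss each width reaches.
def best_val_loss_for_width(width, X, Y, epochs=200):
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(width, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    h = model.fit(X, Y, validation_split=0.33, epochs=epochs, batch_size=10, verbose=0)
    return min(h.history['val_loss'])

for width in (8, 32, 128, 500):
    print(width, best_val_loss_for_width(width, X, Y))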

So how do you avoid over-fitting in this particular case? Adopt early stopping.

Code changes: include early stopping and input scaling.

from sklearn.preprocessing import StandardScaler
from keras.callbacks import EarlyStopping

# input scaling
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Early stopping
early_stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, verbose=1, mode='auto')

# create model - almost the same code
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu', name='first_input'))
model.add(Dense(500, activation='relu', name='first_hidden'))
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(8, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))

history = model.fit(X, Y, validation_split=0.33, epochs=1000, batch_size=10, verbose=0, callbacks=[tb, early_stop])

Accuracy and loss graphs:
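One possible refinement (not part of the original answer) is to fit the scaler only on the training portion, so the validation data does not leak into the scaling statistics. A minimal sketch using sklearn's train_test_split, reusing the model, tb, and early_stop objects defined above:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Split first, then fit the scaler on the training part only (avoids leakage).
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.33, random_state=7)
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_val = scaler.transform(X_val)

history = model.fit(X_train, Y_train, validation_data=(X_val, Y_val),
                    epochs=1000, batch_size=10, verbose=0, callbacks=[tb, early_stop])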

