Retraining a CNN without a high-level API

Question

Summary: I am trying to retrain a simple CNN for MNIST without using a high-level API. I already succeeded doing so by retraining the entire network, but my current goal is to retrain only the last one or two Fully Connected layers.

Work so far: Say I have a CNN with the following structure

  • Convolutional Layer
  • RELU
  • Pooling Layer
  • Convolutional Layer
  • RELU
  • Pooling Layer
  • Fully Connected Layer
  • RELU
  • Dropout Layer
  • Fully Connected Layer to 10 output classes

My goal is to retrain either the last Fully Connected Layer or the last two Fully Connected Layers.

Example of a Convolutional Layer:

W_conv1 = tf.get_variable("W", [5, 5, 1, 32],
      initializer=tf.truncated_normal_initializer(stddev=np.sqrt(2.0 / 784)))
b_conv1 = tf.get_variable("b", initializer=tf.constant(0.1, shape=[32]))
z = tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME')
z += b_conv1  # add the bias (only once)
h_conv1 = tf.nn.relu(z)

Example of a Fully Connected Layer:

input_size = 7 * 7 * 64
W_fc1 = tf.get_variable("W", [input_size, 1024], initializer=tf.truncated_normal_initializer(stddev=np.sqrt(2.0/input_size)))
b_fc1 = tf.get_variable("b", initializer=tf.constant(0.1, shape=[1024]))
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

My assumption: When performing the backpropagation on the new dataset, I simply make sure that my weights W and b (from W*x+b) are fixed in the non-fully connected layers.

A first thought on how to do this: Save the W and b, perform a backpropagation step, and replace the new W and b with the old one in the layers I don't want changed.
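
For concreteness, a rough sketch of that save-and-restore idea might look like the following (TF 1.x style; the names frozen_vars, train_step, x, y_, batch_xs, batch_ys and sess are illustrative and assumed from the usual MNIST setup rather than defined above):

# Sketch of the "save, step, restore" idea -- illustrative only.
# Snapshot the conv-layer variables, take a normal training step over the
# whole network, then overwrite the conv-layer variables with the snapshot.
frozen_vars = [W_conv1, b_conv1, W_conv2, b_conv2]

backups = [tf.Variable(tf.zeros_like(v), trainable=False) for v in frozen_vars]
save_ops = [b.assign(v) for b, v in zip(backups, frozen_vars)]
restore_ops = [v.assign(b) for v, b in zip(frozen_vars, backups)]

sess.run(save_ops)                                           # remember the frozen weights
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})  # full backprop step over all layers
sess.run(restore_ops)                                        # undo any updates to the frozen layers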

My thoughts on the first approach:

  • It is computationally intensive and wastes memory; the whole benefit of retraining only the last layers is not having to compute everything else.
  • The backpropagation might behave differently when it is not applied to all layers?

My question:

  • How do I correctly train specific layers of a neural network when not using a high-level API? Both conceptual and coding answers are welcome.

P.S. I'm fully aware of how one can do this using high-level APIs. Example: https://towardsdatascience.com/how-to-train-your-model-dramatically-faster-9ad063f0f718. I just don't want neural networks to be magic; I want to know what actually happens.

Answer

The minimize function of optimizers has an optional argument for choosing which variables to train, e.g.:

optimizer_step = tf.train.MomentumOptimizer(learning_rate, momentum, name='MomentumOptimizer').minimize(loss, var_list=training_variables)

You can get the variables for the layers you want to train by using tf.trainable_variables():

vars1 = tf.trainable_variables()  # everything that is trainable so far (the conv layers, etc.)

# FC Layer (assumed to sit in its own tf.variable_scope so that the names
# "W" and "b" do not clash with the earlier layers' variables)
input_size = 7 * 7 * 64
W_fc1 = tf.get_variable("W", [input_size, 1024], initializer=tf.truncated_normal_initializer(stddev=np.sqrt(2.0/input_size)))
b_fc1 = tf.get_variable("b", initializer=tf.constant(0.1, shape=[1024]))
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

vars2 = tf.trainable_variables()

# The new layer's variables are exactly those created between the two calls
training_variables = list(set(vars2) - set(vars1))

Actually, using tf.trainable_variables is probably overkill in this case, since you have W_fc1 and b_fc1 directly. It would be useful, for example, if you had used tf.layers.dense to create the dense layer, where you would not have the variables explicitly.
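
For example, with the variables above the list can simply be passed directly (a minimal sketch, assuming loss, learning_rate and momentum are defined as in the earlier snippets):

optimizer_step = tf.train.MomentumOptimizer(learning_rate, momentum, name='MomentumOptimizer').minimize(loss, var_list=[W_fc1, b_fc1])

And if the last layers had instead been built with tf.layers.dense, the same list could be collected by variable scope (the scope name 'fc' here is only illustrative):

with tf.variable_scope('fc'):
    h_fc1 = tf.layers.dense(h_pool2_flat, 1024, activation=tf.nn.relu)
    logits = tf.layers.dense(h_fc1, 10)

training_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='fc')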
