How to freeze/lock weights of one TensorFlow variable (e.g., one CNN kernel of one layer)


Problem Description

I have a TensorFlow CNN model that is performing well, and we would like to implement this model in hardware, i.e., an FPGA. It's a relatively small network, but it would be ideal if it were smaller. With that goal, I've examined the kernels and found that there are some where the weights are quite strong and others that aren't doing much at all (the kernel values are all close to zero). This occurs specifically in layer 2, corresponding to the tf.Variable() named "W_conv2". W_conv2 has shape [3, 3, 32, 32]. I would like to freeze/lock the values of W_conv2[:, :, 29, 13] and set them to zero so that the rest of the network can be trained to compensate. Setting the values of this kernel to zero effectively removes/prunes the kernel from the hardware implementation, thus achieving the goal stated above.

I have found similar questions with suggestions that generally revolve around one of two approaches:

Suggestion #1:

    tf.Variable(some_initial_value, trainable = False)

Implementing this suggestion freezes the entire variable. I want to freeze just a slice, specifically W_conv2[:, :, 29, 13].
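
For illustration, a minimal TF 1.x sketch (kernel shape taken from the question; the initializer is assumed) of why this is too coarse: with trainable = False the variable drops out of tf.trainable_variables() entirely, so every kernel in the stack is frozen, not just one slice:

    import tensorflow as tf

    # Initializer assumed for illustration; only the shape comes from the question.
    W_conv2 = tf.Variable(tf.truncated_normal([3, 3, 32, 32], stddev=0.1),
                          trainable=False, name="W_conv2")
    # The whole variable is excluded from the default optimization set:
    print(W_conv2 in tf.trainable_variables())  # False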

Suggestion #2:

    Optimizer = tf.train.RMSPropOptimizer(0.001).minimize(loss, var_list=var_list)

Again, implementing this suggestion does not allow the use of slices. For instance, if I try the inverse of my stated goal (optimize only a single kernel of a single variable) as follows:

    Optimizer = tf.train.RMSPropOptimizer(0.001).minimize(loss, var_list = W_conv2[:,:,0,0])

I receive the following error:

    NotImplementedError: ('Trying to optimize unsupported type ', <tf.Tensor 'strided_slice_2228:0' shape=(3, 3) dtype=float32>)
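
By contrast, passing whole variables works; a minimal sketch (reusing loss and W_conv2 from above):

    opt = tf.train.RMSPropOptimizer(0.001)
    # OK: var_list takes whole tf.Variable objects...
    train_op = opt.minimize(loss, var_list=[W_conv2])
    # ...but a slice is a Tensor, not a Variable, hence the error above:
    # opt.minimize(loss, var_list=[W_conv2[:, :, 0, 0]])  # NotImplementedError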

Slicing tf.Variables() isn't possible in the way that I've tried it here. The only thing I've tried which comes close to doing what I want is using .assign(), but this is extremely inefficient, cumbersome, and caveman-like, as I've implemented it as follows (after the model is trained):

    for _ in range(10000):
        # get a new batch of data
        # reset the values of W_conv2[:,:,29,13]=0 each time through
        for m in range(3):
            for n in range(3):
                assign_op = W_conv2[m,n,29,13].assign(0)
                sess.run(assign_op)
        # re-train the rest of the network
        _, loss_val = sess.run([optimizer, loss], feed_dict = {
                                   dict_stuff_here
                               })
        print(loss_val)
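
A slightly less cumbersome variant (a sketch under the same TF 1.x setup): sliced assignment lets the reset op be built once, outside the loop, instead of adding new assign ops to the graph on every pass:

    # Built once; tf.zeros([3, 3]) matches the shape of the sliced kernel
    zero_kernel_op = W_conv2[:, :, 29, 13].assign(tf.zeros([3, 3]))
    for _ in range(10000):
        # get a new batch of data
        sess.run(zero_kernel_op)   # re-zero the pruned kernel
        _, loss_val = sess.run([optimizer, loss], feed_dict = {
                                   dict_stuff_here
                               })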

The model was started in Keras, then moved to TensorFlow, since Keras didn't seem to have a mechanism to achieve the desired results. I'm starting to think that TensorFlow doesn't allow for pruning but find this hard to believe; it just needs the correct implementation.

Answer

A possible approach is to initialize these specific weights with zeros, and modify the minimization process such that gradients won't be applied to them. It can be done by replacing the call to minimize() with something like:

    import numpy as np
    import tensorflow as tf

    # Constant mask: 1 everywhere except the pruned kernel, which is 0.
    # dtype must match the float32 gradients or the multiply below fails.
    W_conv2_weights = np.ones((3, 3, 32, 32), dtype=np.float32)
    W_conv2_weights[:, :, 29, 13] = 0
    W_conv2_weights_const = tf.constant(W_conv2_weights)

    optimizer = tf.train.RMSPropOptimizer(0.001)

    # Gradient for W_conv2 alone; tf.gradients returns a list, so take [0].
    W_conv2_orig_grads = tf.gradients(loss, [W_conv2])[0]
    # Zero out the pruned kernel's gradient entries before applying.
    W_conv2_grads = tf.multiply(W_conv2_weights_const, W_conv2_orig_grads)
    W_conv2_train_op = optimizer.apply_gradients([(W_conv2_grads, W_conv2)])

    # Every other trainable variable is optimized normally; one way to
    # build the 'rest_of_vars' list the answer refers to:
    rest_of_vars = [v for v in tf.trainable_variables() if v is not W_conv2]
    rest_grads = tf.gradients(loss, rest_of_vars)
    rest_train_op = optimizer.apply_gradients(zip(rest_grads, rest_of_vars))

    train_op = tf.group(rest_train_op, W_conv2_train_op)
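
A hypothetical way to run the grouped op in a training loop (the session setup, step count, and feed dict are assumed here, not part of the answer):

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for step in range(10000):  # step count assumed
            # 'feed' stands in for whatever feed_dict the model uses
            _, loss_val = sess.run([train_op, loss], feed_dict=feed)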

  1. Prepare a constant tensor that cancels the appropriate gradients.
  2. Compute gradients only for W_conv2, multiply them element-wise with the constant W_conv2_weights to zero the appropriate entries, and only then apply them.
  3. Compute and apply gradients "normally" to the rest of the variables.
  4. Group the two train ops into a single training op.
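
The answer's first sentence also calls for initializing the pruned slice to zero; one hedged way to arrange that (the initializer distribution is assumed, not from the answer):

    # Initializer values assumed for illustration; only the slice indices
    # and shape come from the question.
    init_val = np.random.normal(0.0, 0.1, size=(3, 3, 32, 32)).astype(np.float32)
    init_val[:, :, 29, 13] = 0.0   # starts at zero; the masked gradient keeps it there
    W_conv2 = tf.Variable(init_val, name="W_conv2")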
