Can I (selectively) invert Theano gradients during backpropagation?


Question

I'm keen to make use of the architecture proposed in the recent paper "Unsupervised Domain Adaptation by Backpropagation" in the Lasagne/Theano framework.

The thing that makes this paper a bit unusual is that it incorporates a 'gradient reversal layer', which inverts the gradient during backpropagation:

(The arrows along the bottom of the architecture diagram are the backpropagations whose gradients are inverted.)

In the paper the authors claim that the approach "can be implemented using any deep learning package", and indeed they provide a version written in Caffe.

However, I'm using the Lasagne/Theano framework, for various reasons.

Is it possible to create such a gradient reversal layer in Lasagne/Theano? I haven't seen any examples where custom scalar transforms like this are applied to gradients. If so, can I do it by creating a custom layer in Lasagne?

Answer

Here's a sketch implementation using plain Theano. It can be integrated into Lasagne easily enough.

You need to create a custom operation which acts as an identity in the forward pass but reverses the gradient in the backward pass.

Here's a suggestion for how that could be implemented. It is untested and I'm not 100% certain I've understood everything correctly, but you should be able to verify and fix it as required.

import theano


class ReverseGradient(theano.gof.Op):
    """Identity in the forward pass; multiplies the gradient by -hp_lambda in the backward pass."""

    view_map = {0: [0]}

    __props__ = ('hp_lambda',)

    def __init__(self, hp_lambda):
        super(ReverseGradient, self).__init__()
        self.hp_lambda = hp_lambda

    def make_node(self, x):
        return theano.gof.graph.Apply(self, [x], [x.type.make_variable()])

    def perform(self, node, inputs, output_storage):
        # Forward pass: pass the input through unchanged.
        xin, = inputs
        xout, = output_storage
        xout[0] = xin

    def grad(self, input, output_gradients):
        # Backward pass: flip the sign of the incoming gradient and scale it.
        return [-self.hp_lambda * output_gradients[0]]

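As a quick sanity check (my own sketch, also untested and not part of the original answer), you can compile a tiny function and confirm that the forward value is unchanged while the gradient comes back scaled by -hp_lambda:

import numpy
import theano
import theano.tensor as tt

x = tt.vector('x')
r = ReverseGradient(hp_lambda=1.0)
y = (r(x) ** 2).sum()   # forward pass: identical to (x ** 2).sum()
g = theano.grad(y, x)   # backward pass: -1.0 * 2 * x instead of 2 * x
f = theano.function([x], [y, g])
print(f(numpy.array([1.0, 2.0, 3.0], dtype=theano.config.floatX)))
# expected output: 14.0 and [-2. -4. -6.]
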
Using the paper's notation and naming conventions, here's a simple Theano implementation of the complete general model they propose.

import numpy
import theano
import theano.tensor as tt


# Feature extractor G_f: a stack of tanh layers.
def g_f(z, theta_f):
    for w_f, b_f in theta_f:
        z = tt.tanh(theano.dot(z, w_f) + b_f)
    return z


# Label predictor G_y: tanh hidden layers followed by a softmax output.
def g_y(z, theta_y):
    for w_y, b_y in theta_y[:-1]:
        z = tt.tanh(theano.dot(z, w_y) + b_y)
    w_y, b_y = theta_y[-1]
    z = tt.nnet.softmax(theano.dot(z, w_y) + b_y)
    return z


# Domain classifier G_d: tanh hidden layers followed by a sigmoid output.
def g_d(z, theta_d):
    for w_d, b_d in theta_d[:-1]:
        z = tt.tanh(theano.dot(z, w_d) + b_d)
    w_d, b_d = theta_d[-1]
    z = tt.nnet.sigmoid(theano.dot(z, w_d) + b_d)
    return z


# Label prediction loss L_y: mean categorical cross-entropy.
def l_y(z, y):
    return tt.nnet.categorical_crossentropy(z, y).mean()


# Domain classification loss L_d: mean binary cross-entropy.
def l_d(z, d):
    return tt.nnet.binary_crossentropy(z, d).mean()


# Build (weight, bias) shared-variable pairs for an MLP with the given layer
# sizes; returns the parameter list and the size of the final layer.
def mlp_parameters(input_size, layer_sizes):
    parameters = []
    previous_size = input_size
    for layer_size in layer_sizes:
        parameters.append((theano.shared(numpy.random.randn(previous_size, layer_size).astype(theano.config.floatX)),
                           theano.shared(numpy.zeros(layer_size, dtype=theano.config.floatX))))
        previous_size = layer_size
    return parameters, previous_size


def compile(input_size, f_layer_sizes, y_layer_sizes, d_layer_sizes, hp_lambda, hp_mu):
    r = ReverseGradient(hp_lambda)

    theta_f, f_size = mlp_parameters(input_size, f_layer_sizes)
    theta_y, _ = mlp_parameters(f_size, y_layer_sizes)
    theta_d, _ = mlp_parameters(f_size, d_layer_sizes)

    xs = tt.matrix('xs')
    xs.tag.test_value = numpy.random.randn(9, input_size).astype(theano.config.floatX)
    xt = tt.matrix('xt')
    xt.tag.test_value = numpy.random.randn(10, input_size).astype(theano.config.floatX)
    ys = tt.ivector('ys')
    ys.tag.test_value = numpy.random.randint(y_layer_sizes[-1], size=9).astype(numpy.int32)

    # Source features feed the label predictor directly and the domain classifier
    # via the gradient reversal op; target features only reach the domain
    # classifier. Domain labels are 0 for source and 1 for target samples.
    fs = g_f(xs, theta_f)
    e = l_y(g_y(fs, theta_y), ys) + l_d(g_d(r(fs), theta_d), 0) + l_d(g_d(r(g_f(xt, theta_f)), theta_d), 1)

    updates = [(p, p - hp_mu * theano.grad(e, p)) for theta in theta_f + theta_y + theta_d for p in theta]
    train = theano.function([xs, xt, ys], outputs=e, updates=updates)

    return train


def main():
    theano.config.compute_test_value = 'raise'
    numpy.random.seed(1)
    compile(input_size=2, f_layer_sizes=[3, 4], y_layer_sizes=[7, 8], d_layer_sizes=[5, 6], hp_lambda=.5, hp_mu=.01)


main()

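Since compile() returns the compiled training function, a hypothetical training loop (purely illustrative; the synthetic data, batch sizes, and epoch count are made up and not from the original answer) might look like this:

train = compile(input_size=2, f_layer_sizes=[3, 4], y_layer_sizes=[7, 8],
                d_layer_sizes=[5, 6], hp_lambda=.5, hp_mu=.01)

for epoch in range(100):
    # Labelled source batch and unlabelled target batch; labels lie in
    # [0, y_layer_sizes[-1]) to match the softmax output.
    xs_batch = numpy.random.randn(32, 2).astype(theano.config.floatX)
    ys_batch = numpy.random.randint(8, size=32).astype(numpy.int32)
    xt_batch = numpy.random.randn(32, 2).astype(theano.config.floatX)
    loss = train(xs_batch, xt_batch, ys_batch)
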
This is untested, but the following may allow this custom op to be used as a Lasagne layer:

class ReverseGradientLayer(lasagne.layers.Layer):
    def __init__(self, incoming, hp_lambda, **kwargs):
        super(ReverseGradientLayer, self).__init__(incoming, **kwargs)
        self.op = ReverseGradient(hp_lambda)

    def get_output_for(self, input, **kwargs):
        return self.op(input)

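For completeness, here is another hypothetical sketch (untested, with made-up layer sizes) of how such a layer might be wired into a Lasagne network: the domain classifier branches off behind the reversal layer, while the label predictor attaches to the features directly, so only the domain loss gradients reach the feature extractor with their sign flipped.

import lasagne

# Shared feature extractor.
l_in = lasagne.layers.InputLayer(shape=(None, 2))
l_features = lasagne.layers.DenseLayer(l_in, num_units=4,
                                       nonlinearity=lasagne.nonlinearities.tanh)

# Label predictor branch: gradients flow back into the features unchanged.
l_label = lasagne.layers.DenseLayer(l_features, num_units=8,
                                    nonlinearity=lasagne.nonlinearities.softmax)

# Domain classifier branch: gradients crossing this point are multiplied by -hp_lambda.
l_reversal = ReverseGradientLayer(l_features, hp_lambda=0.5)
l_domain = lasagne.layers.DenseLayer(l_reversal, num_units=1,
                                     nonlinearity=lasagne.nonlinearities.sigmoid)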