Skipping a layer in backpropagation in Keras


Question


I am using Keras with the TensorFlow backend and I am curious whether it is possible to skip a layer during backpropagation but have it execute in the forward pass. So here is what I mean:

Lambda(lambda x: a(x))


I want to apply a to x in the forward pass, but I do not want a to be included in the gradient computation when the backprop takes place.
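For example, a setup like this (a minimal sketch; the actual function a and the layer sizes are just placeholders):

import tensorflow as tf
from keras.layers import Input, Dense, Lambda
from keras.models import Model

def a(x):
    # placeholder for the real forward-pass-only transformation
    return tf.sign(x)

inp = Input(shape=(10,))
h = Dense(32, activation='relu')(inp)
h = Lambda(lambda x: a(x))(h)  # should run in the forward pass...
out = Dense(1)(h)              # ...but be skipped when gradients are computed
model = Model(inp, out)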


I was trying to find a solution but I could not find anything. Can somebody help me out here?

Answer

Update 2

In addition to tf.py_func, there is now also an official guide on how to add a custom op.

Update


See this question for an example of writing a custom op with a gradient purely in Python, without needing to rebuild anything. Note that there are some limitations to the method (see the documentation of tf.py_func).
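To give a concrete idea of that approach, here is a rough sketch (TF 1.x-style API; the helper name py_func_with_grad and the example function a are only illustrative, not part of TensorFlow): the forward pass runs a through tf.py_func, while the registered gradient just passes the incoming gradient straight through, so a is effectively skipped during backprop.

import numpy as np
import tensorflow as tf

def a(x):
    # arbitrary forward-pass computation, executed as plain NumPy inside py_func
    return np.sign(x).astype(np.float32)

def identity_grad(op, grad):
    # gradient used during backprop: pass the incoming gradient through unchanged,
    # i.e. pretend the op was the identity
    return grad

def py_func_with_grad(func, inp, Tout, grad, name=None):
    # register the gradient under a unique name and map the PyFunc op to it
    rnd_name = 'PyFuncGrad' + str(np.random.randint(1 << 30))
    tf.RegisterGradient(rnd_name)(grad)
    g = tf.get_default_graph()
    with g.gradient_override_map({'PyFunc': rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=False, name=name)

x = tf.placeholder(tf.float32, [None])
y = py_func_with_grad(a, [x], [tf.float32], grad=identity_grad)[0]
dy_dx = tf.gradients(y, x)  # uses identity_grad, not the derivative of a

In the Lambda layer from the question, one could then call py_func_with_grad instead of a directly.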

Not exactly a solution to the problem, but still kind of an answer, and too long for a comment.


That's not even a Keras issue, but a TensorFlow one. Each op defines its own gradient computation that is used during backpropagation. If you really wanted something like that, you would need to implement the op in TensorFlow yourself (no easy feat) and define the gradient that you want - because you can't have "no gradient": if anything it would be 1 or 0 (otherwise you can't go on with backpropagation). There is a tf.NoGradient function in TensorFlow which causes an op to propagate zeros, but I don't think it is meant to be / can be used outside of TensorFlow's own internals.

Update


Okay, so a bit more context. TensorFlow graphs are built of ops, which are implemented by kernels; this is basically a 1-to-1 mapping, except that there may be, for example, both a CPU and a GPU kernel for an op, hence the distinction. The set of ops supported by TensorFlow is usually static; I mean it can change with newer versions, but in principle you cannot add your own ops, because the ops of a graph go into the Protobuf serialized format, so if you made your own ops you would not be able to share your graph. Ops are then defined at the C++ level with the macro REGISTER_OP (see for example here), and kernels with REGISTER_KERNEL_BUILDER (see for example here).


Now, where do gradients come into play? Well, the funny thing is that the gradient of an op is not defined at the C++ level; there are ops (and kernels) that implement the gradient of other ops (if you look at the previous files you'll find ops/kernels with names ending in Grad), but (as far as I'm aware) these are not explicitly "linked" at this level. It seems that the associations between ops and their gradients are defined in Python, usually via tf.RegisterGradient or the aforementioned tf.NoGradient (see for example here; Python modules starting with gen_ are autogenerated with the help of the C++ macros); these registrations inform the backpropagation algorithm how to compute the gradient of the graph.
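Just to illustrate what such a registration looks like from the Python side (the op names below are hypothetical; both calls merely add entries to a registry keyed by the op type and don't create any op by themselves):

import tensorflow as tf

# associate a gradient function with the (hypothetical) op type "MyForwardOpA"
@tf.RegisterGradient("MyForwardOpA")
def _my_forward_op_grad(op, grad):
    # the backward pass is expressed in terms of existing TF ops
    return grad * tf.ones_like(op.inputs[0])

# or declare another (hypothetical) op type as having no gradient at all
tf.NotDifferentiable("MyForwardOpB")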


So, how to actually work this out? Well, you need to create at least one op in C++ with the corresponding kernel/s implementing the computation that you want for your forward pass. Then, if the gradient computation that you want to use can be expressed with existing TensorFlow ops (which is most likely), you would just need to call tf.RegisterGradient in Python and do the computation there in "standard" TensorFlow. This is quite complicated, but the good news is that it's possible, and there's even an example for it (although I think they kinda forgot the gradient registration part in that one)! As you will see, the process involves compiling the new op code into a library (btw I'm not sure if any of this may work on Windows) that is then loaded from Python (obviously this involves going through the painful process of manual compilation of TensorFlow with Bazel). A possibly more realistic example can be found in TensorFlow Fold, an extension of TensorFlow for structured data, which registers (as of now) one custom operation here through a macro defined here that calls REGISTER_OP, and then in Python loads the library and registers its gradient here through their own registration function defined here, which simply calls tf.NotDifferentiable (another name for tf.NoGradient).
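Schematically, the Python side of that route looks roughly like this (the library name my_forward_op.so and the op name MyForwardOp are placeholders for whatever you compiled, not actual TensorFlow Fold artifacts):

import tensorflow as tf

# load the shared library produced by compiling the C++ op (e.g. with Bazel)
my_module = tf.load_op_library('./my_forward_op.so')

# either mark the op as non-differentiable, as TensorFlow Fold does...
tf.NotDifferentiable('MyForwardOp')

# ...or register a gradient built from existing TF ops; for the question above
# an identity gradient would do:
# @tf.RegisterGradient('MyForwardOp')
# def _my_forward_op_grad(op, grad):
#     return grad

# the generated Python wrapper can then be used like any other op:
# y = my_module.my_forward_op(x)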


tldr: It is rather hard, but it can be done and there are even a couple of examples out there.

