Keras, cascade multiple RNN models for N-dimensional output

Problem description

I'm having some difficulty with chaining together two models in an unusual way.

I am trying to replicate the following flowchart:

For clarity, at each timestep of Model[0] I am attempting to generate an entire time series from IR[i] (Intermediate Representation) as a repeated input using Model[1]. The purpose of this scheme is it allows the generation of a ragged 2-D time series from a 1-D input (while both allowing the second model to be omitted when the output for that timestep is not needed, and not requiring Model[0] to constantly "switch modes" between accepting input, and generating output).

I assume a custom training loop will be required, and I already have a custom training loop for handling statefulness in the first model (the previous version only had a single output at each timestep). As depicted, the second model should have reasonably short outputs (able to be constrained to fewer than 10 timesteps).

But at the end of the day, while I can wrap my head around what I want to do, I'm not nearly adroit enough with Keras and/or Tensorflow to actually implement it. (In fact, this is my first non-toy project with the library.)

I have unsuccessfully searched literature for similar schemes to parrot, or example code to fiddle with. And I don't even know if this idea is possible from within TF/Keras.

I already have the two models working in isolation. (As in I've worked out the dimensionality, and done some training with dummy data to get garbage outputs for the second model, and the first model is based off of a previous iteration of this problem and has been fully trained.) If I have Model[0] and Model[1] as python variables (let's call them model_a and model_b), then how would I chain them together to do this?

Edit to add:

If this is all unclear, perhaps having the dimensions of each input and output will help:

The dimensions of each input and output are:

Input: (batch_size, model_a_timesteps, input_size)
IR: (batch_size, model_a_timesteps, ir_size)

IR[i] (after duplication): (batch_size, model_b_timesteps, ir_size)
Out[i]: (batch_size, model_b_timesteps, output_size)
Out: (batch_size, model_a_timesteps, model_b_timesteps, output_size)
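
A minimal shape sketch (not from the question; the sizes below are placeholders) of the "IR[i] (after duplication)" step: one IR slice of shape (batch_size, ir_size) is repeated across model_b's timesteps with tf.repeat, giving the (batch_size, model_b_timesteps, ir_size) input listed above.

    import tensorflow as tf

    # Placeholder sizes, purely to illustrate the shapes listed above.
    batch_size, ir_size, model_b_timesteps = 2, 8, 5
    ir_i = tf.random.normal((batch_size, ir_size))               # one IR[i] slice from Model[0]

    # (batch_size, ir_size) -> (batch_size, model_b_timesteps, ir_size)
    ir_i_repeated = tf.repeat(ir_i[:, None, :], model_b_timesteps, axis=1)
    print(ir_i_repeated.shape)  # (2, 5, 8)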

Answer

As this question has multiple major parts, I've dedicated a Q&A to the core challenge: stateful backpropagation. This answer focuses on implementing the variable output step length.

Explanation:

  • As validated in Case 5, we can take a bottom-up first approach. First we feed the complete input to model_a (A) - then, feed its outputs as input to model_b (B), but this time one step at a time.
  • Note that we must chain B's output steps per A's input step, not between A's input steps; i.e., in your diagram, gradient is to flow between Out[0][1] and Out[0][0], but not between Out[2][0] and Out[0][1].
  • For computing loss it won't matter whether we use a ragged or padded tensor; we must however use a padded tensor for writing to TensorArray (see the short sketch just after this list).
  • Loop logic in code below is general; specific attribute handling and hidden state passing, however, is hard-coded for simplicity, but can be rewritten for generality.
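
As referenced in the padded-tensor bullet above, a tiny standalone illustration (not the answer's code; all sizes are placeholders): a TensorArray built with a fixed element_shape only accepts writes of exactly that shape, so a shorter model_b output has to be zero-padded up to longest_step first.

    import tensorflow as tf

    longest_step, batch_size, output_size = 4, 2, 3
    ta = tf.TensorArray(dtype=tf.float32, size=1,
                        element_shape=tf.TensorShape((longest_step, batch_size, output_size)))

    short_out = tf.ones((2, batch_size, output_size))   # model_b only ran 2 of 4 steps here
    padded = tf.pad(short_out,
                    [[0, longest_step - short_out.shape[0]], [0, 0], [0, 0]])
    ta = ta.write(0, padded)                             # (4, 2, 3) now matches element_shape
    print(ta.read(0).shape)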

Code: at bottom.

Example:

  • Here we predefine the number of iterations for B per input from A, but we can implement any arbitrary stopping logic. For example, we can take a Dense layer's output from B as a hidden state and check if its L2-norm exceeds a threshold (a minimal sketch follows the notes below).
  • Per above, if longest_step is unknown to us, we can simply set it, which is common for NLP & other tasks with a STOP token.
    • Alternatively, we may write to separate TensorArrays at every A's input with dynamic_size=True; see "point of uncertainty" below.

      Point of uncertainty: I'm not entirely sure whether gradients interact between e.g. Out[0][1] and Out[2][0]. I did, however, verify that gradients will not flow horizontally if we write to separate TensorArrays for B's outputs per A's inputs (case 2); reimplementing for cases 4 & 5, grads will differ for both models, including the lower one with a complete single horizontal pass.

      Thus we must write to a unified TensorArray. For such, as there are no ops leading from e.g. IR[1] to Out[0][1], I can't see how TF would trace it as such - so it seems we're safe. Note, however, that in the example below, using steps_at_t=[1]*6 will make gradients flow horizontally in both models, as we're writing to a single TensorArray and passing hidden states.

      The examined case is confounded, however, with B being stateful at all steps; lifting this requirement, we might not need to write to a unified TensorArray for all Out[0], Out[1], etc, but we must still test against something we know works, which is no longer as straightforward.
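
      A hedged illustration of the arbitrary stopping logic mentioned in the bullet list above (not part of the original code; model_b_step, threshold and max_steps are placeholder names): run B one step at a time in Eager and stop once the L2-norm of the step output crosses a threshold, with max_steps as a hard cap.

      import tensorflow as tf

      def run_b_with_stop(model_b_step, ir_i, state, threshold, max_steps=10):
          # model_b_step: any callable mapping (input, state) -> (output, new_state),
          # e.g. one slice-step of model_b wrapped to take/return a single state tensor.
          outputs = []
          for _ in range(max_steps):
              out, state = model_b_step(ir_i, state)
              outputs.append(out)
              if tf.norm(out) > threshold:   # the L2-norm criterion from the bullet above
                  break
          return tf.stack(outputs), state    # length varies per call; pad before TensorArray writes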

      Example [code]:

      import numpy as np
      import tensorflow as tf
      
      #%%# Make data & models, then fit ###########################################
      x0 = y0 = tf.constant(np.random.randn(2, 3, 4))
      msn = MultiStatefulNetwork(batch_shape=(2, 3, 4), steps_at_t=[3, 4, 2])
      
      #%%#############################################
      with tf.GradientTape(persistent=True) as tape:
          outputs = msn(x0)
          # shape: (3, 4, 2, 4), 0-padded
          # We can pad labels accordingly.
          # Note the (2, 4) model_b's output shape, which is a timestep slice;
          # model_b is a *slice model*. Careful in implementing various logics
          # which are and aren't intended to be stateful.
      

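      A hedged aside, not part of the original answer: a compact, self-contained variant of the snippet above showing one way a training step could be closed over the padded outputs. The zero stand-in labels, plain MSE, and single SGD optimizer are all illustrative assumptions; in practice the labels would be padded to outputs' (3, 4, 2, 4) shape and the padded positions masked in the loss.

      opt = tf.keras.optimizers.SGD(1e-2)
      weights = msn.model_a.trainable_weights + msn.model_b.trainable_weights

      with tf.GradientTape() as tape:
          outputs = msn(x0)
          y_padded = tf.zeros_like(outputs)                     # stand-in for 0-padded labels
          loss = tf.reduce_mean(tf.square(outputs - y_padded))  # MSE; mask padded steps in practice

      grads = tape.gradient(loss, weights)                      # flows through model_a and model_b
      opt.apply_gradients(zip(grads, weights))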

      Approach:

      Not the cleanest, nor most optimal code, but it works; room for improvement.

      More importantly: I implemented this in Eager, and have no idea how it'll work in Graph, and making it work for both can be quite tricky. If needed, just run in Graph and compare all values as done in the "cases".

      # ideally we won't `import tensorflow` at all; kept for code simplicity
      import tensorflow as tf
      from tensorflow.python.util import nest
      from tensorflow.python.ops import array_ops, tensor_array_ops
      from tensorflow.python.framework import ops
      
      from tensorflow.keras.layers import Input, SimpleRNN, SimpleRNNCell
      from tensorflow.keras.models import Model
      
      #######################################################################
      class MultiStatefulNetwork():
          def __init__(self, batch_shape=(2, 6, 4), steps_at_t=[]):
              self.batch_shape=batch_shape
              self.steps_at_t=steps_at_t
      
              self.batch_size = batch_shape[0]
              self.units = batch_shape[-1]
              self._build_models()
      
          def __call__(self, inputs):
              outputs = self._forward_pass_a(inputs)
              outputs = self._forward_pass_b(outputs)
              return outputs
      
          def _forward_pass_a(self, inputs):
              return self.model_a(inputs, training=True)
      
          def _forward_pass_b(self, inputs):
              return model_rnn_outer(self.model_b, inputs, self.steps_at_t)
      
          def _build_models(self):
              ipt = Input(batch_shape=self.batch_shape)
              out = SimpleRNN(self.units, return_sequences=True)(ipt)
              self.model_a = Model(ipt, out)
      
              ipt  = Input(batch_shape=(self.batch_size, self.units))
              sipt = Input(batch_shape=(self.batch_size, self.units))
              out, state = SimpleRNNCell(4)(ipt, sipt)
              self.model_b = Model([ipt, sipt], [out, state])
      
              self.model_a.compile('sgd', 'mse')
              self.model_b.compile('sgd', 'mse')
      
      
      def inner_pass(model, inputs, states):
          return model_rnn(model, inputs, states)
      
      
      def model_rnn_outer(model, inputs, steps_at_t=[2, 2, 4, 3]):
          def outer_step_function(inputs, states):
              x, steps = inputs
              x = array_ops.expand_dims(x, 0)
              x = array_ops.tile(x, [steps, *[1] * (x.ndim - 1)])  # repeat steps times
              output, new_states = inner_pass(model, x, states)
              return output, new_states
      
          (outer_steps, steps_at_t, longest_step, outer_t, initial_states,
           output_ta, input_ta) = _process_args_outer(model, inputs, steps_at_t)
      
          def _outer_step(outer_t, output_ta_t, *states):
              current_input = [input_ta.read(outer_t), steps_at_t.read(outer_t)]
              output, new_states = outer_step_function(current_input, tuple(states))
      
              # pad if shorter than longest_step.
              # model_b may output twice, but longest in `steps_at_t` is 4; then we need
              # output.shape == (2, *model_b.output_shape) -> (4, *...)
              # checking directly on `output` is more reliable than from `steps_at_t`
              output = tf.cond(
                  tf.math.less(output.shape[0], longest_step),
                  lambda: tf.pad(output, [[0, longest_step - output.shape[0]],
                                          *[[0, 0]] * (output.ndim - 1)]),
                  lambda: output)
      
              output_ta_t = output_ta_t.write(outer_t, output)
              return (outer_t + 1, output_ta_t) + tuple(new_states)
      
          final_outputs = tf.while_loop(
              body=_outer_step,
              loop_vars=(outer_t, output_ta) + initial_states,
              cond=lambda outer_t, *_: tf.math.less(outer_t, outer_steps))
      
          output_ta = final_outputs[1]
          outputs = output_ta.stack()
          return outputs
      
      
      def _process_args_outer(model, inputs, steps_at_t):
          def swap_batch_timestep(input_t):
              # Swap the batch and timestep dim for the incoming tensor.
              # (samples, timesteps, channels) -> (timesteps, samples, channels)
              # iterating dim0 to feed (samples, channels) slices expected by RNN
              axes = list(range(len(input_t.shape)))
              axes[0], axes[1] = 1, 0
              return array_ops.transpose(input_t, axes)
      
          inputs = nest.map_structure(swap_batch_timestep, inputs)
      
          assert inputs.shape[0] == len(steps_at_t)
          outer_steps = array_ops.shape(inputs)[0]  # model_a_steps
          longest_step = max(steps_at_t)
          steps_at_t = tensor_array_ops.TensorArray(
              dtype=tf.int32, size=len(steps_at_t)).unstack(steps_at_t)
      
          # assume single-input network, excluding states which are handled separately
          input_ta = tensor_array_ops.TensorArray(
              dtype=inputs.dtype,
              size=outer_steps,
              element_shape=tf.TensorShape(model.input_shape[0]),
              tensor_array_name='outer_input_ta_0').unstack(inputs)
      
          # TensorArray is used to write outputs at every timestep, but does not
          # support RaggedTensor; thus we must make TensorArray such that column length
          # is that of the longest outer step, and pad model_b's outputs accordingly
          element_shape = tf.TensorShape((longest_step, *model.output_shape[0]))
      
          # overall shape: (outer_steps, longest_step, *model_b.output_shape)
          # for every input / at each step we write in dim0 (outer_steps)
          output_ta = tensor_array_ops.TensorArray(
              dtype=model.output[0].dtype,
              size=outer_steps,
              element_shape=element_shape,
              tensor_array_name='outer_output_ta_0')
      
          outer_t = tf.constant(0, dtype='int32')
          initial_states = (tf.zeros(model.input_shape[0], dtype='float32'),)
      
          return (outer_steps, steps_at_t, longest_step, outer_t, initial_states,
                  output_ta, input_ta)
      
      
      def model_rnn(model, inputs, states):
          def step_function(inputs, states):
              output, new_states = model([inputs, *states], training=True)
              return output, new_states
      
          initial_states = states
          input_ta, output_ta, time, time_steps_t = _process_args(model, inputs)
      
          def _step(time, output_ta_t, *states):
              current_input = input_ta.read(time)
              output, new_states = step_function(current_input, tuple(states))
      
              flat_state = nest.flatten(states)
              flat_new_state = nest.flatten(new_states)
              for state, new_state in zip(flat_state, flat_new_state):
                  if isinstance(new_state, ops.Tensor):
                      new_state.set_shape(state.shape)
      
              output_ta_t = output_ta_t.write(time, output)
              new_states = nest.pack_sequence_as(initial_states, flat_new_state)
              return (time + 1, output_ta_t) + tuple(new_states)
      
          final_outputs = tf.while_loop(
              body=_step,
              loop_vars=(time, output_ta) + tuple(initial_states),
              cond=lambda time, *_: tf.math.less(time, time_steps_t))
      
          new_states = final_outputs[2:]
          output_ta = final_outputs[1]
          outputs = output_ta.stack()
          return outputs, new_states
      
      
      def _process_args(model, inputs):
          time_steps_t = tf.constant(inputs.shape[0], dtype='int32')
      
          # assume single-input network (excluding states)
          input_ta = tensor_array_ops.TensorArray(
              dtype=inputs.dtype,
              size=time_steps_t,
              tensor_array_name='input_ta_0').unstack(inputs)
      
          # assume single-output network (excluding states)
          output_ta = tensor_array_ops.TensorArray(
              dtype=model.output[0].dtype,
              size=time_steps_t,
              element_shape=tf.TensorShape(model.output_shape[0]),
              tensor_array_name='output_ta_0')
      
          time = tf.constant(0, dtype='int32', name='time')
          return input_ta, output_ta, time, time_steps_t
      
