Which tensorflow method decides that a particular batch of examples is for the model to learn?

Problem description

I'm trying to understand the implementation of SGD in tensorflow.

I started with gradient_descent.py because of the file name.

Per the keras docs, an optimizer needs to implement the _resource_apply_dense method, which corresponds to the code (partly) shown below:

def _resource_apply_dense(self, grad, var, apply_state=None):
    var_device, var_dtype = var.device, var.dtype.base_dtype
    coefficients = ((apply_state or {}).get((var_device, var_dtype))
                    or self._fallback_apply_state(var_device, var_dtype))

    if self._momentum:
        momentum_var = self.get_slot(var, "momentum")
        return gen_training_ops.ResourceApplyKerasMomentum(
            ...
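
For context, this is how a custom optimizer implementing _resource_apply_dense typically looks. The sketch below is illustrative, not code from the question: it assumes the optimizer_v2-era tf.keras.optimizers.Optimizer base class (TF 2.x before Keras 3, where this API changed), and the class name MinimalSGD and its hyperparameter handling are my own.

import tensorflow as tf

# Minimal sketch of a custom optimizer (assumes the optimizer_v2-era
# tf.keras.optimizers.Optimizer base class; names here are illustrative).
class MinimalSGD(tf.keras.optimizers.Optimizer):
    def __init__(self, learning_rate=0.01, name="MinimalSGD", **kwargs):
        super().__init__(name, **kwargs)
        self._set_hyper("learning_rate", learning_rate)

    def _resource_apply_dense(self, grad, var, apply_state=None):
        # Keras calls this once per (gradient, variable) pair;
        # the optimizer never sees the input batch itself.
        lr = self._get_hyper("learning_rate", var.dtype.base_dtype)
        return var.assign_sub(lr * grad)

    def _resource_apply_sparse(self, grad, var, indices, apply_state=None):
        raise NotImplementedError

    def get_config(self):
        config = super().get_config()
        config.update({
            "learning_rate": self._serialize_hyperparameter("learning_rate"),
        })
        return config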

I'd like to know who passes the var variable to the _resource_apply_dense method? In other words, which method decides that this particular batch of examples is for the model to learn?

Answer

Checking the optimizer_v2 of tensorflow keras, we find the only use of this function in the entire tensorflow codebase:

#...
def apply_grad_to_update_var(var, grad):
    #...
    if "apply_state" in self._dense_apply_args:
        apply_kwargs["apply_state"] = apply_state
    update_op = self._resource_apply_dense(grad, var, **apply_kwargs)
    if var.constraint is not None:
        with ops.control_dependencies([update_op]):
            return var.assign(var.constraint(var))
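
As an aside, the var.constraint branch above only fires for variables created with a constraint. Here is an illustrative example (not from the original post) showing how a Keras constraint ends up stored on the variable itself, so the assign above re-projects the weight after every update:

import tensorflow as tf

# Illustrative sketch: a kernel_constraint is stored on the variable,
# making var.constraint non-None in apply_grad_to_update_var above.
layer = tf.keras.layers.Dense(4, kernel_constraint=tf.keras.constraints.NonNeg())
layer.build(input_shape=(None, 8))
print(layer.kernel.constraint)  # a NonNeg instance, so the branch runs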

We later see in that same file that the var variable comes from an argument to the _distributed_apply function:

#...
def _distributed_apply(self, distribution, grads_and_vars, name, apply_state):
    #...
    with name_scope_only_in_function_or_graph(name or self._name):
        for grad, var in grads_and_vars:
            #...

Finally, the grads_and_vars argument is defined as a "List of (gradient, variable) pairs" in the function apply_gradients:

  #...
  def apply_gradients(self,
                      grads_and_vars,
                      #...
                      ):
    """...
    Args:
      grads_and_vars: List of (gradient, variable) pairs.
    """

If you check the occurrences of apply_gradients (this search), you will see that it is a common way to update the weights of the network, and is thus controlled by the "update" step of the optimizer.
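
To make the call chain concrete, here is a minimal custom training loop (an illustrative sketch; the model, dataset, and hyperparameters are my own, not from the question). In this sketch, batch selection happens in the for loop over the dataset iterator, while apply_gradients performs the update step traced above:

import tensorflow as tf

# Illustrative sketch of a custom training loop. The tf.data iterator
# supplies each batch; the optimizer never decides which examples to use.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([64, 8]), tf.random.normal([64, 1]))).batch(16)

for x_batch, y_batch in dataset:  # <-- this loop supplies the batch
    with tf.GradientTape() as tape:
        loss = loss_fn(y_batch, model(x_batch, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    # apply_gradients -> _distributed_apply -> apply_grad_to_update_var
    # -> _resource_apply_dense(grad, var)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))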
