Which tensorflow method decides that a particular batch of examples is for the model to learn?
Problem description
I'm trying to understand the implementation of SGD in tensorflow.
I started with gradient_descent.py because of the file name.
Per the keras docs, an optimizer needs to implement the _resource_apply_dense
method, which corresponds with the code (partly) shown below:
def _resource_apply_dense(self, grad, var, apply_state=None):
    var_device, var_dtype = var.device, var.dtype.base_dtype
    coefficients = ((apply_state or {}).get((var_device, var_dtype))
                    or self._fallback_apply_state(var_device, var_dtype))

    if self._momentum:
        momentum_var = self.get_slot(var, "momentum")
        return gen_training_ops.ResourceApplyKerasMomentum(
            ...
I'd like to know who passes the var
variable to the _resource_apply_dense
method. In other words, which method decides that this particular batch of examples is for the model to learn?
Recommended answer
Checking optimizer_v2 of tensorflow keras, we find the only use of this function in the entire tensorflow codebase:
# ...
def apply_grad_to_update_var(var, grad):
    # ...
    if "apply_state" in self._dense_apply_args:
        apply_kwargs["apply_state"] = apply_state
    update_op = self._resource_apply_dense(grad, var, **apply_kwargs)
    if var.constraint is not None:
        with ops.control_dependencies([update_op]):
            return var.assign(var.constraint(var))
Later in that same file, we see that the var
variable comes from an argument to the _distributed_apply
function:
# ...
def _distributed_apply(self, distribution, grads_and_vars, name, apply_state):
    # ...
    with name_scope_only_in_function_or_graph(name or self._name):
        for grad, var in grads_and_vars:
            # ...
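The dispatch chain above can be sketched with a stripped-down, pure-Python mock. The class below is a hypothetical stand-in, not the real optimizer_v2 code; it only mirrors the way apply_gradients hands each (gradient, variable) pair down to _resource_apply_dense:

```python
class MockOptimizer:
    """Hypothetical sketch of optimizer_v2's dispatch; names mirror
    TensorFlow's, but this is not the real implementation."""

    def __init__(self, learning_rate):
        self.lr = learning_rate

    def apply_gradients(self, grads_and_vars):
        # In TF this goes through a distribution strategy first.
        self._distributed_apply(list(grads_and_vars))

    def _distributed_apply(self, grads_and_vars):
        # Loop over the (gradient, variable) pairs we were handed...
        for grad, var in grads_and_vars:
            # ...and apply the dense update rule to each variable.
            self._resource_apply_dense(grad, var)

    def _resource_apply_dense(self, grad, var):
        # Plain SGD step; var is a one-element list standing in
        # for a mutable tf.Variable.
        var[0] -= self.lr * grad


opt = MockOptimizer(learning_rate=0.1)
w = [1.0]                        # the "variable"
opt.apply_gradients([(8.0, w)])  # one (gradient, variable) pair
# w[0] is now 1.0 - 0.1 * 8.0 = 0.2
```

Note that nothing in this chain ever sees a batch of examples: by the time apply_gradients runs, the batch has already been reduced to one gradient per variable.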
Finally, the grads_and_vars
argument is defined as a list of (gradient, variable) pairs
in the function apply_gradients
:
# ...
def apply_gradients(self,
                    grads_and_vars,
                    # ...
    """...
    Args:
      grads_and_vars: List of (gradient, variable) pairs.
    """
If you check the occurrences of apply_gradients
(this search), you will see that it is a common way to update the weights of the network, and is thus controlled by the "update" step of the optimizer. The choice of which batch to learn from happens earlier, in the training loop that computes the gradients.